1. Supervised Learning Linear Regression > 1-1. Linear Regression - Summary
Model
- $f_\theta(x) = \theta_0 + \theta_1 x$
- $x$ → $f_\theta(x)$ → $\hat{y}$
- Model Parameters : $\theta_0$ = Bias , $\theta_1$ = Weight
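A minimal sketch of this model in NumPy (the function name `predict` and the parameter values are just illustrative):

```python
import numpy as np

def predict(x, theta0, theta1):
    """Simple linear model: f_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# theta0 (bias) = 1.0, theta1 (weight) = 2.0
x = np.array([0.0, 1.0, 2.0])
print(predict(x, theta0=1.0, theta1=2.0))  # [1. 3. 5.]
```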
Finding Model Parameters
- find $\theta_0$ and $\theta_1$ that minimize the loss/cost function (MSE)
solve :
$$
\min_{\theta_0, \theta_1} \frac{1}{2M} \sum_{m=1}^{M} \left( y_m - f_\theta(x_m) \right)^2
$$
- Error = actual - prediction = $y - f_\theta(x)$
- $y$ = actual , $f_\theta(x)$ = prediction
- MSE (mean squared error) = $\frac{1}{2M}\sum_{m=1}^{M}(\text{Error}_m)^2$
- → loss/cost function
- minimize the loss/cost function
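A rough sketch of this cost in NumPy, keeping the $\frac{1}{2M}$ factor from the formula above (the name `mse_cost` and the toy data are illustrative):

```python
import numpy as np

def mse_cost(theta0, theta1, x, y):
    """J = (1 / 2M) * sum_m (y_m - f_theta(x_m))^2."""
    predictions = theta0 + theta1 * x          # f_theta(x_m) for every sample
    errors = y - predictions                   # actual - prediction
    return np.sum(errors ** 2) / (2 * len(x))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(mse_cost(0.0, 2.0, x, y))  # 0.0 -> these parameters fit the data exactly
```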
How? Gradient Descent
$$
\theta_n := \theta_n - \alpha \frac{\partial J}{\partial \theta_n}
$$
- $\alpha$ = Step Size , $\frac{\partial J}{\partial \theta_n}$ = Direction
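A minimal gradient-descent sketch for the two-parameter model, assuming the $\frac{1}{2M}$ MSE cost above (the step size, iteration count, and toy data are arbitrary choices, not values from the post):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Repeat theta_n := theta_n - alpha * dJ/dtheta_n for the (1/2M) MSE cost."""
    theta0, theta1 = 0.0, 0.0
    M = len(x)
    for _ in range(iterations):
        errors = (theta0 + theta1 * x) - y      # f_theta(x_m) - y_m
        grad0 = np.sum(errors) / M              # dJ/dtheta0
        grad1 = np.sum(errors * x) / M          # dJ/dtheta1
        theta0 -= alpha * grad0                 # simultaneous update of both parameters
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])              # generated from y = 1 + 2x
print(gradient_descent(x, y))                   # roughly (1.0, 2.0)
```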
Multiple Features
$$
f_\theta = \sum_{n=0}^{N} \theta_n x_n \,, \quad (x_0 = 1)
$$
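A vectorized sketch of the multi-feature model, with $x_0 = 1$ prepended as a dummy bias feature (shapes and values are illustrative):

```python
import numpy as np

def predict(X, theta):
    """f_theta = sum_n theta_n * x_n, with x_0 = 1 as the bias feature."""
    ones = np.ones((X.shape[0], 1))      # x_0 = 1 for every sample
    X_aug = np.hstack([ones, X])         # shape (M, N+1)
    return X_aug @ theta                 # one dot product per sample

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])               # M = 2 samples, N = 2 features
theta = np.array([0.5, 1.0, 2.0])        # theta_0 (bias), theta_1, theta_2
print(predict(X, theta))                 # [ 5.5 11.5]
```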
Feature Scaling
- Normalization
$$
\frac{x - x^{\min}}{x^{\max} - x^{\min}}
$$
- Standardization
$$
\frac{x - \mu}{\sigma} \qquad (\mu = \text{mean},\ \sigma = \text{standard deviation})
$$
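Both scalings as short NumPy helpers (a sketch; in practice the statistics would be computed on the training set only):

```python
import numpy as np

def normalize(x):
    """Min-max normalization: (x - x_min) / (x_max - x_min), result lies in [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def standardize(x):
    """Standardization: (x - mean) / standard deviation, result has mean 0 and std 1."""
    return (x - x.mean()) / x.std()

x = np.array([10.0, 20.0, 30.0, 40.0])
print(normalize(x))     # [0.         0.33333333 0.66666667 1.        ]
print(standardize(x))   # zero mean, unit standard deviation
```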
Polynomial Regression
- Multiple features
- $f_\theta=\Sigma_n\theta_nx_n$
- $x_1 \rightarrow x$
- $x_2 \rightarrow x^2$
- $x_3 \rightarrow x^3$
- Feature scaling (see the sketch after this list)
- Express the MSE
- $\frac{1}{2M}\Sigma_m(f_\theta-y)^2$
- Gradient descent → $\theta_n^*$ (optimal parameters)
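A sketch of the feature-expansion step: one raw feature $x$ becomes the columns $x, x^2, x^3$, which are then scaled and fed into the same multi-feature model (the function name and degree are illustrative):

```python
import numpy as np

def polynomial_features(x, degree=3):
    """Turn a single feature x into columns [x, x^2, ..., x^degree]."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

x = np.linspace(1.0, 5.0, 5)
X_poly = polynomial_features(x)                                   # x_1 = x, x_2 = x^2, x_3 = x^3
X_scaled = (X_poly - X_poly.mean(axis=0)) / X_poly.std(axis=0)    # feature scaling per column
print(X_scaled.shape)    # (5, 3) -> ready for the multi-feature model and gradient descent
```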
Dataset
- Train
- Model param → GD
- Validate
- Hyper param → Grid Search
- Test
- Performance evaluation → MSE on test data (sketch below)
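A rough sketch of the split and the grid search on synthetic data; the hyperparameter searched here is the regularization strength $\lambda$ from the last section, and a closed-form ridge fit stands in for gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 0.5, size=100)

# 60% train / 20% validation / 20% test
idx = rng.permutation(len(X))
train, val, test = idx[:60], idx[60:80], idx[80:]

def fit_ridge(X, y, lam):
    """Closed-form ridge fit on [1, x] features; lam is the hyperparameter."""
    A = np.hstack([np.ones((len(X), 1)), X])
    I = np.eye(A.shape[1]); I[0, 0] = 0.0        # do not penalize the bias theta_0
    return np.linalg.solve(A.T @ A + lam * I, A.T @ y)

def mse(X, y, theta):
    A = np.hstack([np.ones((len(X), 1)), X])
    return np.mean((A @ theta - y) ** 2) / 2

# Grid search: keep the hyperparameter with the lowest validation MSE,
# then report performance once on the held-out test set
lams = [0.0, 0.1, 1.0, 10.0]
best_lam = min(lams, key=lambda lam: mse(X[val], y[val], fit_ridge(X[train], y[train], lam)))
print("best lambda:", best_lam)
print("test MSE:", mse(X[test], y[test], fit_ridge(X[train], y[train], best_lam)))
```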
Learning Curves
- well-fit
- low bias, low variance
- under-fit
- high bias (error level stays high), low variance
- over-fit
- low bias, high variance (large gap between training and validation error)
- Good Model?
- Sufficient dataset?
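A sketch of how a learning curve is computed: fit on more and more training samples and track training vs. validation error (synthetic data, least-squares fit as a stand-in for gradient descent):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 0.5, size=200)
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

def fit(X, y):
    """Least-squares fit of f_theta = theta_0 + theta_1 * x."""
    A = np.hstack([np.ones((len(X), 1)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def mse(X, y, theta):
    A = np.hstack([np.ones((len(X), 1)), X])
    return np.mean((A @ theta - y) ** 2) / 2

# Learning curve: error as the number of training samples grows
for m in (10, 50, 100, 150):
    theta = fit(X_train[:m], y_train[:m])
    print(m, "train:", round(mse(X_train[:m], y_train[:m], theta), 3),
             "val:", round(mse(X_val, y_val, theta), 3))
# Both errors settling low and close together -> well-fit (low bias, low variance);
# both high -> under-fit; a persistent gap -> over-fit
```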
Regularization
- Complex model → less complex → much less complex (as $\lambda$ grows)
- overfit → fit → underfit
- → $J = \text{MSE} + \lambda\sum_{n=1}^{N}\theta_n^2$
- $\lambda$ = Regularization parameter
- $\theta_n := \theta_n - \alpha \frac{\partial J}{\partial \theta_n}$
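A sketch of the regularized cost and its gradient step; following the $n \ge 1$ sum above, the bias $\theta_0$ is left out of the penalty (data, $\lambda$, and step size are illustrative):

```python
import numpy as np

def ridge_cost_and_grad(theta, X, y, lam):
    """J = (1/2M) * sum_m (f_theta - y)^2 + lam * sum_{n>=1} theta_n^2, plus its gradient."""
    M = len(X)
    A = np.hstack([np.ones((M, 1)), X])              # prepend x_0 = 1
    errors = A @ theta - y
    penalty_grad = np.r_[0.0, 2 * lam * theta[1:]]   # theta_0 (bias) is not penalized
    cost = np.sum(errors ** 2) / (2 * M) + lam * np.sum(theta[1:] ** 2)
    grad = A.T @ errors / M + penalty_grad
    return cost, grad

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 5.0, 7.0])                        # y = 1 + 2x
theta = np.zeros(2)
for _ in range(2000):                                # theta_n := theta_n - alpha * dJ/dtheta_n
    _, grad = ridge_cost_and_grad(theta, X, y, lam=0.01)
    theta -= 0.1 * grad
print(theta)   # roughly [1.12, 1.94]: theta_1 is shrunk slightly below 2 by the penalty
```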