
04 Regularization



Regularization for Linear Regression and Logistic Regression

Definitions

  1. Under-fitting (high bias): the model is too simple to fit the training data well.
  2. Over-fitting (high variance): the model has too many features; it fits the training set well but fails to generalize to new examples.

Addressing over-fitting

  1. Reduce the number of features.
    • Manually select which features to keep.
    • Model selection algorithm.
  2. Regularization
    • Keep all the features, but reduce the magnitude/values of the parameters \(\theta_j\).
    • Works well when we have a lot of features, each of which contributes a bit to predicting \(y\).

Regularized Cost Function

  • \[\min_\theta\ \dfrac{1}{2m} \left[ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^n \theta_j^2 \right]\]
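
A minimal NumPy sketch of this cost function (the function name and array shapes are my own illustration, not from the original notes). Note that the penalty sum starts at \(j = 1\): \(\theta_0\) is not regularized, which matches the update rules below.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X   : (m, n+1) design matrix whose first column is all ones (x_0 = 1)
    y   : (m,) target vector
    lam : regularization strength lambda >= 0
    """
    m = len(y)
    residuals = X @ theta - y                 # h_theta(x^(i)) - y^(i) for every example
    penalty = lam * np.sum(theta[1:] ** 2)    # theta_0 is excluded from the penalty
    return (residuals @ residuals + penalty) / (2 * m)
```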

Regularized Linear Regression

  1. Gradient Descent
    \[
    \begin{align*} & \text{Repeat}\ \lbrace \newline & \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline & \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] &\ \ \ \ \ \ \ \ \ \ j \in \lbrace 1,2...n\rbrace\newline & \rbrace \end{align*}
    \]

    • Equivalently, the \(\theta_j\) update can be rewritten as
      \[
      \theta_j := \theta_j(1 - \alpha\frac{\lambda}{m}) - \alpha\frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}
      \]
  2. Normal Equation
    \[
    \begin{align*}& \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \newline& \text{where}\ \ L = \begin{bmatrix} 0 & & & & \newline & 1 & & & \newline & & 1 & & \newline & & & \ddots & \newline & & & & 1 \newline\end{bmatrix}\end{align*}
    \]

  • Even when \(X^TX\) is non-invertible (singular), \(X^TX + \lambda \cdot L\) is invertible for \(\lambda > 0\). Both solvers are sketched in NumPy below.
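
Both solvers can be sketched in a few lines of NumPy (a hedged illustration: the function names, learning rate, and iteration count are arbitrary choices, not part of the original notes):

```python
import numpy as np

def gradient_descent(X, y, lam, alpha=0.01, iters=1000):
    """Regularized gradient descent; theta_0 (the first entry) is not penalized."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m      # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
        grad[1:] += (lam / m) * theta[1:]     # add (lambda/m) * theta_j for j >= 1 only
        theta -= alpha * grad
    return theta

def normal_equation(X, y, lam):
    """Closed form: theta = (X^T X + lambda * L)^{-1} X^T y."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0                               # top-left 0 in L: theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is both faster and numerically more stable.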

Regularized Logistic Regression

  1. Cost Function

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^n \theta_j^2
\]

  2. Gradient Descent: identical in form to the regularized linear-regression update, except that here \(h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}}\). A combined cost-and-gradient sketch follows.
    \[
    \begin{align*} & \text{Repeat}\ \lbrace \newline & \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline & \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] &\ \ \ \ \ \ \ \ \ \ j \in \lbrace 1,2...n \rbrace \newline & \rbrace \end{align*}
    \]
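
As with linear regression, the cost and gradient can be computed together in a short NumPy sketch (names and shapes are illustrative, not from the original notes); the returned pair can be fed to gradient descent or to a general-purpose optimizer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_and_grad(theta, X, y, lam):
    """Regularized logistic-regression cost J(theta) and its gradient."""
    m = len(y)
    h = sigmoid(X @ theta)                    # h_theta(x^(i)) for every example
    cost = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    cost += lam / (2 * m) * np.sum(theta[1:] ** 2)   # penalty skips theta_0
    grad = X.T @ (h - y) / m
    grad[1:] += (lam / m) * theta[1:]         # regularize j >= 1 only
    return cost, grad
```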

Original post: https://www.cnblogs.com/QQ-1615160629/p/04-Regularization.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!