[Repost] Derivation of the Normal Equation for linear regression

Posted: 2016-07-29 18:38:12

I was going through the Coursera "Machine Learning" course, and in the section on multivariate linear regression something caught my eye. Andrew Ng presented the Normal Equation as an analytical solution to the linear regression problem with a least-squares cost function. He mentioned that in some cases (such as for small feature sets) using it is more effective than applying gradient descent; unfortunately, he left its derivation out.

Here I want to show how the normal equation is derived.

First, some terminology. The following symbols are consistent with the machine learning course, not with the exposition of the normal equation on Wikipedia and other sites; semantically it's all the same, just the symbols are different.

Given the hypothesis function:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$

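We'd like to minimize the least-squares cost:

$$J(\theta_{0 \ldots n}) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

where $x^{(i)}$ is the i-th sample (from a set of m samples) and $y^{(i)}$ is the i-th expected result.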

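To proceed, we'll represent the problem in matrix notation; this is natural, since we essentially have a system of linear equations here. The regression coefficients $\theta$ we're looking for are the vector:

$$\theta = \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix} \in \mathbb{R}^{n+1}$$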

Each of the m input samples is similarly a column vector with n+1 rows, $x_0$ being 1 for convenience. So we can now rewrite the hypothesis function as:

$$h_\theta(x) = \theta^T x$$

When this is summed over all samples, we can dip further into matrix notation. We'll define the "design matrix" X (uppercase X) as a matrix of m rows, in which row i is the i-th sample written as a row (the transposed vector $(x^{(i)})^T$), and y as the column vector of the m expected results. With this, we can rewrite the least-squares cost as follows, replacing the explicit sum by matrix multiplication:

$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$
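As a sanity check on this rewrite, here is a minimal NumPy sketch that evaluates the cost both ways, as the explicit sum and as the matrix product. The data values and the function names `cost_sum` and `cost_matrix` are made up for illustration; they are not from the course.

```python
import numpy as np

def cost_sum(X, y, theta):
    """Least-squares cost as an explicit sum over the m samples."""
    m = X.shape[0]
    total = 0.0
    for i in range(m):
        h = theta @ X[i]              # h_theta(x^(i)) = theta^T x^(i)
        total += (h - y[i]) ** 2
    return total / (2 * m)

def cost_matrix(X, y, theta):
    """The same cost written as (1/2m) (X theta - y)^T (X theta - y)."""
    m = X.shape[0]
    r = X @ theta - y                 # residual vector X theta - y
    return (r @ r) / (2 * m)

# Made-up data: m = 4 samples, n = 2 features, first column is x_0 = 1.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 0.5, 1.0],
              [1.0, 4.0, 0.0],
              [1.0, 1.5, 2.5]])
y = np.array([7.0, 2.0, 5.0, 6.0])
theta = np.array([0.5, 1.0, -0.25])

print(cost_sum(X, y, theta), cost_matrix(X, y, theta))
```

The two printed values should agree up to floating-point rounding.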

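Now, using some matrix transpose identities, we can simplify this a bit. I'll throw the $\frac{1}{2m}$ part away since we're going to compare a derivative to zero anyway:

$$J(\theta) = \left( (X\theta)^T - y^T \right) (X\theta - y)$$

$$J(\theta) = (X\theta)^T X\theta - (X\theta)^T y - y^T (X\theta) + y^T y$$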

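Note that $X\theta$ is a vector, and so is y. So when we multiply one by the other as an inner product, the order doesn't matter (as long as the dimensions work out): $(X\theta)^T y$ and $y^T (X\theta)$ are the same scalar. So we can further simplify:

$$J(\theta) = \theta^T X^T X \theta - 2 (X\theta)^T y + y^T y$$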

Recall that here $\theta$ is our unknown. To find where the above function has a minimum, we differentiate with respect to $\theta$ and set the result to 0. Differentiating with respect to a vector may feel uncomfortable, but there's nothing to worry about. Recall that here we only use matrix notation to conveniently represent a system of linear formulae. So we differentiate with respect to each component of the vector, and then combine the resulting derivatives into a vector again. The result is:

$$\frac{\partial J}{\partial \theta} = 2 X^T X \theta - 2 X^T y = 0$$

Or:

$$X^T X \theta = X^T y$$
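One way to build confidence in this gradient is to compare the closed form 2 X^T X theta - 2 X^T y against a finite-difference approximation. The sketch below does that with NumPy on random made-up data; the helper names `J`, `grad_J` and `numerical_grad` are illustrative, and the cost is taken without the 1/(2m) factor, matching the simplification above.

```python
import numpy as np

def J(X, y, theta):
    """Cost without the 1/(2m) factor: (X theta - y)^T (X theta - y)."""
    r = X @ theta - y
    return r @ r

def grad_J(X, y, theta):
    """Closed-form gradient: 2 X^T X theta - 2 X^T y."""
    return 2 * X.T @ X @ theta - 2 * X.T @ y

def numerical_grad(f, theta, eps=1e-6):
    """Central-difference approximation, one component of theta at a time."""
    g = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = eps
        g[k] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return g

# Random made-up problem: 5 samples, 2 features plus the x_0 = 1 column.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])
y = rng.normal(size=5)
theta = rng.normal(size=3)

analytic = grad_J(X, y, theta)
numeric = numerical_grad(lambda t: J(X, y, t), theta)
print(np.allclose(analytic, numeric, atol=1e-5))
```

This mirrors the "differentiate with respect to each component, then recombine" description: `numerical_grad` perturbs one component at a time and stacks the results into a vector.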

[Update 27-May-2015: I've written another post that explains in more detail how these derivatives are computed.]
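As a brief aside (a summary of standard identities, not a reproduction of the post mentioned in the update): for a constant vector b and a constant symmetric matrix A,

$$\frac{\partial}{\partial \theta} \left( b^T \theta \right) = b, \qquad \frac{\partial}{\partial \theta} \left( \theta^T A \theta \right) = 2 A \theta$$

Taking $A = X^T X$ (which is symmetric) and $b = X^T y$, and noting that $(X\theta)^T y = \theta^T X^T y = b^T \theta$, the cost $\theta^T X^T X \theta - 2 (X\theta)^T y + y^T y$ has gradient $2 X^T X \theta - 2 X^T y$; the $y^T y$ term does not depend on $\theta$ and vanishes.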

Now, assuming that the matrix $X^T X$ is invertible, we can multiply both sides by $(X^T X)^{-1}$ and get:

$$\theta = (X^T X)^{-1} X^T y$$

Which is the normal equation.
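To see the result in action, here is a minimal NumPy sketch. It assumes X^T X is invertible, as above, and solves the linear system X^T X theta = X^T y with `np.linalg.solve` rather than forming the inverse explicitly, which is a common numerical choice and not something the derivation requires. The data is made up so that y = 1 + 2*x1 + 3*x2 exactly, and `normal_equation` is an illustrative name.

```python
import numpy as np

def normal_equation(X, y):
    """Solve (X^T X) theta = X^T y for theta, assuming X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Made-up samples with two features each.
features = np.array([[0.0, 1.0],
                     [1.0, 0.0],
                     [2.0, 1.0],
                     [3.0, 5.0],
                     [4.0, 2.0]])

# Design matrix: prepend the constant column x_0 = 1.
X = np.column_stack([np.ones(len(features)), features])

# Targets generated without noise from theta = (1, 2, 3).
y = 1.0 + 2.0 * features[:, 0] + 3.0 * features[:, 1]

theta = normal_equation(X, y)

# Cross-check against NumPy's general least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta)
print(np.allclose(theta, theta_lstsq))
```

Because the targets are noise-free, the recovered coefficients should match (1, 2, 3) up to rounding. The literal formula `np.linalg.inv(X.T @ X) @ X.T @ y` gives the same answer on well-conditioned problems.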

Original source: http://www.cnblogs.com/immortal-worm/p/5719221.html
