I plan to implement all six of Professor Andrew Ng's #Deep Learning#-related exercises one by one and post them here~
This will probably stretch out over quite a while (after all, I still haven't finished all seven "storeys" of the JOS labs.)
-----------------------------------------------------------------------------------------------------------------------------------
Exercise: linear fitting (linear regression)
Exercise materials: http://download.csdn.net/detail/u011368821/8114541
-----------------------------------------------------------------------------------------------------------------------------------
Linear fitting is simple, practical, and cool~
One term worth knowing for the warm-up below: the identity matrix (ones on the diagonal, zeros everywhere else).
The style of this exercise really is consistent with labs from abroad: you complete one stub function after another, filling in the missing functionality.
Problem background:
In this part of this exercise, you will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening
a new outlet. The chain already has trucks in various cities and you have data for profits and populations from the cities. You would like to use this data to help you select which city to expand to next.
In other words: suppose you are the CEO of a restaurant franchise, and you have data pairing the population of each city with the profit of an outlet operating there. Based on this data, you want to recommend where to open next.
The goal of the exercise, as the figure shows: given this data (essentially samples of a single-variable function), determine the straight line (a first-degree function) that best fits the scattered points.
The warm-up asks for a function that returns a 5x5 identity matrix. There are two ways to do it:
1. Call the eye() function directly.
2. Write a double for loop yourself to initialize a 5x5 matrix.
I give both implementations below (the loop version is left as a comment):
function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
%   A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix

A = eye(5);

% ============= YOUR CODE HERE ==============
% Instructions: Return the 5x5 identity matrix
%               In octave, we return values by defining which variables
%               represent the return values (at the top of the file)
%               and then set them accordingly.
%
% Alternative: fill the matrix with a double for loop
% for row = 1:5
%     for col = 1:5
%         if row == col
%             A(row,col) = 1;
%         else
%             A(row,col) = 0;
%         end
%     end
% end
%
% ===========================================
end
function plotData(x, y)
%PLOTDATA Plots the data points x and y into a new figure
%   PLOTDATA(x,y) plots the data points and gives the figure axes labels of
%   population and profit.

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the training data into a figure using the
%               "figure" and "plot" commands. Set the axes labels using
%               the "xlabel" and "ylabel" commands. Assume the
%               population and revenue data have been passed in
%               as the x and y arguments of this function.
%
% Hint: You can use the 'rx' option with plot to have the markers
%       appear as red crosses. Furthermore, you can make the
%       markers larger by using plot(..., 'rx', 'MarkerSize', 10);

figure; % open a new figure window
plot(x, y, 'rx', 'MarkerSize', 10);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');

% ============================================================
end
Next, how the cost function is computed.
How do we judge how good a fit is?
When the squared differences between the fitted values h and the actual values y tend to 0 at every data point, we consider the fit good (because y and h are then as close as possible).
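Written out explicitly, the hypothesis and the cost function being minimized are (standard notation from the course handout; m is the number of training examples):

$$h_\theta(x) = \theta^{T}x = \theta_0 + \theta_1 x_1, \qquad J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2$$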
Note that here h is a vector of the same size as y, while theta is a two-element column vector, initialized as theta = [0; 0] (again: a column vector). So the question becomes: starting from theta = [0; 0], how do we adjust the slope and intercept (i.e. theta)?
If theta is too large, this shows up as h - y being greater than 0, so alpha (a constant that sets the step size of each update, i.e. the learning rate) times (h - y) gives a positive number; subtracting it from theta makes theta smaller.
By the same logic, if theta is too small, we subtract a negative number (because h - y is less than 0), so theta becomes larger.
(To be precise: "larger" and "smaller" here refer to an individual element of the theta vector, not to the matrix as a whole.)
So, given enough iterations, we arrive at a very suitable theta that makes the cost function small enough (in fact, in this exercise, 1500 iterations are enough for the cost to settle at its minimum)!
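Concretely, every iteration performs the simultaneous update

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)},$$

whose vectorized form, $\theta := \theta - \frac{\alpha}{m}X^{T}(X\theta - y)$, is exactly the one-liner that appears in the gradientDescent code below.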
Next we implement the computeCost function. I give "two methods" here, which are really the same thing: method one computes element-wise in the traditional way, while method two works directly with matrices (much more "elegant").
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

% Implementation method one (element-wise):
% temp = (X*theta - y).^2;
% J = sum(temp(:))./(2*m);

% Implementation method two (pure matrix form):
temp = (X*theta - y);
J = (temp'*temp)./(2*m);

% =========================================================================
end
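As a quick sanity check of computeCost, a sketch like the following can be run; it assumes ex1data1.txt from the exercise materials is in the working directory, and if I recall the handout correctly, the cost at theta = [0; 0] should come out to roughly 32.07:

data = load('ex1data1.txt');    % column 1: population, column 2: profit
m = size(data, 1);              % number of training examples
X = [ones(m, 1), data(:, 1)];   % prepend a column of ones for the intercept term
y = data(:, 2);
computeCost(X, y, [0; 0])       % should print a cost of roughly 32.07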
Then we just complete the gradientDescent function; this is where you need to be quite comfortable with matrix operations...
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %

    temp = (X'*(X*theta - y))./m;
    theta = theta - (alpha*(temp));

    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);

end

end
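Putting the pieces together, a minimal driver in the spirit of the course's ex1.m script looks like this; the learning rate alpha = 0.01 is an assumption on my part, while the 1500 iterations match the count mentioned earlier:

% Load the data, fit theta by gradient descent, and overlay the fitted line.
% ex1data1.txt and alpha = 0.01 are assumptions, not part of this post's code.
data = load('ex1data1.txt');
m = size(data, 1);
X = [ones(m, 1), data(:, 1)];
y = data(:, 2);
theta = gradientDescent(X, y, zeros(2, 1), 0.01, 1500);
plotData(data(:, 1), y);        % red crosses: the training data
hold on;
plot(X(:, 2), X*theta, 'b-');   % blue line: the fitted hypothesis
hold off;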
OK! In the end we obtain the best theta. The blue straight line is our result: roughly speaking, the larger a city's population, the higher the profit.
Here we only analyzed how profit relates to population; if you're interested, you could also study how stable the profit is, characterizing it with the variance~ I won't discuss that further here.
Next, with the optimal theta in hand, the exercise examines how the cost function varies over a range of theta values: the farther you move from the optimum, the larger the cost and the worse the fit!
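For the curious, a picture like that can be generated with a sketch along these lines; the grid ranges here are my own choice, not necessarily the ones the exercise script uses:

% Evaluate the cost over a grid of (theta0, theta1) pairs and draw the surface
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals));
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        J_vals(i, j) = computeCost(X, y, [theta0_vals(i); theta1_vals(j)]);
    end
end
surf(theta0_vals, theta1_vals, J_vals');  % transpose so the axes line up
xlabel('\theta_0'); ylabel('\theta_1'); zlabel('Cost J');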
Linear fitting with multiple variables.
Same medicine, different packaging: if you're fluent in matrix operations, essentially no code needs to change (the charm of matrices~)
Problem background:
The file ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price
of the house.
It concerns the relationship between house size, number of bedrooms, and house price; the data come from Portland, Oregon.
x = [2104 3], y = 399900 — a house of 2104 square feet with 3 bedrooms, priced at 399900; the remaining rows read the same way:
x = [1600 3], y = 329900
x = [2400 3], y = 369000
x = [1416 2], y = 232000
x = [3000 4], y = 539900
... ...
Why normalize the features?
By looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, first performing feature scaling can make gradient descent converge
much more quickly.
This is not only about converging quickly: without normalization, the run here actually breaks (the wildly different feature scales make gradient descent diverge)...
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma.
%
%               Note that X is a matrix where each column is a
%               feature and each row is an example. You need
%               to perform the normalization separately for
%               each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%

for temp = 1:size(X, 2)
    mu(temp) = mean(X(:, temp));
    sigma(temp) = std(X(:, temp));
    X_norm(:, temp) = (X_norm(:, temp) - mu(temp))./sigma(temp);
end

% ============================================================
end
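One detail worth stressing: keep the returned mu and sigma around, because any input you later want a prediction for must be normalized with the training set's statistics. The intended call pattern is roughly this (a sketch; the variable names are mine):

% Normalize the training features FIRST, then add the intercept column,
% so that the column of ones is not normalized away
[X_norm, mu, sigma] = featureNormalize(X);
X_norm = [ones(size(X_norm, 1), 1), X_norm];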
function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
%   J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

temp = (X*theta - y);
J = (temp'*temp)./(2*m);

% =========================================================================
end
The implementation of the gradientDescentMulti function:
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
%   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCostMulti) and gradient here.
    %

    temp = X'*(X*theta - y)./m;
    theta = theta - alpha*temp;

    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCostMulti(X, y, theta);

end

end
You will find that the cost function's value drops off very quickly as the number of iterations increases.
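This is easy to see by plotting the J_history vector that gradientDescentMulti returns, e.g. (the alpha and num_iters values below are my assumptions, not the post's):

alpha = 0.01;      % assumed learning rate
num_iters = 400;   % assumed iteration count
[theta, J_history] = gradientDescentMulti(X_norm, y, zeros(3, 1), alpha, num_iters);
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');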
Finally, theta can be solved for directly from a formula: the normal equation. This method needs no iteration at all and produces the least-squares solution in closed form (in essence it is still just matrix manipulation, obtained by rearranging X*theta = y).
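Spelled out: starting from the overdetermined system $X\theta \approx y$, left-multiplying both sides by $X^{T}$ and solving gives the normal equation

$$X^{T}X\,\theta = X^{T}y \quad\Longrightarrow\quad \theta = (X^{T}X)^{-1}X^{T}y.$$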
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
%   NORMALEQN(X,y) computes the closed-form solution to linear
%   regression using the normal equations.

theta = zeros(size(X, 2), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the code to compute the closed form solution
%               to linear regression and put the result in theta.
%

% ---------------------- Sample Solution ----------------------

theta = (inv(X'*X))*X'*y;

% -------------------------------------------------------------

% ============================================================
end
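A small numerical aside: inv(X'*X) works here, but pinv(X'*X)*X'*y, or simply the backslash solve X \ y, is generally more robust in Octave/MATLAB when X'*X is close to singular.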
With that, the estimated price for a 1650 sq-ft house with 3 bedrooms comes out to around 91490.
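For reference, the gradient descent version of that estimate looks roughly like this (a sketch; it assumes theta, mu, and sigma come from the normalized training run above):

% Predict the price of a 1650 sq-ft, 3-bedroom house;
% the query must be normalized with the training set's mu and sigma
x_query = ([1650, 3] - mu) ./ sigma;
price = [1, x_query] * theta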
The Inn, Carl Bloch, Denmark, oil on canvas, 110 x 70 cm, private collection
This painting depicts a scene of Danish life: an inn, a small eatery, a meal. The figures' clothing is distinctive, and the painter renders their expressions vividly. The viewer can feel the unkind look in the man's eyes, while the two women opposite glance furtively in the same direction, producing a sense of unease. What are they saying? Surely nothing good. This is a display of the painter's expressive technique. The crumbling wall behind the figures is equally astonishing, so lifelike one could mistake the picture for a photograph. The artist, Carl Heinrich Bloch, was a 19th-century Danish painter.
Original post: http://blog.csdn.net/cinmyheart/article/details/40720291