
UFLDL Tutorial: Code Analysis



I had not worked with much code of this kind before. A while ago a friend mentioned caffe; I wanted to see how to use it, but I am too much of a beginner to get anywhere with it... Since I had little background in this area, I decided to start from the basics. The code in this post comes from the UFLDL tutorial.

1. Function Analysis

    MATLAB code corresponding to the UFLDL Tutorial. The code calls minFunc to do the optimization, so you may want to read Part 2 first and then come back here.


minFunc: unconstrained optimizer using a line search strategy. (Note: although not stated explicitly here, the method can only solve unconstrained optimization problems.)

    The function minimizes the objective using a descent method; see Section 3 of Convex Optimization and Machine Learning, although Boyd's Convex Optimization is clearly the better choice if you have the time.

Inputs: funObj, x0, options, varargin

    funObj: provides the objective (cost) function and its gradient

    x0: the initial point for the iteration

    options: parameters controlling the solver

    varargin: additional arguments needed by funObj.

Outputs: x, f, exitflag, output

    x: the result of the iteration, i.e., the computed minimizer

    f: the objective value at the minimizer
    exitflag: status on exit
    output: information about the run

Input parameters (options)

DerivativeCheck: if enabled, compares the user-supplied gradient against a numerical approximation (useful for debugging funObj).

verbose & verboseI & debug & doPlot: controlled by DISPLAY; they determine how much information is printed while the solver runs.

method: controlled by METHOD; selects the descent method. The values of LS_init, LS_type, LS_interp, LS_multi, Fref, Damped, HessianIter and c2 depend on the chosen method, and some of them can also be specified directly.

Other parameters, including maxFunEvals, maxIter, optTol, progTol, ..., can be left at their defaults or set explicitly; a typical options struct is sketched below.
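As a rough illustration, a minimal options struct might look like the following (a sketch only; the field names follow the conventions used in the UFLDL exercise scripts and may vary between minFunc versions):

% Minimal minFunc options sketch (assumed field names; check your minFunc version).
options = struct();
options.Method  = 'lbfgs';   % descent method, e.g. 'sd', 'csd', 'lbfgs', 'newton'
options.MaxIter = 200;       % maximum number of iterations
options.Display = 'iter';    % print progress at every iteration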

Solving convex problems: descent methods

    Again, see Section 3 of Convex Optimization and Machine Learning, or Boyd's Convex Optimization if you have the time.

Workflow

(Using steepest descent as the example, since it is the simplest; frankly, I do not understand the other methods yet...)

    Variable meanings

    x: the current point

    d: the descent direction

    t: the step size

1. Preprocessing.

2. Since SD < NEWTON (steepest descent ranks below Newton's method in minFunc's method ordering), the Hessian matrix does not need to be computed. funObj is used to compute f (the objective value) and g (the gradient at x).

3. Loop until the maximum number of iterations is reached, or until the step size / change in objective value falls below its tolerance.

    For steepest descent, the descent direction is the negative gradient, so d = -g.

    A line search is then used to find the step size; by default a backtracking line search is used.

    Initialize t = 1 and call ArmijoBacktrack to compute it.

4. Check and validate the result. (A bare-bones sketch of this loop is given below.)
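The following is a minimal steepest-descent loop with a backtracking line search. It only illustrates the flow described above and is not minFunc's actual implementation; funObj and x0 are assumed to be defined as in the minFunc interface.

% Bare-bones steepest descent with backtracking (illustration only, not minFunc's code).
% Assumes funObj returns [f, g] and x0 is the starting point.
maxIter = 500; progTol = 1e-9;
x = x0;
[f, g] = funObj(x);
for iter = 1:maxIter
    d = -g;                                      % steepest descent direction
    t = 1;                                       % initial step size
    while funObj(x + t*d) > f + 1e-4*t*(g'*d)    % Armijo condition, c1 = 1e-4
        t = t/2;                                 % backtrack
    end
    x = x + t*d;
    fOld = f;
    [f, g] = funObj(x);
    if abs(fOld - f) < progTol                   % stop when progress stalls
        break;
    end
end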


ArmijoBacktrack

    A standard backtracking line search; see the comments in the code for the meaning of each parameter.

    The step is accepted once the sufficient-decrease (Armijo) condition f(x + t*d) <= f(x) + c1*t*g'*d holds.

    In this program c1 plays the role of alpha (default value 1e-4), and t is halved (t = t/2) on every pass through the loop.

2. Code

Note: a MATLAB script (as opposed to a function) is best started with the following lines, which clear the command window, clear the workspace, and close all open figures.

clc;
clear all;
close all;

Linear Regression

    ex1a_linreg.m line 47

theta = minFunc(@linear_regression, theta, options, train.X, train.y);

    Here options is a struct that sets the minFunc parameters, train.X and train.y form the training set, and linear_regression is the function we write ourselves. The theta passed in is the initial value, and the returned theta holds the learned linear regression weights, i.e.

    minimize over theta:  J(theta) = (1/2) * sum_i ( theta'*x(i) - y(i) )^2

    minFunc performs the minimization of this objective.
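    For reference, the gradient of this objective is grad J(theta) = sum_i x(i)*( theta'*x(i) - y(i) ) = X*(X'*theta - y), which is exactly what g computes in the function below.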

    linear_regression (comments omitted):

function [f,g] = linear_regression(theta, X, y)

  f = 0;
  g = zeros(size(theta));

  yEst = theta'*X;                    % predictions, 1 x m
  f = 0.5*(yEst - y)*(yEst - y)';     % squared-error objective
  g = X*(yEst - y)';                  % gradient w.r.t. theta

end

     During its iterations minFunc needs the objective value and the gradient; the code above provides exactly that.
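Before handing such a function to minFunc, it can be worth comparing the analytic gradient with a finite-difference approximation. A quick hand-rolled check might look like this (a sketch; it assumes train.X and train.y are already loaded as in the exercise script):

% Finite-difference gradient check for linear_regression (sketch).
n = size(train.X, 1);
theta0 = 0.001*randn(n, 1);
[~, g0] = linear_regression(theta0, train.X, train.y);
epsFD = 1e-6;
gNum = zeros(n, 1);
for i = 1:n
    e = zeros(n, 1); e(i) = epsFD;
    gNum(i) = (linear_regression(theta0 + e, train.X, train.y) - ...
               linear_regression(theta0 - e, train.X, train.y)) / (2*epsFD);
end
disp(norm(gNum - g0)/norm(gNum + g0));   % should be very small, e.g. below 1e-8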

Logistic Regression
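    The function being minimized here is the negative log-likelihood J(theta) = -sum_i [ y(i)*log(h(x(i))) + (1 - y(i))*log(1 - h(x(i))) ], where h(x) = 1/(1 + exp(-theta'*x)) is the sigmoid; its gradient is X*(h - y)'.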

function [f,g] = logistic_regression(theta, X, y)
  %
  % Arguments:
  %   theta - A column vector containing the parameter values to optimize.
  %   X - The examples stored in a matrix.
  %       X(i,j) is the ith coordinate of the jth example.
  %   y - The label for each example.  y(j) is the jth example's label.
  %

  m = size(X,2);

  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

%%% YOUR CODE HERE %%%
  h = 1./(1 + exp(-theta'*X));            % sigmoid predictions, 1 x m
  f = -( y*log(h)' + (1-y)*log(1-h)' );   % negative log-likelihood
  g = X*(h - y)';                         % gradient w.r.t. theta
end
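Once minFunc returns theta, the classification accuracy can be checked by thresholding the sigmoid output at 0.5. A quick sketch (assuming theta, train.X and train.y are in the workspace as in the exercise script):

% Accuracy check for logistic regression (sketch; assumes theta, train.X, train.y exist).
pred = 1./(1 + exp(-theta'*train.X)) > 0.5;   % predicted 0/1 labels
accuracy = mean(double(pred == train.y));
fprintf('Training accuracy: %2.1f%%\n', 100*accuracy);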

Softmax Regression (for reference)
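    With the convention theta(:, num_classes) = 0, the objective is the negative log-likelihood J(theta) = -sum_i log( exp(theta_{y(i)}'*x(i)) / sum_k exp(theta_k'*x(i)) ), and the gradient with respect to theta_k is -sum_i x(i)*( 1{y(i)=k} - p(y(i)=k | x(i); theta) ).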

function [f,g] = softmax_regression(theta, X, y)
  %
  % Arguments:
  %   theta - A vector containing the parameter values to optimize.
  %       In minFunc, theta is reshaped to a long vector.  So we need to
  %       resize it to an n-by-(num_classes-1) matrix.
  %       Recall that we assume theta(:,num_classes) = 0.
  %
  %   X - The examples stored in a matrix.
  %       X(i,j) is the ith coordinate of the jth example.
  %   y - The label for each example.  y(j) is the jth example's label.
  %
  m = size(X,2);
  n = size(X,1);

  % theta is a vector;  need to reshape to n x num_classes.
  theta = reshape(theta, n, []);
  num_classes = size(theta,2)+1;

  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

  %
  % TODO:  Compute the softmax objective function and gradient using vectorized code.
  %        Store the objective function value in f, and the gradient in g.
  %        Before returning g, make sure you form it back into a vector with g=g(:);
  %
%%% YOUR CODE HERE %%%
  % Class scores; append a zero row for the implicit last class.
  a = theta'*X;                              % (num_classes-1) x m
  a = [a; zeros(1, size(a,2))];              % num_classes x m
  a = exp(a);
  aSum = sum(a);                             % 1 x m normalizers

  % Log-probabilities of each class for each example.
  h = log(a ./ repmat(aSum, num_classes, 1));

  % Indicator matrix: A(c,j) = 1 if example j has label c, 0 otherwise.
  compareMatrix = repmat((1:num_classes)', 1, m);
  judMatrix = abs(compareMatrix - repmat(y, num_classes, 1));
  A = zeros(size(judMatrix));
  A(judMatrix == 0) = 1;

  % Negative log-likelihood.
  f = -sum(sum(A .* h));

  % Gradient; drop the column belonging to the fixed last class.
  g = -X * (A - a ./ repmat(aSum, num_classes, 1))';
  g = g(:, 1:num_classes-1);
  g = g(:); % make gradient a vector for minFunc
end
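After training, the predicted class of each example is simply the row with the largest score. A prediction sketch (it assumes the learned theta vector and a test set test.X, test.y are available; these names are illustrative):

% Prediction sketch for softmax regression (illustrative variable names).
n = size(test.X, 1);
W = reshape(theta, n, []);                          % n x (num_classes-1)
scores = [W'*test.X; zeros(1, size(test.X, 2))];    % zero score for the implicit last class
[~, pred] = max(scores, [], 1);                     % predicted label = argmax over classes
accuracy = mean(double(pred == test.y));
fprintf('Test accuracy: %2.1f%%\n', 100*accuracy);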

PCA Whitening (for this one, just following the tutorial is enough)

%%================================================================
%% Step 0a: Load data
%  Here we provide the code to load natural image data into x.
%  x will be a 784 * 60000 matrix, where the kth column x(:, k) corresponds to
%  the raw pixel data of the kth 28x28 MNIST image.
%  You do not need to change the code below.
clear all;
close all;
clc;
x = loadMNISTImages('train-images-idx3-ubyte');
figure('name','Raw images');
randsel = randi(size(x,2),200,1); % A random selection of samples for visualization
display_network(x(:,randsel));

%%================================================================
%% Step 0b: Zero-mean the data (by row)
%  You can make use of the mean and repmat/bsxfun functions.

%%% YOUR CODE HERE %%%
xMeanRow = mean(x);
x = x-repmat(xMeanRow,size(x,1),1);
%%================================================================
%% Step 1a: Implement PCA to obtain xRot
%  Implement PCA to obtain xRot, the matrix in which the data is expressed
%  with respect to the eigenbasis of sigma, which is the matrix U.

%%% YOUR CODE HERE %%%
xCorr = x*x'/size(x,2);      % sample covariance (data is zero-mean)
[U, S, V] = svd(xCorr);
xRot = U'*x;                 % rotate the data into the eigenbasis
%%================================================================
%% Step 1b: Check your implementation of PCA
%  The covariance matrix for the data expressed with respect to the basis U
%  should be a diagonal matrix with non-zero entries only along the main
%  diagonal. We will verify this here.
%  Write code to compute the covariance matrix, covar. 
%  When visualised as an image, you should see a straight line across the
%  diagonal (non-zero entries) against a blue background (zero entries).

%%% YOUR CODE HERE %%%
covar = xRot*xRot'/size(xRot,2);
% Visualise the covariance matrix. You should see a line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 2: Find k, the number of components to retain
%  Write code to determine k, the number of components to retain in order
%  to retain at least 99% of the variance.

%%% YOUR CODE HERE %%%
var = sum(diag(covar));
varMin = 0.99*var;
varSum = 0;
k = 0;
A = diag(covar);
for i=1:length(A)
    varSum = varSum+A(i);
    if(varSum>=varMin && k==0)
        k = i;
    end
end
%%================================================================
%% Step 3: Implement PCA with dimension reduction
%  Now that you have found k, you can reduce the dimension of the data by
%  discarding the remaining dimensions. In this way, you can represent the
%  data in k dimensions instead of the original 784, which will save you
%  computational time when running learning algorithms on the reduced
%  representation.
% 
%  Following the dimension reduction, invert the PCA transformation to produce 
%  the matrix xHat, the dimension-reduced data with respect to the original basis.
%  Visualise the data and compare it to the raw data. You will observe that
%  there is little loss due to throwing away the principal components that
%  correspond to dimensions with low variation.

%%% YOUR CODE HERE %%%
xRot = U'*x;
xTilde = U(:,1:k)'*x;                                % reduced representation, k x m
xHat = U*[xTilde; zeros(size(x,1)-k, size(x,2))];    % back to the original basis
% Visualise the data, and compare it to the raw data
% You should observe that the raw and processed data are of comparable quality.
% For comparison, you may wish to generate a PCA reduced image which
% retains only 90% of the variance.

figure('name',['PCA processed images ',sprintf('(%d / %d dimensions)', k, size(x, 1))]);
display_network(xHat(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));

%%================================================================
%% Step 4a: Implement PCA with whitening and regularisation
%  Implement PCA with whitening and regularisation to produce the matrix
%  xPCAWhite. 

epsilon = 1e-1; 
%%% YOUR CODE HERE %%%
xPCAWhite = diag(1./sqrt(diag(S) + epsilon)) * xRot;
covar = xPCAWhite*xPCAWhite'/size(xPCAWhite,2);
%% Step 4b: Check your implementation of PCA whitening 
%  Check your implementation of PCA whitening with and without regularisation. 
%  PCA whitening without regularisation results a covariance matrix 
%  that is equal to the identity matrix. PCA whitening with regularisation
%  results in a covariance matrix with diagonal entries starting close to 
%  1 and gradually becoming smaller. We will verify these properties here.
%  Write code to compute the covariance matrix, covar. 
%
%  Without regularisation (set epsilon to 0 or close to 0), 
%  when visualised as an image, you should see a red line across the
%  diagonal (one entries) against a blue background (zero entries).
%  With regularisation, you should see a red line that slowly turns
%  blue across the diagonal, corresponding to the one entries slowly
%  becoming smaller.

%%% YOUR CODE HERE %%%

% Visualise the covariance matrix. You should see a red line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 5: Implement ZCA whitening
%  Now implement ZCA whitening to produce the matrix xZCAWhite. 
%  Visualise the data and compare it to the raw data. You should observe
%  that whitening results in, among other things, enhanced edges.

%%% YOUR CODE HERE %%%
xZCAWhite = U * diag(1./sqrt(diag(S) + epsilon)) * U' * x;
% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name','ZCA whitened images');
display_network(xZCAWhite(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));
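One more sanity check (a sketch): with epsilon close to 0, the covariance of the PCA-whitened data should be close to the identity matrix, so its Frobenius distance from the identity should be small.

% Sanity check (sketch): distance of the whitened covariance from the identity.
covarWhite = xPCAWhite*xPCAWhite'/size(xPCAWhite, 2);
disp(norm(covarWhite - eye(size(covarWhite)), 'fro'));   % small only when epsilon is near 0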

 

That is all for now; as I read and work through the rest of the code, I will add notes on it here...



Original post: http://www.cnblogs.com/sea-wind/p/4042939.html
