
UFLDL Tutorial: Code Analysis



I had not worked with much code of this kind before. A while ago a friend mentioned caffe; I wanted to see how to use it, but I am too much of a beginner to get anywhere with it... Since I had little background in this area, I decided to start from the basics. The code in this post comes from the UFLDL tutorial.

1. Function Analysis

    MATLAB code corresponding to the UFLDL Tutorial. The code calls minFunc to do the optimization, so you may want to read Part 2 first and then come back here.


minFunc: unconstrained optimizer using a line search strategy. (Note: although not stated explicitly here, the method can only solve unconstrained optimization problems.)

    The function minimizes the objective using a descent method; see Section 3 of Convex Optimization and Machine Learning, although Boyd's Convex Optimization is clearly the better choice if you have the time.

Inputs: funObj, x0, options, varargin

    funObj: provides the objective (cost) function and its gradient

    x0: the initial point for the iteration

    options: parameters controlling the solver

    varargin: additional arguments needed by funObj.

Outputs: x, f, exitflag, output

    x: the result of the iteration, i.e., the computed minimizer

    f: the objective value at the minimizer
    exitflag: status on exit
    output: information about the run

Input parameters (options)

DerivativeCheck: if enabled, compares the user-supplied gradient against a numerical approximation (useful for debugging funObj).

verbose & verboseI & debug & doPlot: controlled by DISPLAY; they determine how much information is printed while the solver runs.

method: controlled by METHOD; selects the descent method. The values of LS_init, LS_type, LS_interp, LS_multi, Fref, Damped, HessianIter and c2 depend on the chosen method, and some of them can also be specified directly.

Other parameters, including maxFunEvals, maxIter, optTol, progTol, ..., can be left at their defaults or set explicitly; a typical options struct is sketched below.
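As a rough illustration, a minimal options struct might look like the following (a sketch only; the field names follow the conventions used in the UFLDL exercise scripts and may vary between minFunc versions):

% Minimal minFunc options sketch (assumed field names; check your minFunc version).
options = struct();
options.Method  = 'lbfgs';   % descent method, e.g. 'sd', 'csd', 'lbfgs', 'newton'
options.MaxIter = 200;       % maximum number of iterations
options.Display = 'iter';    % print progress at every iteration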

Solving convex problems: descent methods

    Again, see Section 3 of Convex Optimization and Machine Learning, or Boyd's Convex Optimization if you have the time.

Workflow

(Using steepest descent as the example, since it is the simplest; frankly, I do not understand the other methods yet...)

    Variable meanings

    x: the current point

    d: the descent direction

    t: the step size

1. Preprocessing.

2. Since SD < NEWTON (steepest descent ranks below Newton's method in minFunc's method ordering), the Hessian matrix does not need to be computed. funObj is used to compute f (the objective value) and g (the gradient at x).

3. Loop until the maximum number of iterations is reached, or until the step size / change in objective value falls below its tolerance.

    For steepest descent, the descent direction is the negative gradient, so d = -g.

    A line search is then used to find the step size; by default a backtracking line search is used.

    Initialize t = 1 and call ArmijoBacktrack to compute it.

4. Check and validate the result. (A bare-bones sketch of this loop is given below.)
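The following is a minimal steepest-descent loop with a backtracking line search. It only illustrates the flow described above and is not minFunc's actual implementation; funObj and x0 are assumed to be defined as in the minFunc interface.

% Bare-bones steepest descent with backtracking (illustration only, not minFunc's code).
% Assumes funObj returns [f, g] and x0 is the starting point.
maxIter = 500; progTol = 1e-9;
x = x0;
[f, g] = funObj(x);
for iter = 1:maxIter
    d = -g;                                      % steepest descent direction
    t = 1;                                       % initial step size
    while funObj(x + t*d) > f + 1e-4*t*(g'*d)    % Armijo condition, c1 = 1e-4
        t = t/2;                                 % backtrack
    end
    x = x + t*d;
    fOld = f;
    [f, g] = funObj(x);
    if abs(fOld - f) < progTol                   % stop when progress stalls
        break;
    end
end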


ArmijoBacktrack

    A standard backtracking line search; see the comments in the code for the meaning of each parameter.

    The step is accepted once the sufficient-decrease (Armijo) condition f(x + t*d) <= f(x) + c1*t*g'*d holds.

    In this program c1 plays the role of alpha (default value 1e-4), and t is halved (t = t/2) on every pass through the loop.

2. Code

Note: a MATLAB script (as opposed to a function) is best started with the following lines, which clear the command window, clear the workspace, and close all open figures.

clc;
clear all;
close all;

Linear Regression

    ex1a_linreg.m line 47

theta = minFunc(@linear_regression, theta, options, train.X, train.y);

    Here options is a struct that sets the minFunc parameters, train.X and train.y form the training set, and linear_regression is the function we write ourselves. The theta passed in is the initial value, and the returned theta holds the learned linear regression weights, i.e.

    minimize over theta:  J(theta) = (1/2) * sum_i ( theta'*x(i) - y(i) )^2

    minFunc performs the minimization of this objective.
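    For reference, the gradient of this objective is grad J(theta) = sum_i x(i)*( theta'*x(i) - y(i) ) = X*(X'*theta - y), which is exactly what g computes in the function below.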

    linear_regression (comments omitted):

function [f,g] = linear_regression(theta, X, y)

  f = 0;
  g = zeros(size(theta));

  yEst = theta'*X;                    % predictions, 1 x m
  f = 0.5*(yEst - y)*(yEst - y)';     % squared-error objective
  g = X*(yEst - y)';                  % gradient w.r.t. theta

end

     During its iterations minFunc needs the objective value and the gradient; the code above provides exactly that.
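Before handing such a function to minFunc, it can be worth comparing the analytic gradient with a finite-difference approximation. A quick hand-rolled check might look like this (a sketch; it assumes train.X and train.y are already loaded as in the exercise script):

% Finite-difference gradient check for linear_regression (sketch).
n = size(train.X, 1);
theta0 = 0.001*randn(n, 1);
[~, g0] = linear_regression(theta0, train.X, train.y);
epsFD = 1e-6;
gNum = zeros(n, 1);
for i = 1:n
    e = zeros(n, 1); e(i) = epsFD;
    gNum(i) = (linear_regression(theta0 + e, train.X, train.y) - ...
               linear_regression(theta0 - e, train.X, train.y)) / (2*epsFD);
end
disp(norm(gNum - g0)/norm(gNum + g0));   % should be very small, e.g. below 1e-8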

Logistic Regression
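    The function being minimized here is the negative log-likelihood J(theta) = -sum_i [ y(i)*log(h(x(i))) + (1 - y(i))*log(1 - h(x(i))) ], where h(x) = 1/(1 + exp(-theta'*x)) is the sigmoid; its gradient is X*(h - y)'.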

function [f,g] = logistic_regression(theta, X, y)
  %
  % Arguments:
  %   theta - A column vector containing the parameter values to optimize.
  %   X - The examples stored in a matrix.
  %       X(i,j) is the ith coordinate of the jth example.
  %   y - The label for each example.  y(j) is the jth example's label.
  %

  m = size(X,2);

  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

%%% YOUR CODE HERE %%%
  h = 1./(1 + exp(-theta'*X));            % sigmoid predictions, 1 x m
  f = -( y*log(h)' + (1-y)*log(1-h)' );   % negative log-likelihood
  g = X*(h - y)';                         % gradient w.r.t. theta
end
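Once minFunc returns theta, the classification accuracy can be checked by thresholding the sigmoid output at 0.5. A quick sketch (assuming theta, train.X and train.y are in the workspace as in the exercise script):

% Accuracy check for logistic regression (sketch; assumes theta, train.X, train.y exist).
pred = 1./(1 + exp(-theta'*train.X)) > 0.5;   % predicted 0/1 labels
accuracy = mean(double(pred == train.y));
fprintf('Training accuracy: %2.1f%%\n', 100*accuracy);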

Softmax Regression (for reference)
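    With the convention theta(:, num_classes) = 0, the objective is the negative log-likelihood J(theta) = -sum_i log( exp(theta_{y(i)}'*x(i)) / sum_k exp(theta_k'*x(i)) ), and the gradient with respect to theta_k is -sum_i x(i)*( 1{y(i)=k} - p(y(i)=k | x(i); theta) ).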

function [f,g] = softmax_regression(theta, X, y)
  %
  % Arguments:
  %   theta - A vector containing the parameter values to optimize.
  %       In minFunc, theta is reshaped to a long vector.  So we need to
  %       resize it to an n-by-(num_classes-1) matrix.
  %       Recall that we assume theta(:,num_classes) = 0.
  %
  %   X - The examples stored in a matrix.
  %       X(i,j) is the ith coordinate of the jth example.
  %   y - The label for each example.  y(j) is the jth example's label.
  %
  m = size(X,2);
  n = size(X,1);

  % theta is a vector;  need to reshape to n x num_classes.
  theta = reshape(theta, n, []);
  num_classes = size(theta,2)+1;

  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

  %
  % TODO:  Compute the softmax objective function and gradient using vectorized code.
  %        Store the objective function value in f, and the gradient in g.
  %        Before returning g, make sure you form it back into a vector with g=g(:);
  %
%%% YOUR CODE HERE %%%
  % Class scores; append a zero row for the implicit last class.
  a = theta'*X;                              % (num_classes-1) x m
  a = [a; zeros(1, size(a,2))];              % num_classes x m
  a = exp(a);
  aSum = sum(a);                             % 1 x m normalizers

  % Log-probabilities of each class for each example.
  h = log(a ./ repmat(aSum, num_classes, 1));

  % Indicator matrix: A(c,j) = 1 if example j has label c, 0 otherwise.
  compareMatrix = repmat((1:num_classes)', 1, m);
  judMatrix = abs(compareMatrix - repmat(y, num_classes, 1));
  A = zeros(size(judMatrix));
  A(judMatrix == 0) = 1;

  % Negative log-likelihood.
  f = -sum(sum(A .* h));

  % Gradient; drop the column belonging to the fixed last class.
  g = -X * (A - a ./ repmat(aSum, num_classes, 1))';
  g = g(:, 1:num_classes-1);
  g = g(:); % make gradient a vector for minFunc
end
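After training, the predicted class of each example is simply the row with the largest score. A prediction sketch (it assumes the learned theta vector and a test set test.X, test.y are available; these names are illustrative):

% Prediction sketch for softmax regression (illustrative variable names).
n = size(test.X, 1);
W = reshape(theta, n, []);                          % n x (num_classes-1)
scores = [W'*test.X; zeros(1, size(test.X, 2))];    % zero score for the implicit last class
[~, pred] = max(scores, [], 1);                     % predicted label = argmax over classes
accuracy = mean(double(pred == test.y));
fprintf('Test accuracy: %2.1f%%\n', 100*accuracy);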

PCA Whitening (for this one, just following the tutorial is enough)

%%================================================================
%% Step 0a: Load data
%  Here we provide the code to load natural image data into x.
%  x will be a 784 * 60000 matrix, where the kth column x(:, k) corresponds to
%  the raw pixel data of the kth 28x28 MNIST image.
%  You do not need to change the code below.
clear all;
close all;
clc;
x = loadMNISTImages('train-images-idx3-ubyte');
figure('name','Raw images');
randsel = randi(size(x,2),200,1); % A random selection of samples for visualization
display_network(x(:,randsel));

%%================================================================
%% Step 0b: Zero-mean the data (by row)
%  You can make use of the mean and repmat/bsxfun functions.

%%% YOUR CODE HERE %%%
xMeanRow = mean(x);
x = x-repmat(xMeanRow,size(x,1),1);
%%================================================================
%% Step 1a: Implement PCA to obtain xRot
%  Implement PCA to obtain xRot, the matrix in which the data is expressed
%  with respect to the eigenbasis of sigma, which is the matrix U.

%%% YOUR CODE HERE %%%
xCorr = x*x'/size(x,2);      % sample covariance (data is zero-mean)
[U, S, V] = svd(xCorr);
xRot = U'*x;                 % rotate the data into the eigenbasis
%%================================================================
%% Step 1b: Check your implementation of PCA
%  The covariance matrix for the data expressed with respect to the basis U
%  should be a diagonal matrix with non-zero entries only along the main
%  diagonal. We will verify this here.
%  Write code to compute the covariance matrix, covar. 
%  When visualised as an image, you should see a straight line across the
%  diagonal (non-zero entries) against a blue background (zero entries).

%%% YOUR CODE HERE %%%
covar = xRot*xRot'/size(xRot,2);
% Visualise the covariance matrix. You should see a line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 2: Find k, the number of components to retain
%  Write code to determine k, the number of components to retain in order
%  to retain at least 99% of the variance.

%%% YOUR CODE HERE %%%
var = sum(diag(covar));
varMin = 0.99*var;
varSum = 0;
k = 0;
A = diag(covar);
for i=1:length(A)
    varSum = varSum+A(i);
    if(varSum>=varMin && k==0)
        k = i;
    end
end
%%================================================================
%% Step 3: Implement PCA with dimension reduction
%  Now that you have found k, you can reduce the dimension of the data by
%  discarding the remaining dimensions. In this way, you can represent the
%  data in k dimensions instead of the original 784, which will save you
%  computational time when running learning algorithms on the reduced
%  representation.
% 
%  Following the dimension reduction, invert the PCA transformation to produce 
%  the matrix xHat, the dimension-reduced data with respect to the original basis.
%  Visualise the data and compare it to the raw data. You will observe that
%  there is little loss due to throwing away the principal components that
%  correspond to dimensions with low variation.

%%% YOUR CODE HERE %%%
xRot = U'*x;
xTilde = U(:,1:k)'*x;                                % reduced representation, k x m
xHat = U*[xTilde; zeros(size(x,1)-k, size(x,2))];    % back to the original basis
% Visualise the data, and compare it to the raw data
% You should observe that the raw and processed data are of comparable quality.
% For comparison, you may wish to generate a PCA reduced image which
% retains only 90% of the variance.

figure('name',['PCA processed images ',sprintf('(%d / %d dimensions)', k, size(x, 1))]);
display_network(xHat(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));

%%================================================================
%% Step 4a: Implement PCA with whitening and regularisation
%  Implement PCA with whitening and regularisation to produce the matrix
%  xPCAWhite. 

epsilon = 1e-1; 
%%% YOUR CODE HERE %%%
xPCAWhite = diag(1./sqrt(diag(S) + epsilon)) * xRot;
covar = xPCAWhite*xPCAWhite'/size(xPCAWhite,2);
%% Step 4b: Check your implementation of PCA whitening 
%  Check your implementation of PCA whitening with and without regularisation. 
%  PCA whitening without regularisation results a covariance matrix 
%  that is equal to the identity matrix. PCA whitening with regularisation
%  results in a covariance matrix with diagonal entries starting close to 
%  1 and gradually becoming smaller. We will verify these properties here.
%  Write code to compute the covariance matrix, covar. 
%
%  Without regularisation (set epsilon to 0 or close to 0), 
%  when visualised as an image, you should see a red line across the
%  diagonal (one entries) against a blue background (zero entries).
%  With regularisation, you should see a red line that slowly turns
%  blue across the diagonal, corresponding to the one entries slowly
%  becoming smaller.

%%% YOUR CODE HERE %%%

% Visualise the covariance matrix. You should see a red line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);

%%================================================================
%% Step 5: Implement ZCA whitening
%  Now implement ZCA whitening to produce the matrix xZCAWhite. 
%  Visualise the data and compare it to the raw data. You should observe
%  that whitening results in, among other things, enhanced edges.

%%% YOUR CODE HERE %%%
xZCAWhite = U * diag(1./sqrt(diag(S) + epsilon)) * U' * x;
% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name','ZCA whitened images');
display_network(xZCAWhite(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));
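One more sanity check (a sketch): with epsilon close to 0, the covariance of the PCA-whitened data should be close to the identity matrix, so its Frobenius distance from the identity should be small.

% Sanity check (sketch): distance of the whitened covariance from the identity.
covarWhite = xPCAWhite*xPCAWhite'/size(xPCAWhite, 2);
disp(norm(covarWhite - eye(size(covarWhite)), 'fro'));   % small only when epsilon is near 0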

 

That is all for now; as I read and work through the rest of the code, I will add notes on it here...



Original post: http://www.cnblogs.com/sea-wind/p/4042939.html
