
Deep Learning Study Notes (1): Softmax Regression



  I was at a loss, not knowing what to do, let alone seeing any hope.

  By chance I came across Professor Andrew Ng's machine learning course on Coursera and his deep learning course on UFLDL, so I calmed down and worked through them: watching the videos one by one, doing the assignments one by one, and writing the programs one by one. There was a lot of math I did not understand and I was not familiar with Matlab, so at first my progress was as slow as a snail, but after sticking with it for a few months I finally finished. To avoid forgetting, I am writing some of it down here. My skills are limited, my Python is not great, and my English is not good either, so if there are mistakes or anything inappropriate, please do not hesitate to point them out.

 

  I am still not entirely clear about the theory behind softmax, and I do not know whether it comes from information theory or from probability. For now I will settle for a rough understanding and start using it; the theory can be filled in later.

  The basic theory of softmax:

  For a given input x, a K-class classifier assigns each class k the probability P(y=k|x; θ), i.e.

  $$P(y=k \mid x;\theta) = \frac{\exp(\theta^{(k)\top} x)}{\sum_{j=1}^{K} \exp(\theta^{(j)\top} x)}$$

  The model parameters are $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(K)} \in \mathbb{R}^n$; it is convenient to store θ as a K×n matrix (where n is the dimensionality of the input x, i.e. the number of features).

  The cost function of softmax regression:

  $$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} 1\{y^{(i)}=k\}\,\log\frac{\exp(\theta^{(k)\top} x^{(i)})}{\sum_{j=1}^{K}\exp(\theta^{(j)\top} x^{(i)})}$$

  Here $1\{y^{(i)}=k\}$ is the indicator function: it equals 1 when $y^{(i)} = k$ and 0 otherwise, or more generally, 1 when the expression inside the braces is true and 0 when it is false.

  

  The gradient:

  $$\nabla_{\theta^{(k)}} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m} x^{(i)}\left(1\{y^{(i)}=k\} - P(y^{(i)}=k \mid x^{(i)};\theta)\right)$$
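  To make the first formula concrete, here is a minimal NumPy sketch of the hypothesis for a batch of inputs (the variable names and toy sizes are mine; subtracting the column maximum is a standard numerical-stability trick that the implementation further down does not use):

import numpy as np

K, n, m = 3, 4, 5                  # toy sizes: classes, features, samples
theta = np.random.randn(K, n)      # parameters stored as a K by n matrix
X = np.random.randn(n, m)          # inputs, one sample per column

scores = theta.dot(X)              # K by m matrix of theta^(k).T x^(i)
scores -= scores.max(axis=0)       # subtract the column max for numerical stability
P = np.exp(scores)
P /= P.sum(axis=0)                 # P[k, i] = P(y=k | x^(i); theta); each column sums to 1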

  

  While implementing this model I ran into two problems and was stuck for a while:

    1. How to implement the indicator function? My approach: convert y into a vector yv with K elements; if y = i, then yv[i] = 1 and every other entry is zero. Multiplying this vector element-wise with the probabilities P in the cost function, and subtracting it from P in the gradient, implements the indicator function (a short NumPy sketch follows this list).

    2. I was not very fluent with matrices, so vectorizing the code took quite a bit of time.
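  A minimal sketch of this one-hot trick (the names yv and probs and the toy sizes are mine, for illustration only; they are not part of the implementation below):

import numpy as np

K, m = 4, 5                              # toy sizes: 4 classes, 5 samples
y = np.array([0, 2, 1, 3, 2])            # labels
probs = np.full((K, m), 1.0 / K)         # stand-in for the hypothesis P(y=k|x), K by m

# indicator function as a one-hot matrix: yv[k, i] = 1 iff sample i has label k
yv = np.zeros((K, m))
yv[y, np.arange(m)] = 1

cost = -np.sum(yv * np.log(probs)) / m   # 1{y=k} selects log P of the true class
delta = probs - yv                       # enters the gradient as P - 1{y=k}
print(cost, delta.shape)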

  

  Shifting the parameters θ of the probability P (subtracting the same vector from every θ^(k)) gives exactly the same probabilities, so the parameters θ are redundant. There are two ways to deal with this. The first is to add an L2 (weight decay) penalty term to the cost function and gradient, which introduces one more free parameter: the penalty coefficient. The second is to fix the parameters of one class to zero, which does not affect the final classification result. My implementation uses the second approach.
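  A one-line check of this redundancy, using the notation of the formulas above (ψ is any fixed vector):

  $$\frac{\exp\!\big((\theta^{(k)}-\psi)^\top x\big)}{\sum_{j=1}^{K}\exp\!\big((\theta^{(j)}-\psi)^\top x\big)} = \frac{\exp(\theta^{(k)\top}x)\,\exp(-\psi^\top x)}{\sum_{j=1}^{K}\exp(\theta^{(j)\top}x)\,\exp(-\psi^\top x)} = P(y=k\mid x;\theta)$$

  Choosing ψ = θ^(K) is exactly the second fix: the parameters of the last class become zero and only K-1 parameter vectors remain free.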

  The gradient checking method mentioned in the tutorial is very effective; it verifies whether the cost function and gradient are implemented correctly. Once the gradient check passes, you generally get correct results.

  The exercises in the UFLDL tutorial are in Matlab; since I am not familiar enough with Matlab, I implemented them with Python + numpy + scipy.

  The first listing defines a single neural-network layer, NNLayer, which will be reused in multi-layer networks. The second listing is the softmax regression implementation, which inherits from NNLayer.

import numpy as np
import math


class NNLayer(object):
    '''
    This class is a single layer of a neural network.
    '''
    def __init__(self, inputSize, outputSize, Lambda, actFunc=None):
        '''
        Constructor: initialize one layer.
        Parameters:
            inputSize  - the number of input elements
            outputSize - the number of output units
            Lambda     - weight decay parameter
            actFunc    - activation function: sigmoid, tanh or rectifiedLinear;
                         None (the default) selects the sigmoid
        '''
        self.inputSize = inputSize
        self.outputSize = outputSize
        self.Lambda = Lambda

        # sigmoid is the default activation
        self.actFunc = sigmoid
        self.actFuncGradient = sigmoidGradient

        self.activation = 0
        if actFunc == tanh:
            self.actFunc = tanh
            self.actFuncGradient = tanhGradient
        if actFunc == rectifiedLinear:
            self.actFunc = rectifiedLinear
            self.actFuncGradient = rectifiedLinearGradient

        # the value of epsilon_init comes from an empirical formula
        # initialize weights and intercept (bias)
        epsilon_init = 2.4495 / math.sqrt(self.inputSize + self.outputSize)
        theta = np.random.rand(self.outputSize, self.inputSize + 1) * 2 * epsilon_init - epsilon_init
        theta = theta * 0.0001
        NNLayer.rebuildTheta(self, theta)   # call NNLayer's version explicitly so a subclass override is not used

    def setWeight(self, Weight, b):
        self.W = Weight
        self.b = b

    def getWeight(self):
        return self.W, self.b

    def flatTheta(self):
        # concatenate weights and bias into one flat vector for scipy.optimize
        W = np.hstack((self.W, self.b))
        return W.ravel()

    def rebuildTheta(self, theta):
        '''
        Convert the 1-dim theta back into the weight matrix and intercept.
        Parameters:
            theta - the vector holding the weights and intercept, as required by the scipy.optimize functions
                    size: outputSize*(inputSize+1)
        '''
        W = theta.reshape(self.outputSize, -1)
        self.b = W[:, -1].reshape(self.outputSize, 1)   # bias b is a column vector with outputSize elements
        self.W = W[:, :-1]                              # weight matrix W is outputSize by inputSize

    def forward(self, X):
        '''
        Parameters:
            X - the examples in a matrix, inputSize by numSamples
        '''
        Z = np.dot(self.W, X) + self.b      # pre-activations
        self.activation = self.actFunc(Z)   # activations
        return self.activation

    def backpropagate(self, inputMat, delta):
        '''
        Parameters:
            inputMat - the activations of the previous layer (the input of this layer),
                       inputSize by numSamples
            delta    - the error term of the next layer, outputSize by numSamples

        If the current layer is layer l and delta is the error term of layer l+1, then
        delta(l) = (W(l).T * delta(l+1)) .* f'(z(l)).
        If this layer is the first hidden layer, this method should not be called.
        f' is written in terms of the activations to avoid a second call to the activation function.
        '''
        return np.dot(self.W.T, delta) * self.actFuncGradient(inputMat)

    def gradient(self, inputMat, delta):
        '''
        grad_W(l) = delta(l+1) * inputMat.T / m
        grad_b(l) = sum(delta(l+1)) / m
        Parameters:
            inputMat - input of this layer, inputSize by numSamples
            delta    - the error term of the next layer
        '''
        m = inputMat.shape[1]               # number of samples (columns)
        gw = np.dot(delta, inputMat.T) / m
        gb = np.sum(delta, 1) / m
        # combine the gradients of weights and intercepts and flatten them
        grad = np.hstack((gw, gb.reshape(-1, 1)))
        return np.ravel(grad)


def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def sigmoidGradient(a):
    # a = sigmoid(Z)
    return a * (1 - a)

def tanh(Z):
    e1 = np.exp(Z)
    e2 = np.exp(-Z)
    return (e1 - e2) / (e1 + e2)

def tanhGradient(a):
    return 1 - a ** 2

def rectifiedLinear(Z):
    a = np.zeros(Z.shape) + Z
    a[a < 0] = 0
    return a

def rectifiedLinearGradient(a):
    b = np.zeros(a.shape) + a
    b[b > 0] = 1
    return b
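
A quick sanity check of NNLayer on random data (my own sketch, assuming the listing above is in scope; the layer sizes are arbitrary):

import numpy as np

# forward pass of a single layer: 784 inputs -> 25 hidden units, 10 random samples
layer = NNLayer(784, 25, Lambda=0)   # defaults to the sigmoid activation
X = np.random.rand(784, 10)          # inputSize by numSamples
A = layer.forward(X)
print(A.shape)                       # (25, 10), all values in (0, 1)

# round-trip of the flattened parameters used by scipy.optimize
theta = layer.flatTheta()
layer.rebuildTheta(theta)
print(theta.shape)                   # (25 * 785,)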

 

import numpy as np
import scipy.optimize as spopt
# NNLayer comes from the previous listing; the module name below is an assumption,
# adjust it to wherever that code is saved (e.g. nnlayer.py)
from nnlayer import NNLayer


class SoftmaxRegression(NNLayer):
    '''
    We fix the weights of the last class to zero in this implementation.
    Weight decay is not used here.
    '''
    def __init__(self, numFeatures, numClasses, Lambda=0):
        '''
        Initialization of the weights, intercepts and other members.
        Parameters:
            numFeatures - the number of input features
            numClasses  - the number of classes to be classified
            Lambda      - weight decay parameter (unused here)
        '''
        # call the super constructor to initialize the weights and intercepts;
        # the weights and intercept of the last class are not needed
        super().__init__(numFeatures, numClasses - 1, Lambda, None)

        self.delta = 0
        self.numSamples = 0
        self.inputMat = 0
        self.y_mat = 0

    def predict(self, Xtest):
        '''
        Prediction.
        The model should be trained before calling this method.
        Parameter:
            Xtest - the data to be predicted, numFeatures by numData
        '''
        Z = np.dot(self.W, Xtest) + self.b
        # add the scores of the last class, which are all zeros
        lastClass = np.zeros((1, Xtest.shape[1]))
        Z = np.vstack((Z, lastClass))
        # the index of the max value in each column is the prediction
        return np.argmax(Z, 0)

    def forward(self):
        '''
        Compute the matrix of the softmax hypothesis.
        This method is called by the cost and gradient methods.
        '''
        h = np.dot(self.W, self.inputMat) + self.b
        h = np.exp(h)
        # add the unnormalized probabilities of the last class, which are all ones
        h = np.vstack((h, np.ones((1, self.numSamples))))
        # the normalization term over all classes
        hsum = np.sum(h, axis=0)
        # the probability of each class
        self.activation = h / hsum
        return self.activation

    def cost(self, theta):
        '''
        The cost function.
        Parameters:
            theta - the vector holding the weights and intercept, as required by the scipy.optimize functions
                    size: (numClasses - 1)*(numFeatures + 1)
        '''
        self.rebuildTheta(theta)
        h = np.log(self.forward())
        # h * self.y_mat applies the indicator function
        cost = -np.sum(h * self.y_mat, axis=(0, 1))
        return cost / self.numSamples

    def gradient(self, theta):
        '''
        The gradient function.
        Parameters:
            theta - the vector holding the weights and intercept, as required by the scipy.optimize functions
                    size: (numClasses - 1)*(numFeatures + 1)
        '''
        self.rebuildTheta(theta)
        h = self.forward()
        # delta = -(y_mat - h)
        self.delta = h - self.y_mat
        # drop the last class, whose weights are fixed to zero
        self.delta = self.delta[:-1, :]
        # get the gradient from the base class
        grad = super().gradient(self.inputMat, self.delta)
        return grad

    def setTrainData(self, X, y):
        '''
        Set the training samples and labels.
        Parameters:
            X - the training samples, numFeatures by numSamples
            y - the labels of the training samples, a vector with numSamples elements
        '''
        self.setTrainingSamples(X)
        self.setTrainingLabels(y)

    def setTrainingSamples(self, X):
        self.inputMat = X
        self.numSamples = X.shape[1]

    def setTrainingLabels(self, y):
        # convert the vector y to a matrix y_mat:
        # if sample i belongs to class k, then y_mat[k, i] = 1, otherwise 0
        y = y.astype(np.int64)
        yy = np.arange(self.numSamples)
        self.y_mat = np.zeros((self.outputSize + 1, self.numSamples))
        self.y_mat[y, yy] = 1

    def train(self, X, y):
        '''
        Train the model with fmin_cg from scipy.optimize.
        '''
        self.setTrainData(X, y)
        theta = self.flatTheta()
        # an L-BFGS alternative:
        # ret = spopt.fmin_l_bfgs_b(self.cost, theta, fprime=self.gradient, m=200, disp=1, maxiter=100)
        # self.rebuildTheta(ret[0])
        opttheta = spopt.fmin_cg(self.cost, theta, fprime=self.gradient, full_output=False, disp=True, maxiter=100)
        self.rebuildTheta(opttheta)

    def performance(self, Xtest, ytest):
        '''
        Classification accuracy (in percent) on a test set.
        The model should be trained before calling this method.
        Parameters:
            Xtest - the data to be predicted, numFeatures by numData
            ytest - the true labels
        '''
        pred = self.predict(Xtest)
        return np.mean(pred == ytest) * 100


def checkGradient(X, y):
    # compare the analytic gradient with a numerical approximation
    sm = SoftmaxRegression(X.shape[0], 10)
    sm.setTrainData(X, y)
    theta = sm.flatTheta()
    grad = sm.gradient(theta)

    numgrad = np.zeros(grad.shape)
    e = 1e-6

    for i in range(np.size(grad)):
        theta[i] = theta[i] - e
        loss1 = sm.cost(theta)
        theta[i] = theta[i] + 2 * e
        loss2 = sm.cost(theta)
        theta[i] = theta[i] - e

        numgrad[i] = (loss2 - loss1) / (2 * e)

    # the mean absolute difference should be very small
    print(np.sum(np.abs(grad - numgrad)) / np.size(grad))

The MNIST dataset is used for testing. The resulting accuracy is around 92.5%.

Test code:

 

import numpy as np
from time import time
# SoftmaxRegression and checkGradient come from the listing above; the module
# name below is an assumption, adjust it to wherever that code is saved
from softmax_regression import SoftmaxRegression, checkGradient

X = np.load('../../common/trainImages.npy') / 255
y = np.load('../../common/trainLabels.npy')
'''
X1 = X[:, :10]
y1 = y[:10]
checkGradient(X1, y1)
'''
Xtest = np.load('../../common/testImages.npy') / 255
ytest = np.load('../../common/testLabels.npy')

sm = SoftmaxRegression(X.shape[0], 10)
t0 = time()
sm.train(X, y)
print('training Time %.5f s' % (time() - t0))

print('test acc :%.3f%%' % (sm.performance(Xtest, ytest)))

 

 

 

References:

 

Softmax Regression http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/


Original post: http://www.cnblogs.com/arsenicer/p/4272522.html
