
Sparse Autoencoder (Part 2)



Gradient checking and advanced optimization

In this section, we describe a method for numerically checking the derivatives computed by your code to make sure that your implementation is correct. Carrying out the derivative checking procedure described here will significantly increase your confidence in the correctness of your code.

Suppose we want to minimize $J(\theta)$ as a function of $\theta$. For this example, suppose $J : \mathbb{R} \mapsto \mathbb{R}$, so that $\theta \in \mathbb{R}$. In this 1-dimensional case, one iteration of gradient descent is given by

$$\theta := \theta - \alpha \frac{d}{d\theta} J(\theta).$$

Suppose also that we have implemented some function $g(\theta)$ that purportedly computes $\frac{d}{d\theta} J(\theta)$, so that we implement gradient descent using the update $\theta := \theta - \alpha g(\theta)$.

 

Recall the mathematical definition of the derivative as

$$\frac{d}{d\theta} J(\theta) = \lim_{\epsilon \rightarrow 0} \frac{J(\theta + \epsilon) - J(\theta - \epsilon)}{2\epsilon}.$$

Thus, at any specific value of $\theta$, we can numerically approximate the derivative as follows:

$$\frac{J(\theta + \mathrm{EPSILON}) - J(\theta - \mathrm{EPSILON})}{2 \times \mathrm{EPSILON}}$$

 

Thus, given a function $g(\theta)$ that is supposedly computing $\frac{d}{d\theta} J(\theta)$, we can now numerically verify its correctness by checking that

$$g(\theta) \approx \frac{J(\theta + \mathrm{EPSILON}) - J(\theta - \mathrm{EPSILON})}{2 \times \mathrm{EPSILON}}.$$

The degree to which these two values should approximate each other will depend on the details of $J$. But assuming $\mathrm{EPSILON} = 10^{-4}$, you'll usually find that the left- and right-hand sides of the above will agree to at least 4 significant digits (and often many more).
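For concreteness, here is a minimal sketch of the one-dimensional check in Python; the cost function $J(\theta) = \theta^2$ and the candidate gradient $g(\theta) = 2\theta$ are illustrative choices, not from the original text:

```python
import numpy as np

EPSILON = 1e-4

def J(theta):
    # Illustrative cost function: J(theta) = theta^2
    return theta ** 2

def g(theta):
    # Candidate analytic derivative we want to verify: dJ/dtheta = 2*theta
    return 2 * theta

theta = 1.5
numeric = (J(theta + EPSILON) - J(theta - EPSILON)) / (2 * EPSILON)
print(g(theta), numeric)  # the two values should agree to ~4+ significant digits
```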

 

More generally, suppose $\theta \in \mathbb{R}^n$ is a vector, and that we have a function $g_i(\theta)$ that purportedly computes $\frac{\partial}{\partial \theta_i} J(\theta)$; we'd like to check if $g_i$ is outputting correct derivative values. Let $\theta^{(i+)} = \theta + \mathrm{EPSILON} \times \vec{e}_i$, where

$$\vec{e}_i = \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix}$$

is the $i$-th basis vector (a vector of the same dimension as $\theta$, with a "1" in the $i$-th position and "0"s everywhere else). So, $\theta^{(i+)}$ is the same as $\theta$, except its $i$-th element has been incremented by EPSILON. Similarly, let $\theta^{(i-)} = \theta - \mathrm{EPSILON} \times \vec{e}_i$ be the corresponding vector with the $i$-th element decreased by EPSILON. We can now numerically verify $g_i(\theta)$'s correctness by checking, for each $i$, that:

$$g_i(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2 \times \mathrm{EPSILON}}.$$

 

When the parameter is a vector, we can verify the correctness of each component's derivative by perturbing one coordinate at a time while holding the others fixed.
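A minimal vectorized version of this coordinate-wise check might look as follows; the function and variable names are illustrative, not from the original exercise code:

```python
import numpy as np

def numerical_gradient(J, theta, epsilon=1e-4):
    """Approximate the gradient of J at theta, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = 1.0                      # i-th basis vector
        grad[i] = (J(theta + epsilon * e_i) -
                   J(theta - epsilon * e_i)) / (2 * epsilon)
    return grad

# Example: J(theta) = sum(theta^2), whose true gradient is 2*theta.
J = lambda theta: np.sum(theta ** 2)
theta = np.array([0.5, -1.0, 2.0])
print(numerical_gradient(J, theta))  # should be close to [1.0, -2.0, 4.0]
```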

When implementing backpropagation to train a neural network, in a correct implementation we will have that

$$\nabla_{W^{(l)}} J(W,b) = \left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)}$$
$$\nabla_{b^{(l)}} J(W,b) = \frac{1}{m} \Delta b^{(l)}$$

This result shows that the final block of pseudo-code in Backpropagation Algorithm is indeed implementing gradient descent. To make sure your implementation of gradient descent is correct, it is usually very helpful to use the method described above to numerically compute the derivatives of $J(W,b)$, and thereby verify that your computations of $\frac{1}{m}\Delta W^{(l)} + \lambda W^{(l)}$ and $\frac{1}{m}\Delta b^{(l)}$ are indeed giving the derivatives you want.
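To apply the coordinate-wise check to the matrix- and vector-valued parameters $W^{(l)}$ and $b^{(l)}$, one common trick is to unroll them into a single flat parameter vector and reuse a checker like `numerical_gradient` above. A hedged sketch, where the unrolling convention is an assumption rather than part of the original text:

```python
import numpy as np

def unroll(W, b):
    # Pack (W, b) into one flat vector so the coordinate-wise check applies.
    return np.concatenate([W.ravel(), b.ravel()])

def reroll(theta, W_shape):
    # Inverse of unroll: recover W and b from the flat vector.
    n = W_shape[0] * W_shape[1]
    return theta[:n].reshape(W_shape), theta[n:]

# If cost(W, b) is your network's cost J(W, b), then
#   J_flat = lambda theta: cost(*reroll(theta, W.shape))
# and numerical_gradient(J_flat, unroll(W, b)) can be compared entry-by-entry
# against unroll(DeltaW / m + lam * W, Deltab / m).
```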

 

Autoencoders and Sparsity

 

An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. I.e., it uses $y^{(i)} = x^{(i)}$.

Here is an autoencoder:

[Figure: an autoencoder network diagram, mapping the input through a hidden layer back to a reconstruction of the input.]

 

We will write $a^{(2)}_j(x)$ to denote the activation of this hidden unit when the network is given a specific input $x$. Further, let

$$\hat\rho_j = \frac{1}{m} \sum_{i=1}^{m} \left[ a^{(2)}_j(x^{(i)}) \right]$$

be the average activation of hidden unit $j$ (averaged over the training set). We would like to (approximately) enforce the constraint

$$\hat\rho_j = \rho,$$

where $\rho$ is a sparsity parameter, typically a small value close to zero (say $\rho = 0.05$). In other words, we would like the average activation of each hidden neuron $j$ to be close to 0.05 (say). To satisfy this constraint, the hidden unit's activations must mostly be near 0.
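As a sketch, the average activations can be computed in a vectorized way; the sigmoid activation, the shapes, and the variable names below are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative shapes: m training examples of dimension n, s2 hidden units.
m, n, s2 = 1000, 100, 25
rng = np.random.default_rng(0)
X = rng.random((m, n))            # training set, one example per row
W1 = rng.normal(0, 0.1, (s2, n))  # hidden-layer weights
b1 = np.zeros(s2)                 # hidden-layer biases

A2 = sigmoid(X @ W1.T + b1)       # hidden activations a^(2), shape (m, s2)
rho_hat = A2.mean(axis=0)         # average activation of each hidden unit
```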

To achieve this, we will add an extra penalty term to our optimization objective that penalizes $\hat\rho_j$ deviating significantly from $\rho$. Many choices of the penalty term will give reasonable results. We will choose the following:

$$\sum_{j=1}^{s_2} \rho \log \frac{\rho}{\hat\rho_j} + (1-\rho) \log \frac{1-\rho}{1-\hat\rho_j}$$

Here, $s_2$ is the number of neurons in the hidden layer, and the index $j$ is summing over the hidden units in our network. If you are familiar with the concept of KL divergence, this penalty term is based on it, and can also be written

$$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j).$$
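Continuing the sketch above, the penalty can be computed directly from the average activations; this is an illustrative sketch, not the exercise's reference code:

```python
import numpy as np

rho = 0.05                  # sparsity parameter
rho_hat = np.full(25, 0.1)  # stand-in for the averages computed above

# KL divergence between Bernoulli(rho) and Bernoulli(rho_hat_j), summed over j.
sparsity_penalty = np.sum(
    rho * np.log(rho / rho_hat) +
    (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
)
```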

 

Our overall cost function is now

$$J_{\mathrm{sparse}}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat\rho_j),$$

where $J(W,b)$ is as defined previously, and $\beta$ controls the weight of the sparsity penalty term. The term $\hat\rho_j$ (implicitly) depends on $W,b$ also, because it is the average activation of hidden unit $j$, and the activation of a hidden unit depends on the parameters $W,b$.
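Putting the pieces together, here is a hedged sketch of a full sparse-autoencoder cost for a single hidden layer: squared-error reconstruction plus weight decay plus the sparsity penalty. All names, default hyperparameters, and the choice of sigmoid outputs are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_autoencoder_cost(X, W1, b1, W2, b2, lam=1e-4, rho=0.05, beta=3.0):
    """J_sparse(W,b) = reconstruction error + weight decay + beta * KL penalty."""
    m = X.shape[0]
    A2 = sigmoid(X @ W1.T + b1)   # hidden activations a^(2)
    A3 = sigmoid(A2 @ W2.T + b2)  # reconstruction (targets y = x)
    rho_hat = A2.mean(axis=0)     # average hidden activations

    reconstruction = np.sum((A3 - X) ** 2) / (2 * m)
    weight_decay = (lam / 2) * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    kl = np.sum(rho * np.log(rho / rho_hat) +
                (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return reconstruction + weight_decay + beta * kl
```

A cost function written this way also pairs naturally with the numerical gradient check from the previous section, once the parameters are unrolled into a single vector.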

 


 

Visualizing a Trained Autoencoder

 

Consider the case of training an autoencoder on $10 \times 10$ images, so that $n = 100$. Each hidden unit $i$ computes a function of the input:

$$a^{(2)}_i = f\left( \sum_{j=1}^{100} W^{(1)}_{ij} x_j + b^{(1)}_i \right).$$

We will visualize the function computed by hidden unit $i$---which depends on the parameters $W^{(1)}_{ij}$ (ignoring the bias term for now)---using a 2D image. In particular, we think of $a^{(2)}_i$ as some non-linear feature of the input $x$.

 

If we suppose that the input is norm constrained by $\|x\|^2 = \sum_{j=1}^{100} x_j^2 \leq 1$, then one can show (try doing this yourself) that the input which maximally activates hidden unit $i$ is given by setting pixel $x_j$ (for all 100 pixels, $j = 1, \dots, 100$) to

$$x_j = \frac{W^{(1)}_{ij}}{\sqrt{\sum_{j=1}^{100} \left( W^{(1)}_{ij} \right)^2}}.$$

By displaying the image formed by these pixel intensity values, we can begin to understand what feature hidden unit $i$ is looking for.
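A minimal sketch of this visualization in Python; the matrix layout (one row of weights per hidden unit) and the random weights standing in for a trained network are assumptions matching the formula above:

```python
import numpy as np

# W1 has one row per hidden unit; each row holds the 100 weights W^(1)_ij
# connecting hidden unit i to the 10x10 input pixels.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(25, 100))   # stand-in for trained weights

# Under the norm constraint, the maximally activating input for each hidden
# unit is its weight vector scaled to unit length.
images = W1 / np.sqrt(np.sum(W1 ** 2, axis=1, keepdims=True))
images = images.reshape(-1, 10, 10)   # one 10x10 image per hidden unit;
                                      # each images[i] can be shown in grayscale
```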

When an autoencoder is applied to an image, the earlier hidden units generally capture low-level features such as edges, while hidden units further along capture features with deeper semantics.



Original post: http://www.cnblogs.com/sprint1989/p/3979296.html
