
BP, Gradient descent and Generalisation



For each training pattern p presented to a multilayer neural network, we can compute the error:

e(p) = y_d(p) - y(p)

Squaring these errors and summing across all n training patterns gives the sum-squared error (SSE), a good measure of the overall performance of the network.

The SSE depends on the weights and thresholds.
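Written out explicitly (a reconstruction in standard notation, since the original formula appeared only as an image), the SSE for a single-output network over all n patterns is:

```latex
E = \sum_{p=1}^{n} \big( y_d(p) - y(p) \big)^2
```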



Back-propagation

Back-propagation is a "gradient descent" training algorithm
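The update equations originally shown here as figures are missing; presumably they gave the standard gradient-descent rule, which moves each weight against the error gradient with learning rate α (my reconstruction, not the original image):

```latex
\Delta w_{jk} = -\alpha \, \frac{\partial E}{\partial w_{jk}}
```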


Steps:

1. Calculate the error for a single pattern.

2. Compute the weight changes that produce the greatest reduction in that error, by moving down the error gradient (the direction of steepest slope). A runnable sketch follows below.

This is only possible with differentiable activation functions (e.g. the sigmoid).
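A minimal sketch of one such back-propagation step, for a small 2-2-1 network with sigmoid activations. All names and values here (alpha, W1, W2, the training pattern) are illustrative assumptions, not taken from the original notes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

alpha = 0.1                           # learning rate (assumed value)

rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, (2, 2))   # input -> hidden weights
W2 = rng.uniform(-0.5, 0.5, (1, 2))   # hidden -> output weights

x  = np.array([1.0, 0.0])             # one training pattern
yd = np.array([1.0])                  # its desired output

# Forward pass
h = sigmoid(W1 @ x)                   # hidden activations
y = sigmoid(W2 @ h)                   # network output

# Error for this single pattern: e(p) = y_d(p) - y(p)
e = yd - y

# Backward pass, using the sigmoid derivative y * (1 - y)
delta_out = e * y * (1 - y)                    # output-layer delta
delta_hid = (W2.T @ delta_out) * h * (1 - h)   # hidden-layer delta

# Weight updates in the direction of steepest descent
W2 += alpha * np.outer(delta_out, h)
W1 += alpha * np.outer(delta_hid, x)
```

The sigmoid is convenient here because its derivative, y(1 - y), can be computed directly from the activation itself, which is what keeps the delta terms above so simple.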


Because training proceeds pattern by pattern, the descent is only an approximation of true gradient descent on the overall SSE.

Gradient descent may not always reach the true global error minimum; instead it may get stuck in a "local" minimum.


Solution: add a momentum term to the weight update.
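A sketch of the weight update with momentum, following the standard generalized delta rule (the momentum constant β is my notation; it is typically close to 1, e.g. 0.95):

```latex
\Delta w_{jk}(t) = \beta \, \Delta w_{jk}(t-1) + \alpha \, \delta_k(t) \, y_j(t)
```

The momentum term carries over part of the previous weight change, which smooths the pattern-by-pattern noise and helps the descent roll through shallow local minima.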


 


Original article: http://www.cnblogs.com/eli-ayase/p/5906175.html
