CS231n: Back Propagation (5)


Unintuitive effects and their consequences. Notice that if one of the inputs to the multiply gate is very small and the other is very big, then the multiply gate will do something slightly unintuitive: it will assign a relatively huge gradient to the small input and a tiny gradient to the large input. Note that in linear classifiers, where the weights are dot-producted w^T x_i (multiplied) with the inputs, this implies that the scale of the data has an effect on the magnitude of the gradient for the weights. For example, if you multiplied all input data examples x_i by 1000 during preprocessing, then the gradient on the weights would be 1000 times larger, and you'd have to lower the learning rate by that factor to compensate. This is why preprocessing matters a lot, sometimes in subtle ways! And having an intuitive understanding of how the gradients flow can help you debug some of these cases.
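A minimal numpy sketch of this effect (the array values here are made up purely for illustration): for f = w · x, the local gradient on w is x itself, so scaling the data by 1000 scales the weight gradient by 1000.

```python
import numpy as np

# multiply gate in a linear score: f = w . x
# backprop assigns df/dw = x and df/dx = w, so each input's
# gradient is set by the *other* input's magnitude
w = np.array([0.5, -0.3, 0.8])
x = np.array([1.2, -2.0, 0.4])

dw = x            # gradient on the weights is the input
dx = w            # gradient on the input is the weights

# scale the data by 1000, as in the preprocessing example
x_big = 1000 * x
dw_big = x_big    # weight gradient is now 1000x larger

print(dw_big / dw)  # -> [1000. 1000. 1000.]
```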

 

Learning vectorized derivatives: Erik Learned-Miller has also written up a longer related document on taking matrix/vector derivatives which you might find helpful. Find it here.
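As a taste of what vectorized derivatives look like in practice, here is the standard backward pass for a matrix multiply in numpy (the shapes below are chosen arbitrarily for the sketch): for D = W X, the gradients are dW = dD Xᵀ and dX = Wᵀ dD, and matching shapes is the quickest way to sanity-check the formulas.

```python
import numpy as np

# forward pass: matrix multiply
W = np.random.randn(5, 10)
X = np.random.randn(10, 3)
D = W.dot(X)                 # shape (5, 3)

# suppose we receive the gradient on D from the layer above
dD = np.random.randn(*D.shape)

# backward pass: each gradient must have the same shape
# as the variable it corresponds to, which pins down the formulas
dW = dD.dot(X.T)   # (5,3).dot(3,10) -> (5,10), same shape as W
dX = W.T.dot(dD)   # (10,5).dot(5,3) -> (10,3), same shape as X
```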

Original article: http://www.cnblogs.com/kprac/p/6561487.html
