Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y. (2009). What is the best multi-stage architecture for object recognition? In ICCV'09.
Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep sparse rectifier neural networks. In JMLR W&CP: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011).
A drawback is that gradient-based methods cannot learn on examples for which the unit's activation is zero: once a rectified unit is deactivated, its gradient is also zero, so it can easily never become active again. One way around this is the maxout activation function:
Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013). Maxout networks. In S. Dasgupta and D. McAllester, editors, ICML'13, pages 1319–1327.
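A minimal NumPy sketch of a maxout unit, for illustration only (shapes and variable names are my own assumptions, not from the original post or paper). Each output unit computes k affine pieces and keeps the maximum, so the unit is a learned piecewise linear function and never has all of its pieces stuck at zero gradient the way a single rectifier can.

import numpy as np

def maxout_forward(x, W, b):
    # x: (n_in,), W: (k, n_out, n_in), b: (k, n_out) -> output: (n_out,)
    z = np.einsum('koi,i->ko', W, x) + b  # k affine pieces per output unit
    return z.max(axis=0)                  # element-wise max over the k pieces

# Hypothetical sizes, just to run the sketch.
rng = np.random.default_rng(0)
n_in, n_out, k = 4, 3, 2
x = rng.normal(size=n_in)
W = rng.normal(size=(k, n_out, n_in))
b = np.zeros((k, n_out))
print(maxout_forward(x, W, b))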
Besides helping information propagate and making optimization easier, piecewise linear functions also make regularization easier.
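As a small illustration of the piecewise linear point above (my own example, not from the original post): the rectifier is linear on each of its two pieces, so wherever its derivative is defined it is simply 0 or 1, which keeps gradients easy to propagate.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Derivative of the rectifier: 0 on the inactive piece, 1 on the active piece.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 1. 1.]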
Original post: http://www.cnblogs.com/alexanderkun/p/5701694.html