Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y. (2009). What is the best multi-stage architecture for object recognition? In ICCV'09.
Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep sparse rectifier neural networks. In JMLR W&CP: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011).
A drawback is that gradient-based methods cannot learn on examples for which the unit's activation is zero: once a rectified unit is deactivated, its gradient is also zero, so it can easily never become active again. One way around this is the maxout activation function:
Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013). Maxout networks. In S. Dasgupta and D. McAllester, editors, ICML'13, pages 1319–1327.
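A minimal NumPy sketch of a maxout unit, for illustration only (shapes and variable names are my own assumptions, not from the original post or paper). Each output unit computes k affine pieces and keeps the maximum, so the unit is a learned piecewise linear function and never has all of its pieces stuck at zero gradient the way a single rectifier can.

import numpy as np

def maxout_forward(x, W, b):
    # x: (n_in,), W: (k, n_out, n_in), b: (k, n_out) -> output: (n_out,)
    z = np.einsum('koi,i->ko', W, x) + b  # k affine pieces per output unit
    return z.max(axis=0)                  # element-wise max over the k pieces

# Hypothetical sizes, just to run the sketch.
rng = np.random.default_rng(0)
n_in, n_out, k = 4, 3, 2
x = rng.normal(size=n_in)
W = rng.normal(size=(k, n_out, n_in))
b = np.zeros((k, n_out))
print(maxout_forward(x, W, b))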
Besides helping information propagate and making optimization easier, piecewise linear functions also make regularization easier.
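As a small illustration of the piecewise linear point above (my own example, not from the original post): the rectifier is linear on each of its two pieces, so wherever its derivative is defined it is simply 0 or 1, which keeps gradients easy to propagate.

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    # Derivative of the rectifier: 0 on the inactive piece, 1 on the active piece.
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 1. 1.]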
Original post: http://www.cnblogs.com/alexanderkun/p/5701694.html