Popular Training Approaches of DNNs - A Quick Overview
Rectified Linear Units Improve Restricted Boltzmann Machines (ReLU)
Rectifier Nonlinearities Improve Neural Network Acoustic Models (leaky-ReLU, aka LReLU)
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (PReLU)
Empirical Evaluation of Rectified Activations in Convolutional Network (ReLU/LReLU/PReLU/RReLU)
Deep Learning with S-shaped Rectified Linear Activation Units (SReLU)
Parametric Activation Pools greatly increase performance and consistency in ConvNets
Noisy Activation Functions
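The ReLU-family papers above all vary the same elementwise nonlinearity. As a quick reference, here is a minimal NumPy sketch of the three most common variants; function and parameter names are illustrative, not taken from the papers.

    import numpy as np

    def relu(x):
        # ReLU: max(0, x)
        return np.maximum(0.0, x)

    def leaky_relu(x, slope=0.01):
        # Leaky ReLU: small fixed slope for negative inputs
        return np.where(x > 0, x, slope * x)

    def prelu(x, a):
        # PReLU: the negative slope `a` is a learned parameter
        # (one per channel in the paper)
        return np.where(x > 0, x, a * x)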
An Explanation of Xavier Initialization
Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?
All you need is a good init
Data-dependent Initializations of Convolutional Neural Networks
What are good initial weights in a neural network?
RandomOut: Using a convolutional gradient norm to win The Filter Lottery
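As a companion to the initialization papers above, here is a minimal sketch of Xavier (Glorot) uniform initialization for a fully connected layer, assuming NumPy and the common sqrt(6 / (fan_in + fan_out)) range; names are illustrative.

    import numpy as np

    def xavier_uniform(fan_in, fan_out):
        # Scale by fan-in and fan-out so activation and gradient variance
        # stay roughly constant across layers
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

    # Example: weights for a 784 -> 256 layer
    W = xavier_uniform(784, 256)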
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ImageNet top-5 error: 4.82%)
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
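For the normalization papers above, a minimal sketch of the batch-normalization forward pass in training mode, assuming NumPy; `gamma` and `beta` are the learned scale and shift, and inference would use running averages instead of batch statistics.

    import numpy as np

    def batch_norm_forward(x, gamma, beta, eps=1e-5):
        # x: (batch, features); normalize each feature over the mini-batch
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)
        # learned scale and shift restore representational power
        return gamma * x_hat + beta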
The Loss Surfaces of Multilayer Networks
On Optimization Methods for Deep Learning
On the importance of initialization and momentum in deep learning
Invariant backpropagation: how to train a transformation-invariant neural network
A practical theory for designing very deep convolutional neural network
Stochastic Optimization Techniques
Alec Radford’s animations for optimization algorithms
http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html
Faster Asynchronous SGD (FASGD)
An overview of gradient descent optimization algorithms (★★★★★)
Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
Writing fast asynchronous SGD/AdaGrad with RcppParallel
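Several of the optimization entries above (notably the momentum paper and the gradient-descent overview) compare variants of the same basic update rule. A minimal sketch of SGD with classical momentum, assuming parameters and gradients stored as dictionaries of NumPy arrays; names are illustrative.

    import numpy as np

    def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
        # Classical momentum: v <- mu * v - lr * grad; theta <- theta + v
        for key in params:
            velocity[key] = momentum * velocity[key] - lr * grads[key]
            params[key] += velocity[key]
        return params, velocity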
DisturbLabel: Regularizing CNN on the Loss Layer [University of California & MSR] (2016)
Improving neural networks by preventing co-adaptation of feature detectors (Dropout)
Regularization of Neural Networks using DropConnect
Regularizing neural networks with dropout and with DropConnect
Fast dropout training
Dropout as data augmentation
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Improved Dropout for Shallow and Deep Learning
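The dropout papers above build on the same random-masking idea. A minimal sketch of inverted dropout (rescaling at training time so no adjustment is needed at test time), assuming NumPy; the rate and names are illustrative.

    import numpy as np

    def dropout(x, p_drop=0.5, training=True):
        # Zero each unit with probability p_drop and rescale the survivors
        # so the expected activation is unchanged
        if not training or p_drop == 0.0:
            return x
        mask = (np.random.rand(*x.shape) >= p_drop) / (1.0 - p_drop)
        return x * mask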
Fitting a model via closed-form equations vs. Gradient Descent vs. Stochastic Gradient Descent vs. Mini-Batch Learning. What is the difference? (Normal Equations vs. GD vs. SGD vs. MB-GD)
http://sebastianraschka.com/faq/docs/closed-form-vs-gd.html
An Introduction to Gradient Descent in Python
Train faster, generalize better: Stability of stochastic gradient descent
A Variational Analysis of Stochastic Gradient Algorithms
The vanishing gradient problem: Oh no - an obstacle to deep learning!
Gradient Descent For Machine Learning
http://machinelearningmastery.com/gradient-descent-for-machine-learning/
Revisiting Distributed Synchronous SGD
Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices
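To make the GD / SGD / mini-batch distinction in the links above concrete, here is a minimal sketch of mini-batch gradient descent for linear least squares, assuming NumPy; batch_size = len(X) recovers full-batch GD and batch_size = 1 recovers plain SGD. All names are illustrative.

    import numpy as np

    def minibatch_gd(X, y, lr=0.01, batch_size=32, epochs=100):
        # Fit y ~ X @ w by minimizing mean squared error
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            order = np.random.permutation(n)
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                Xb, yb = X[idx], y[idx]
                grad = 2.0 / len(idx) * Xb.T @ (Xb @ w - yb)
                w -= lr * grad
        return w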
DataAugmentation ver1.0: an image data augmentation tool for training image recognition algorithms
Caffe-Data-Augmentation: a Caffe branch with support for data augmentation using a configurable stochastic combination of 7 data augmentation techniques
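The two augmentation tools above combine simple stochastic image transforms. A minimal sketch of two of the most common ones (random horizontal flip and random crop), assuming images as NumPy arrays of shape (H, W, C); this is not the tools' own API.

    import numpy as np

    def random_flip_and_crop(img, crop_h, crop_w, flip_prob=0.5):
        # Random horizontal flip
        if np.random.rand() < flip_prob:
            img = img[:, ::-1, :]
        # Random crop to (crop_h, crop_w); assumes crop size <= image size
        h, w, _ = img.shape
        top = np.random.randint(0, h - crop_h + 1)
        left = np.random.randint(0, w - crop_w + 1)
        return img[top:top + crop_h, left:left + crop_w, :]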
Scalable and Sustainable Deep Learning via Randomized Hashing
pastalog: Simple, realtime visualization of neural network training performance
torch-pastalog: A Torch interface for pastalog - simple, realtime visualization of neural network training performance
Original source: http://www.cnblogs.com/hansjorn/p/5396677.html