[Huawei Cloud Tech Share] An Introduction to Batch Normalization (BN)

Date: 2020-03-19 17:44:47

Batch Normalization (BN) addresses the problem of Internal Covariate Shift (ICS).

The paper defines Internal Covariate Shift as:
The change in the distribution of network activations due to the change in network parameters during training.

In other words, as the network parameters change during training, the distribution of each layer's outputs changes too.

Internal Covariate Shift has two parts: "Internal" and "Covariate Shift".

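Covariate Shift. In supervised learning, covariate shift means that for a training dataset $D_{\text{train}}$ and a test dataset $D_{\text{test}}$ the marginal distributions differ, i.e. $P_{\text{train}}(x) \neq P_{\text{test}}(x)$, while the conditional distributions agree, $P_{\text{train}}(y \mid x) = P_{\text{test}}(y \mid x)$. It is addressed with domain adaptation methods. See https://www.quora.com/What-is-Covariate-shift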
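Internal. Note that in a network the output of one layer is the input of the next. For a given intermediate layer, the parameters of the preceding layers keep changing during training, so their output distributions change, and the input distribution of this layer keeps changing as well.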
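
To make the covariate shift definition above concrete, here is a minimal NumPy sketch. Everything in it is an assumption chosen for illustration and does not come from the original post or the paper: the Gaussian inputs, the shifted test mean of 2.0, the sigmoid labelling rule, and the helper name `sample_labels`. The inputs are drawn from different marginals, while the labels for both sets come from the same conditional rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_labels(x, rng):
    """Draw y from one shared conditional P(y=1 | x) = sigmoid(x).

    Hypothetical labelling rule, used for both train and test.
    """
    p = 1.0 / (1.0 + np.exp(-x))
    return (rng.random(x.shape) < p).astype(int)

# Different marginals P(x): the test inputs are shifted to the right.
x_train = rng.normal(loc=0.0, scale=1.0, size=10_000)
x_test = rng.normal(loc=2.0, scale=1.0, size=10_000)

# Same conditional P(y | x) for both datasets.
y_train = sample_labels(x_train, rng)
y_test = sample_labels(x_test, rng)

print("mean of x (train, test):", x_train.mean(), x_test.mean())
print("fraction of y=1 (train, test):", y_train.mean(), y_test.mean())
```

The input means differ by roughly the chosen shift, and the label frequencies differ as a consequence, even though the rule mapping x to y never changed. An intermediate layer of a network is in an analogous position whenever the layers before it are updated.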

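Because the input distribution of an intermediate layer keeps shifting, the layer has to keep adapting to a new distribution, which makes learning inefficient.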

BN aims to fix the distribution of the inputs to each layer of the network in order to speed up training.

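Whitening the inputs of a model makes training converge faster. A neural network is a multi-layer structure, so the idea is to whiten the inputs of every layer in the network.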
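
As a reminder of what whitening involves, here is a minimal NumPy sketch of PCA whitening: center the data, then rotate and rescale it so that its covariance becomes (approximately) the identity. The function name `whiten`, the small constant `eps`, and the toy 2-D data are assumptions made for this illustration; the post does not say which whitening variant is meant.

```python
import numpy as np

def whiten(x, eps=1e-5):
    """PCA-whiten the rows of x (shape: n_samples x n_features).

    eps is an assumed small constant that guards against division by
    near-zero eigenvalues.
    """
    x_centered = x - x.mean(axis=0)
    cov = x_centered.T @ x_centered / x.shape[0]
    # eigh is appropriate because a covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Project onto the eigenvectors, then scale each direction to unit variance.
    return x_centered @ eigvecs / np.sqrt(eigvals + eps)

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0], [1.5, 0.5]])  # introduces correlated features
x = rng.normal(size=(5000, 2)) @ mixing

x_white = whiten(x)
cov_white = x_white.T @ x_white / x_white.shape[0]
print(np.round(cov_white, 3))  # close to the 2x2 identity matrix
```

Note that this needs the full covariance matrix and an eigendecomposition, so the cost grows quickly with the number of features.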

Normalization over a mini-batch

Whitening over the entire dataset is computationally very expensive, so BN makes two necessary simplifications:

instead of whitening the features in layer inputs and outputs jointly, we will normalize each scalar feature independently, by making it have the mean of zero and the variance of 1.
since we use mini-batches in stochastic gradient training, each mini-batch produces estimates of the mean and variance of each activation.

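For a d-dimensional input $x = (x^{(1)}, \dots, x^{(d)})$, each dimension is normalized separately. For example, for the k-th dimension,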

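$\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}}$, where both the mean and the variance are computed over the mini-batch.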
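
The per-dimension normalization above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a full BN layer: the function name `normalize_batch` and the toy data are made up for the example, and the small constant `eps` added under the square root for numerical stability is an assumption that does not appear in the formula as written here.

```python
import numpy as np

def normalize_batch(x, eps=1e-5):
    """Normalize each feature of a mini-batch x (batch_size x d) independently.

    eps is an assumed stabilizer that avoids division by zero.
    """
    mean = x.mean(axis=0)  # per-feature mean over the mini-batch
    var = x.var(axis=0)    # per-feature (biased) variance over the mini-batch
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))  # mini-batch of 64 samples, d = 4

x_hat = normalize_batch(x)
print(x_hat.mean(axis=0))  # each entry is close to 0
print(x_hat.var(axis=0))   # each entry is close to 1
```

Unlike full whitening, each feature is treated on its own, so no covariance matrix between features is ever formed.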

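Note that simply normalizing the inputs of a layer changes what the layer can represent. To address this, BN introduces two parameters for each activation, one for scale and one for shift. The idea is that this allows the BN transform to represent the identity transformation and preserves the network capacity
(this feels similar to ResNet). That is,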