Two phrases on the official website capture Caffe2's character well: "A New Lightweight, Modular, and Scalable Deep Learning Framework" and "Code once, run anywhere". The first question I cared about was: how do I use multiple GPUs to speed up training?
There are multiple ways to utilize multiple GPUs or machines to train models. Synchronous SGD, using Caffe2's data parallel model, is the simplest and easiest to understand: each GPU executes exactly the same code to run its share of the mini-batch. Between mini-batches, we average the gradients of each GPU, and each GPU executes the parameter update in exactly the same way. At any point in time the parameters have the same values on each GPU. Another way to understand synchronous SGD is that it allows increasing the mini-batch size: using 8 GPUs to run a batch of 32 each is equivalent to one GPU running a mini-batch of 256.
I had never quite understood how multi-GPU acceleration actually works; after reading the passage above, it suddenly became clear. I really hope that one day I can explain a complex concept just as concisely. The key point of the English passage can be summarized as follows (assuming we have eight GPUs):
If on a single GPU we would set the batch size to 256, then the equivalent parallel setup is eight GPUs, each with a batch size of 32. In short, sequential execution is turned into parallel execution.
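The equivalence can be checked with a small NumPy sketch (plain Python, not the Caffe2 API; the linear model, random data, and shard layout here are illustrative assumptions):

```python
import numpy as np

# Synchronous-SGD sketch: a linear model y = X @ w with mean-squared-error
# loss. Each simulated "GPU" holds a 32-sample shard of a 256-sample batch.

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))   # full mini-batch of 256 samples
y = rng.normal(size=256)
w = rng.normal(size=4)          # identical parameters on every GPU

def grad(Xb, yb, w):
    # Gradient of the mean squared error over one shard.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

# One GPU running the whole batch of 256:
g_single = grad(X, y, w)

# Eight "GPUs" running 32 samples each, then averaging the gradients:
shards = np.split(np.arange(256), 8)
g_sync = np.mean([grad(X[s], y[s], w) for s in shards], axis=0)

print(np.allclose(g_single, g_sync))  # True: the two updates coincide
```

Because every shard has the same size, the average of the per-shard mean gradients equals the mean gradient over the full batch, which is exactly why the parameter update on each GPU matches the single-GPU update.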
Original article: http://www.cnblogs.com/everyday-haoguo/p/Caffe2-SyncSGD.html