PyTorch has nothing like MXNet's RecordIO files, and reading huge numbers of small images each iteration is a real struggle: if the disk is slow, most of the wall time goes into loading data. I tried LMDB; fast it was, but it does not support random shuffling during training, so I eventually dropped it.
Plenty of people have suffered from this for a long time, and all kinds of remedies are already out there. There is a simple round-up here, so, borrowing from it, I summarize as follows:
Install it with pip per the reference link; a few lines of code following the manual are enough, and usage is basically the same as PyTorch's built-in DataLoader. The common augmentations are all there (search the op names in the docs for the exact arguments). One catch: randomness has to come from DALI's own Uniform and CoinFlip ops. If you generate a random number with Python's random module and branch on something like if _rand == 1, that value is produced only once, at graph-definition time, so if random happens to return 1 every sample in the whole run gets augmented, and if it returns 0 none do. That cost me an evening of debugging; after going through the manual and the GitHub issues I ended up with the pipeline below, and the randomness now looks right.
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as dali_ops
import nvidia.dali.types as dali_types


class reader_pipeline(Pipeline):
    def __init__(self, image_dir, batch_size, num_threads, device_id):
        super(reader_pipeline, self).__init__(batch_size, num_threads, device_id)
        self.input = dali_ops.FileReader(file_root=image_dir, random_shuffle=False)
        self.decode = dali_ops.ImageDecoder(device='mixed', output_type=dali_types.RGB)
        self.cmn_img = dali_ops.CropMirrorNormalize(device="gpu",
                                                    crop=(112, 112),
                                                    crop_pos_x=0, crop_pos_y=0,
                                                    output_dtype=dali_types.FLOAT,
                                                    image_type=dali_types.RGB,
                                                    mean=[0.5 * 255, 0.5 * 255, 0.5 * 255],
                                                    std=[0.5 * 255, 0.5 * 255, 0.5 * 255])
        # random augmentation parameters must come from DALI ops so they are re-drawn every iteration
        self.brightness_change = dali_ops.Uniform(range=(0.6, 1.4))
        self.rd_bright = dali_ops.Brightness(device="gpu")
        self.contrast_change = dali_ops.Uniform(range=(0.6, 1.4))
        self.rd_contrast = dali_ops.Contrast(device="gpu")
        self.saturation_change = dali_ops.Uniform(range=(0.6, 1.4))
        self.rd_saturation = dali_ops.Saturation(device="gpu")
        self.jitter_change = dali_ops.Uniform(range=(1, 2))
        self.rd_jitter = dali_ops.Jitter(device="gpu")
        self.disturb = dali_ops.CoinFlip(probability=0.3)    # 1 with probability 0.3, 0 otherwise
        self.hue_change = dali_ops.Uniform(range=(-30, 30))  # random hue shift drawn from [-30, 30]
        self.hue = dali_ops.Hue(device="gpu")

    def define_graph(self):
        jpegs, labels = self.input(name="Reader")
        images = self.decode(jpegs)
        brightness = self.brightness_change()
        images = self.rd_bright(images, brightness=brightness)
        contrast = self.contrast_change()
        images = self.rd_contrast(images, contrast=contrast)
        saturation = self.saturation_change()
        images = self.rd_saturation(images, saturation=saturation)
        jitter = self.jitter_change()
        disturb = self.disturb()
        images = self.rd_jitter(images, mask=disturb)
        hue = self.hue_change()
        images = self.hue(images, hue=hue)
        imgs = self.cmn_img(images)
        return (imgs, labels)
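For completeness, here is a minimal sketch of plugging this pipeline into a PyTorch training loop with the old DALI PyTorch plugin (the dataset path, batch size and epoch count are placeholders, and the size= argument and output dict keys changed in later DALI releases, so check the version you have installed):

from nvidia.dali.plugin.pytorch import DALIClassificationIterator

pipe = reader_pipeline("/path/to/train_images", batch_size=128, num_threads=2, device_id=0)
pipe.build()
# the iterator needs to know how many samples one epoch contains
train_loader = DALIClassificationIterator(pipe, size=pipe.epoch_size("Reader"))

for epoch in range(10):                                # placeholder epoch count
    for batch in train_loader:
        images = batch[0]["data"]                      # NCHW float tensor, already on the GPU
        labels = batch[0]["label"].squeeze().long().cuda(non_blocking=True)
        # ... forward / backward / optimizer step ...
    train_loader.reset()                               # required at the end of every epoch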
Fixes for the errors I ran into:
1.1. AttributeError: module 'nvidia.dali.ops' has no attribute 'ImageDecoder' means the wrong DALI build was installed for your CUDA version. Check which CUDA runtime your PyTorch was built against:
import torch
torch.version.cuda
and install the DALI build that matches the printed version.
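As a sketch of the install (the wheel name below is an example for CUDA 10.x; package names have changed across DALI releases, so defer to the current install guide):

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda100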
1.2. RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'dali::CUDAError'
what(): CUDA runtime API error cudaErrorIllegalAddress (77):
an illegal memory access was encountered
Very strange: with batch size 120 it never happened, but raising it to 240 triggered this error. With the augmentation ops removed, changing num_workers from 1 to 2 made the error go away; with augmentation enabled, no num_workers setting helped. It looks like a DALI bug, and apparently it had not been fixed yet.
1.3 According to the description there, DALI does support feeding classification samples through a file list, but when I tried it I still got errors; I am not sure whether my list format was wrong.
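For reference, the file-list variant would look roughly like the sketch below (the list file name is a placeholder; per the docs each line holds a relative image path and a label); the only change is the FileReader line in __init__:

# inside reader_pipeline.__init__, replacing the file_root-only reader
self.input = dali_ops.FileReader(file_root=image_dir,          # root that the relative paths resolve against
                                 file_list="train_list.txt",   # hypothetical list: "personA/0001.jpg 0" per line
                                 random_shuffle=True)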
# Loaders for a torchvision-style dataset: plain PIL decode vs. jpeg4py (libjpeg-turbo) decode.
from PIL import Image
import jpeg4py as jpeg


def pil_loader(path):
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGB')


def jpeg4py_loader(path):
    # jpeg4py decodes through libjpeg-turbo and accepts a file name directly
    img = jpeg.JPEG(path).decode()
    return Image.fromarray(img)


# __getitem__ of the dataset class
def __getitem__(self, index):
    path, target = self.samples[index]
    img = pil_loader(path)
    # img = jpeg4py_loader(path)
    if self.transform is not None:
        img = self.transform(img)
    return img, int(target)
Install jpeg4py following its git repo's instructions; the code change is small (just the loader above), and on an Ubuntu machine it is a fairly economical option.
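A minimal sketch of wiring the faster decoder into a stock torchvision ImageFolder dataset (the root path and transform are placeholders); ImageFolder accepts the decode function through its loader argument:

import torchvision.datasets as datasets
import torchvision.transforms as transforms

train_set = datasets.ImageFolder(root="/path/to/train_images",     # hypothetical dataset root
                                 transform=transforms.ToTensor(),  # placeholder transform
                                 loader=jpeg4py_loader)            # swap in the libjpeg-turbo based loader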
sudo mount -t tmpfs -o size=100g tmpfs /data03/xxx/tmp_data
This requires carving out the tmpfs first and only then copying the data into tmp_data. Do not run the mount on a directory that already contains data, or the files will seem to have vanished (the mount shadows whatever was there until you umount)...
5. Hand-rolled multi-threaded loading: I sank several days into this before finally giving up...
I had previously written a loader that used multiple threads to pull all the training data into memory up front. The training set was small back then (about 20k images), so everything fit in RAM and training flew, with the GPU never left idle. That only took a day, so I figured turning it into on-the-fly loading would not be hard either, yet I got stuck for a long time: the symptom was crashes whenever the worker threads returned torch tensors. Maybe my engineering skills just are not up to it; I dropped the idea.
Some storage back ends also do not cope well with huge numbers of small files; in the end the right answer is probably still some packed format along the lines of RecordIO/TFRecord...
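Worth noting as a possible follow-up (a sketch only, with placeholder file names; the argument names should be checked against the installed DALI version): the same old DALI ops API also ships readers for packed formats, e.g. an MXNet RecordIO pair produced by im2rec:

# inside a Pipeline's __init__, instead of FileReader
self.input = dali_ops.MXNetReader(path=["train.rec"],        # hypothetical .rec file from im2rec
                                  index_path=["train.idx"],  # matching .idx file
                                  random_shuffle=True)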
Original post: https://www.cnblogs.com/zhengmeisong/p/11995374.html