python | Speeding up data processing with the concurrent.futures module

Date: 2018-10-08 21:38:40


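By default, a Python program runs as a single process on a single CPU core, so on a multi-core machine the remaining cores sit idle and their computing power is wasted. With Python's concurrent.futures module, an ordinary program can be converted into one that runs in parallel across multiple cores, which speeds up data preprocessing.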

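Example:

A simple example: a single folder holds an image dataset of tens of thousands of images. Here we use 1000 of them. Before passing the images to a deep neural network, we want to resize each one to 600×600 pixels.

The standard approach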

import glob
import cv2

# Loop through all jpg files in the current folder
# and resize each one to 600x600
for image_filename in glob.glob("*.jpg"):
    # Read in the image data
    img = cv2.imread(image_filename)

    # Resize the image
    img = cv2.resize(img, (600, 600))

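Workflow:

1. Start with a list of the files (or other data) to be processed.

2. Use a for loop to go through the items one at a time, running the preprocessing on each iteration.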

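A faster approach

Workflow:

1. Split the list of jpeg files into 4 smaller groups;

2. Run 4 separate instances of the Python interpreter;

3. Have each Python instance process one of the 4 groups;

4. Combine the results of the four processes into the final result list.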
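To make the four steps concrete, here is a minimal sketch that performs them by hand. It is only an illustration under stated assumptions: the file names are made up, `split_into_groups` and `process_group` are hypothetical helpers that do not appear in the original article, and the per-image work is replaced by a trivial string operation so the sketch runs without OpenCV.

```python
import concurrent.futures
import itertools


def split_into_groups(items, n_groups):
    """Step 1: split `items` into `n_groups` contiguous groups of near-equal size."""
    size, extra = divmod(len(items), n_groups)
    groups = []
    start = 0
    for i in range(n_groups):
        # The first `extra` groups take one additional item each.
        end = start + size + (1 if i < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def process_group(filenames):
    """Stand-in for the per-image work: here it only upper-cases each name."""
    return [name.upper() for name in filenames]


if __name__ == "__main__":
    # Hypothetical file names; a real run would use glob.glob("*.jpg").
    image_files = [f"img_{i:04d}.jpg" for i in range(10)]
    groups = split_into_groups(image_files, 4)

    # Steps 2 and 3: four worker processes, one group each.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        per_group_results = list(executor.map(process_group, groups))

    # Step 4: merge the four partial lists into one final list.
    results = list(itertools.chain.from_iterable(per_group_results))
    print(len(results), "items processed")
```

In practice the manual split is rarely needed, because the process pool distributes the individual items across its workers on its own.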

import glob
import cv2
import concurrent.futures


def load_and_resize(image_filename):
    # Read in the image data
    img = cv2.imread(image_filename)

    # Resize the image and return it so the parent process can collect it
    return cv2.resize(img, (600, 600))


# The __main__ guard is required on platforms that start worker processes
# by re-importing this module (for example Windows and macOS).
if __name__ == "__main__":
    # Get a list of files to process
    image_files = glob.glob("*.jpg")

    # Create a pool of processes. By default, one is created for each CPU in your machine.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Process the list of files, but split the work across the process pool to use all CPUs.
        # Each jpg file in the current folder is resized to 600x600.
        resized_images = list(executor.map(load_and_resize, image_files))

executor.map() takes the function you want to run and a list as input; each element of the list is passed to the function as a single input.
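A small sketch of this calling convention, using only the standard library. The names `describe` and `map_over` and the file names are hypothetical and serve only as illustration; `chunksize` is an optional parameter of `executor.map()` that has an effect with process pools.

```python
import concurrent.futures


def describe(filename):
    """Stand-in task: receives ONE list element per call."""
    return f"processed {filename}"


def map_over(executor, func, items, chunksize=1):
    """Call func once per element and collect the return values in a list.

    executor.map() returns an iterator of results, so list() is used
    to wait for every task and gather what each call returned.
    """
    return list(executor.map(func, items, chunksize=chunksize))


if __name__ == "__main__":
    files = ["a.jpg", "b.jpg", "c.jpg"]  # hypothetical file names
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # chunksize > 1 sends items to the workers in batches, which can
        # reduce inter-process overhead when the list is long.
        print(map_over(executor, describe, files, chunksize=2))
```

Passing the executor in as an argument keeps the helper independent of the pool type, so the same function can be tried with a thread pool as well.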

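Note that the tasks themselves may run and finish in any order, so this approach does not suit work in which one item must be processed before the next. The results are not affected by this: executor.map() returns them in the same order as the input list.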
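The difference between input order and completion order can be seen in a short sketch. It assumes a trivial stand-in task (`square`, a hypothetical name) and contrasts `executor.map()` with the `submit()` plus `as_completed()` pattern from the same module.

```python
import concurrent.futures


def square(n):
    """Stand-in for a preprocessing step."""
    return n * n


def results_in_input_order(executor, items):
    # executor.map() yields results in the same order as `items`,
    # even if later tasks finish first.
    return list(executor.map(square, items))


def results_in_completion_order(executor, items):
    # submit() + as_completed() yields each result as soon as its task
    # finishes, so the order can differ from run to run.
    futures = [executor.submit(square, n) for n in items]
    return [f.result() for f in concurrent.futures.as_completed(futures)]


if __name__ == "__main__":
    numbers = [3, 1, 2]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        print(results_in_input_order(executor, numbers))
        # Sorted here only so the printed output is stable.
        print(sorted(results_in_completion_order(executor, numbers)))
```

When results must line up with the inputs, `executor.map()` is the simpler choice; `as_completed()` is useful when each result should be handled as soon as it is ready.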

Reference:

https://towardsdatascience.com/heres-how-you-can-get-a-2-6x-speed-up-on-your-data-pre-processing-with-python-847887e63be5


Original article: https://www.cnblogs.com/geo-will/p/9757016.html
