
Web Crawlers: Starting with Performance

Published: 2018-04-01 10:45:46


Performance

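When writing a crawler, most of the time is spent waiting on IO requests. In single-process, single-threaded mode, each URL request blocks until its response arrives, so the requests as a whole are slow.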

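import requests


def fetch_async(url):
    # Blocks until the server responds; nothing else runs in the meantime.
    response = requests.get(url)
    return response


url_list = ["http://www.github.com", "http://www.bing.com"]

# Each request starts only after the previous one has finished.
for url in url_list:
    fetch_async(url)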
1. Synchronous execution
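
To make the cost of waiting concrete, here is a minimal sketch that times sequential fetches. It assumes a hypothetical `fake_fetch` helper that sleeps instead of calling `requests.get`, so it runs without network access; the helper, the `delay` parameter, and `fetch_all_sequentially` are illustrative names, not part of the original article.

```python
import time


def fake_fetch(url, delay=0.2):
    """Hypothetical stand-in for requests.get: sleeps to mimic network latency."""
    time.sleep(delay)
    return f"response from {url}"


def fetch_all_sequentially(urls, delay=0.2):
    """Fetch each URL in turn; return the results and the elapsed wall time."""
    start = time.perf_counter()
    results = [fake_fetch(url, delay) for url in urls]
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    urls = ["http://www.github.com", "http://www.bing.com"]
    results, elapsed = fetch_all_sequentially(urls)
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Under these assumptions the elapsed time grows roughly in proportion to the number of URLs, because every wait is paid in full before the next request begins.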
from concurrent.futures import ThreadPoolExecutor
import requests


def fetch_async(url):
    response = requests.get(url)
    return response


url_list = ["http://www.github.com", "http://www.bing.com"]

# Up to 5 worker threads issue requests concurrently.
pool = ThreadPoolExecutor(5)
for url in url_list:
    pool.submit(fetch_async, url)
# Block until every submitted request has finished.
pool.shutdown(wait=True)
2. Multithreaded execution
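
The listing above submits the requests but discards the returned futures, so the responses are never read. One possible way to collect them is `concurrent.futures.as_completed`, sketched below. As an assumption for illustration, a hypothetical `fake_fetch` that sleeps replaces `requests.get` so the example runs offline; `fetch_all_threaded` is likewise an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_fetch(url, delay=0.2):
    """Hypothetical stand-in for requests.get: sleeps to mimic network latency."""
    time.sleep(delay)
    return f"response from {url}"


def fetch_all_threaded(urls, max_workers=5, delay=0.2):
    """Fetch all URLs on a thread pool; return {url: result} and elapsed time."""
    start = time.perf_counter()
    results = {}
    # The with-block calls shutdown(wait=True) on exit.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_url = {pool.submit(fake_fetch, url, delay): url for url in urls}
        # as_completed yields each future as soon as it finishes.
        for future in as_completed(future_to_url):
            results[future_to_url[future]] = future.result()
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    urls = ["http://www.github.com", "http://www.bing.com"]
    results, elapsed = fetch_all_threaded(urls)
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the simulated waits overlap, the total time here should be close to a single delay rather than the sum of all delays, as long as the number of URLs does not exceed `max_workers`.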
from concurrent.futures import ThreadPoolExecutor
import requests


def fetch_async(url):
    response = requests.get(url)
    return response


def callback(future):
    # Called once the future completes; result() returns the response.
    print(future.result())


url_list = ["http://www.github.com", "http://www.bing.com"]
pool = ThreadPoolExecutor(5)
for url in url_list:
    v = pool.submit(fetch_async, url)
    v.add_done_callback(callback)
pool.shutdown(wait=True)
2. Multithreaded execution with a callback
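
One caveat with the callback above: if the request failed, `future.result()` re-raises the exception inside the callback, where `concurrent.futures` only logs and ignores it. A hedged sketch of a callback that checks `future.exception()` first follows. The `fake_fetch` helper, `make_callback`, and the `outcomes` dictionary are hypothetical names introduced for illustration, and no real network request is made.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Hypothetical stand-in for requests.get that rejects non-HTTP URLs."""
    if not url.startswith("http"):
        raise ValueError(f"unsupported URL: {url}")
    return f"response from {url}"


def make_callback(url, outcomes):
    """Build a done-callback that records either the result or the error."""
    def callback(future):
        error = future.exception()  # None if the task succeeded
        if error is not None:
            outcomes[url] = f"failed: {error}"
        else:
            outcomes[url] = future.result()
    return callback


def fetch_with_callbacks(urls):
    outcomes = {}
    # Leaving the with-block waits for all tasks (and their callbacks).
    with ThreadPoolExecutor(max_workers=5) as pool:
        for url in urls:
            future = pool.submit(fake_fetch, url)
            future.add_done_callback(make_callback(url, outcomes))
    return outcomes


if __name__ == "__main__":
    print(fetch_with_callbacks(["http://www.github.com", "ftp://example.invalid"]))
```

Binding the URL into the callback through a closure is one design choice among several; it keeps the failure message attributable to a specific request.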
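from concurrent.futures import ProcessPoolExecutor
import requests


def fetch_async(url):
    response = requests.get(url)
    return response


url_list = ["http://www.github.com", "http://www.bing.com"]

# The __main__ guard is required on platforms that spawn worker processes
# (Windows, macOS), where each worker re-imports this module.
if __name__ == "__main__":
    pool = ProcessPoolExecutor(5)
    for url in url_list:
        pool.submit(fetch_async, url)
    pool.shutdown(wait=True)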
3. Multiprocess execution
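from concurrent.futures import ProcessPoolExecutor
import requests


def fetch_async(url):
    response = requests.get(url)
    return response


def callback(future):
    # Runs in the parent process once the worker's result has arrived.
    print(future.result())


url_list = ["http://www.github.com", "http://www.bing.com"]

# The __main__ guard is required on platforms that spawn worker processes
# (Windows, macOS), where each worker re-imports this module.
if __name__ == "__main__":
    pool = ProcessPoolExecutor(5)
    for url in url_list:
        v = pool.submit(fetch_async, url)
        v.add_done_callback(callback)
    pool.shutdown(wait=True)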
3. Multiprocess execution with a callback

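All of the approaches above improve request throughput. The drawback of multithreading and multiprocessing is that a thread or process blocked on IO sits idle and is wasted while it waits, so asynchronous IO is the preferred choice.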
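
As a rough illustration of the asynchronous approach, the sketch below uses the standard-library `asyncio` module to run several waits concurrently on a single thread. It is an assumption-laden example, not the author's code: `fake_fetch` is a hypothetical coroutine that awaits `asyncio.sleep` in place of a real request, and `fetch_all` and `run_fetch_all` are illustrative names. Note that `requests` is a blocking library and cannot be awaited directly, so a real crawler would need an asynchronous HTTP client instead.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.2):
    """Hypothetical coroutine: awaits a sleep to mimic non-blocking network IO."""
    await asyncio.sleep(delay)
    return f"response from {url}"


async def fetch_all(urls, delay=0.2):
    """Run all fetches concurrently; gather returns results in input order."""
    return await asyncio.gather(*(fake_fetch(url, delay) for url in urls))


def run_fetch_all(urls, delay=0.2):
    """Drive the event loop and report results plus elapsed wall time."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all(urls, delay))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["http://www.github.com", "http://www.bing.com"]
    results, elapsed = run_fetch_all(urls)
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

While one coroutine is waiting, the event loop switches to another, so no thread or process is parked idle for the duration of each request.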

 


Original source: https://www.cnblogs.com/zero-white/p/8685120.html
