
Multithreaded Crawler

Date: 2017-06-25 23:08:52


```python
# Multithreaded crawler
# Using the map function:
#   from multiprocessing.dummy import Pool
#   pool = Pool(4)
#   results = pool.map(crawl_function, url_list)
# Example:
from multiprocessing.dummy import Pool as ThreadPool
import requests
import time


def getsource(url):
    # Download one page; return the response so pool.map can collect it
    html = requests.get(url)
    return html


urls = []

# Build the URLs for pages 1-20 of the Tieba thread
for i in range(1, 21):
    newpage = 'http://tieba.baidu.com/p/3522395718?pn=' + str(i)
    urls.append(newpage)

# Single-threaded: fetch the pages one after another
time1 = time.time()
for i in urls:
    print(i)
    getsource(i)
time2 = time.time()
print('Single-threaded time: ' + str(time2 - time1))

# Multithreaded: a pool of 4 worker threads shares the URL list
pool = ThreadPool(4)
time3 = time.time()
results = pool.map(getsource, urls)
pool.close()   # accept no more tasks
pool.join()    # wait for all worker threads to finish
time4 = time.time()
print('Parallel time: ' + str(time4 - time3))

# Output (from the original run):
# Single-threaded time: 20.18715476989746
# Parallel time: 5.100291728973389
```
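Because the listing above talks to a live server, its timings depend on the network. The sketch below reproduces the same serial versus `pool.map` comparison offline. It swaps `requests.get` for a hypothetical `fake_fetch` that sleeps for an assumed `DELAY` to imitate network latency; `crawl_serial` and `crawl_parallel` are likewise illustrative names, not part of the original post. It also checks that `pool.map` returns results in the same order as the input list.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool

# Assumed per-request latency in seconds (stand-in for real network time)
DELAY = 0.05


def fake_fetch(url):
    # Hypothetical replacement for getsource(): sleeps instead of calling
    # requests.get, so the example runs without network access.
    time.sleep(DELAY)
    return 'content of ' + url


def crawl_serial(urls):
    # Fetch each URL one after another
    return [fake_fetch(u) for u in urls]


def crawl_parallel(urls, workers=4):
    # pool.map splits the URLs among the worker threads and returns the
    # results in the same order as the input list. Leaving the with-block
    # terminates the pool, which is safe because map has already returned.
    with ThreadPool(workers) as pool:
        return pool.map(fake_fetch, urls)


if __name__ == '__main__':
    urls = ['http://tieba.baidu.com/p/3522395718?pn=' + str(i)
            for i in range(1, 21)]

    t0 = time.time()
    serial = crawl_serial(urls)
    t1 = time.time()
    parallel = crawl_parallel(urls)
    t2 = time.time()

    print('Serial:   %.2f s' % (t1 - t0))
    print('Parallel: %.2f s' % (t2 - t1))
    print('Same results, same order:', serial == parallel)
```

The speedup comes from the threads overlapping their waits: `time.sleep`, like blocking on a socket, releases the GIL. For CPU-bound work, threads from `multiprocessing.dummy` would not help in the same way; the process-based `multiprocessing.Pool` exposes the same `map` interface.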

Original post: http://www.cnblogs.com/themost/p/7078320.html