
UA Pool and Proxy Pool (IP)

Date: 2019-01-02 15:49:11


UA pool (every request is sent with a User-Agent picked at random from the pool)

a) Import the base class in the middleware module

# Older Scrapy releases exposed this class as scrapy.contrib.downloadermiddleware.useragent
from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware

b) Define a class that subclasses UserAgentMiddleware and override its process_request method

  Example:

  middlewares.py

from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware
import random

# Pool of User-Agent strings; one is picked at random for every request.
ua_list = [
    "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
]
# Proxy pools as "host:port" strings, one list per scheme of the target URL.
ip_http_list = ["90.229.216.218:46796", "110.235.250.7:49341", "81.163.62.136:41258", "195.34.207.47:60878"]
ip_https_list = ["140.227.207.211:60088", "140.227.209.210:60088", "185.132.133.102:1080"]


class UserAgentRandom(UserAgentMiddleware):
    def process_request(self, request, spider):
        ua = random.choice(ua_list)
        # Assign directly: setdefault() would keep a User-Agent that an
        # earlier middleware has already put on the request.
        request.headers["User-Agent"] = ua
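A detail worth checking in the middleware above is how the header gets written. The sketch below runs without Scrapy and uses a plain dict as a stand-in for the request headers; `apply_random_ua`, `UA_POOL`, and the `"Scrapy/x.y"` placeholder value are hypothetical names for illustration only. It shows that `setdefault` keeps a User-Agent that is already present, while direct assignment replaces it.

```python
import random

# Two entries borrowed from the pool above; any non-empty list works.
UA_POOL = [
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
]


def apply_random_ua(headers, pool, overwrite=True, rng=random):
    """Write a random User-Agent from `pool` into `headers` (a plain dict here)."""
    ua = rng.choice(pool)
    if overwrite:
        headers["User-Agent"] = ua            # always replaces the current value
    else:
        headers.setdefault("User-Agent", ua)  # only fills in a missing header
    return headers


# Placeholder for a User-Agent that some earlier middleware already set.
preset = {"User-Agent": "Scrapy/x.y"}
kept = apply_random_ua(dict(preset), UA_POOL, overwrite=False)
replaced = apply_random_ua(dict(preset), UA_POOL, overwrite=True)
```

Scrapy's built-in UserAgentMiddleware is registered at priority 500 by default, so it normally runs before a middleware registered at 542. If the random UA never seems to take effect, this ordering is a likely place to look.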

 

settings.py

# Keys are import paths given as strings; values are priorities.
DOWNLOADER_MIDDLEWARES = {
    "handle5.middlewares.Handle5DownloaderMiddleware": 543,
    "handle5.middlewares.UserAgentRandom": 542,
    "handle5.middlewares.IpRandom": 541,
}
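For downloader middlewares, process_request is called in increasing order of the priority number, so a lower number runs earlier. The following sketch only illustrates that ordering for the entries of this project: `request_order` is a hypothetical helper, not a Scrapy function, and it ignores the built-in entries that Scrapy merges in from DOWNLOADER_MIDDLEWARES_BASE. The extra entry mapped to None is an optional assumption; setting a middleware to None is Scrapy's way of disabling it, here the built-in User-Agent middleware.

```python
DOWNLOADER_MIDDLEWARES = {
    "handle5.middlewares.Handle5DownloaderMiddleware": 543,
    "handle5.middlewares.UserAgentRandom": 542,
    "handle5.middlewares.IpRandom": 541,
    # Optional: disable the built-in middleware so only the random UA is applied.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
}


def request_order(middlewares):
    """Return enabled middleware paths in the order process_request would run."""
    enabled = {path: prio for path, prio in middlewares.items() if prio is not None}
    return sorted(enabled, key=enabled.get)


order = request_order(DOWNLOADER_MIDDLEWARES)
for path in order:
    print(path)
```

With these numbers, IpRandom runs first, then UserAgentRandom, then Handle5DownloaderMiddleware.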

 

 

Proxy pool (the proxy IP for every request is picked at random from the IP pool)

middlewares.py

class IpRandom:
    def process_request(self, request, spider):
        # Choose the pool that matches the scheme of the target URL.
        url = request.url
        head = url.split(":")[0]
        if head == "http":
            request.meta["proxy"] = "http://" + random.choice(ip_http_list)
        else:
            request.meta["proxy"] = "https://" + random.choice(ip_https_list)
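The selection logic can also be pulled out into a small function that is easy to test on its own. This is a minimal sketch under assumptions: `pick_proxy`, `HTTP_PROXIES`, and `HTTPS_PROXIES` are hypothetical names, the addresses are copied from the lists above, and the scheme-to-prefix mapping mirrors the middleware as written. It uses `urllib.parse.urlsplit` instead of splitting on ":" so the scheme is parsed and lower-cased by the standard library.

```python
import random
from urllib.parse import urlsplit

HTTP_PROXIES = ["90.229.216.218:46796", "110.235.250.7:49341"]
HTTPS_PROXIES = ["140.227.207.211:60088", "140.227.209.210:60088"]


def pick_proxy(url, http_pool=HTTP_PROXIES, https_pool=HTTPS_PROXIES, rng=random):
    """Return a proxy URL chosen at random from the pool matching the URL scheme."""
    scheme = urlsplit(url).scheme.lower()
    if scheme == "http":
        return "http://" + rng.choice(http_pool)
    # Anything that is not plain http falls through to the https pool,
    # which is the same branch structure the middleware uses.
    return "https://" + rng.choice(https_pool)


print(pick_proxy("http://example.com/page"))
print(pick_proxy("https://example.com/page"))
```

Inside the middleware, the result would be assigned to request.meta["proxy"], exactly as in the class above.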

 


Original article: https://www.cnblogs.com/cjj-zyj/p/10208770.html
