User-Agent Spoofing in the Scrapy Framework

Date: 2018-12-18 19:52:27

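For example, searching Baidu for "ip" shows your own machine's public IP address. The middlewares in this post send each request with a randomly chosen User-Agent, so it appears to come from a different browser, and route it through a proxy, so the result page reports the proxy's IP instead of your own. Note that the User-Agent header alone does not change the IP; that part is done by the proxy.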

Spider code:

import scrapy


class UatestSpider(scrapy.Spider):
    name = 'UATest'
    # allowed_domains = ['www.xxx.com']
    start_urls = ['https://www.baidu.com/s?wd=ip']

    def parse(self, response):
        # Save the result page so the reported IP can be checked by hand.
        with open('./ip.html', 'w', encoding='utf-8') as fp:
            fp.write(response.text)
            print('over!!!')

Middleware code (middlewares.py):

import random

# scrapy.contrib was removed from Scrapy; this is the current import path.
from scrapy.downloadermiddlewares.useragent import UserAgentMiddleware

# Pool of desktop Chrome User-Agent strings to rotate through.
user_agent_list = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
    "(KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
    "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 "
    "(KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 "
    "(KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 "
    "(KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 "
    "(KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
    "(KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
    "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 "
    "(KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
    "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 "
    "(KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 "
    "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
    "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 "
    "(KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
]


class UAPool(UserAgentMiddleware):
    # Overwrite the User-Agent header with a random entry on every request.
    def process_request(self, request, spider):
        ua = random.choice(user_agent_list)
        request.headers['User-Agent'] = ua
        print(request.headers['User-Agent'])


# Sample proxy addresses; free public proxies like these expire quickly.
proxy_http = ['125.27.10.150:56292', '114.34.168.157:46160']
proxy_https = ['1.20.101.81:35454', '113.78.254.156:9000']


class UapoolDownloaderMiddleware(object):
    # request is the intercepted request object
    # spider is the spider instance
    def process_request(self, request, spider):
        # Pick a proxy from the pool that matches the URL scheme.
        if request.url.split(':')[0] == 'https':
            request.meta['proxy'] = 'https://' + random.choice(proxy_https)
        else:
            request.meta['proxy'] = 'http://' + random.choice(proxy_http)
        print(request.meta['proxy'])
        return None
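
The scheme check in UapoolDownloaderMiddleware can be tried outside Scrapy. The following is only a minimal standalone sketch of that logic, not part of the original project: the helper name pick_proxy and the PROXIES dict are hypothetical, the addresses are placeholders from the documentation-only range 192.0.2.0/24, and urllib.parse.urlsplit is swapped in for the original split(':') to read the scheme.

```python
import random
from urllib.parse import urlsplit

# Placeholder proxies (documentation-only addresses), keyed by URL scheme.
PROXIES = {
    "http": ["192.0.2.10:8080", "192.0.2.11:8080"],
    "https": ["192.0.2.20:8443", "192.0.2.21:8443"],
}


def pick_proxy(url):
    """Return a random proxy URL whose scheme matches the request URL."""
    scheme = urlsplit(url).scheme
    # Anything that is not https falls back to the http pool,
    # mirroring the else branch of the middleware.
    if scheme != "https":
        scheme = "http"
    return scheme + "://" + random.choice(PROXIES[scheme])


if __name__ == "__main__":
    print(pick_proxy("https://www.baidu.com/s?wd=ip"))
    print(pick_proxy("http://example.com/"))
```

Keeping the selection in a small function like this makes it easy to check the pools before wiring them into request.meta['proxy'].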

Note: in settings.py, uncomment the DOWNLOADER_MIDDLEWARES setting and register the middleware classes you wrote.
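
A rough sketch of what the settings.py entry might look like is below. The original post does not show it, so the project package name UAPool and the priority values 543 and 544 are assumptions; substitute your own package name. Disabling the built-in UserAgentMiddleware by mapping it to None is optional here, since the custom class overwrites the header anyway.

```python
# settings.py (fragment)
# "UAPool" is an assumed project package name; replace it with your own.
DOWNLOADER_MIDDLEWARES = {
    # Optional: disable Scrapy's built-in User-Agent middleware.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    # Random User-Agent per request.
    "UAPool.middlewares.UAPool": 543,
    # Random proxy per request, chosen by URL scheme.
    "UAPool.middlewares.UapoolDownloaderMiddleware": 544,
}
```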

Original article: https://www.cnblogs.com/duanhaoxin/p/10138809.html
