https://github.com/hellysmile/fake-useragent
Add a custom middleware and register it in DOWNLOADER_MIDDLEWARES:
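from fake_useragent import UserAgent


class RandomUserAgentMiddlware(object):
    # Rotate the User-Agent header randomly for each request.

    def __init__(self, crawler):
        super(RandomUserAgentMiddlware, self).__init__()
        self.ua = UserAgent()
        # Name of the UserAgent attribute to read, e.g. "random", "chrome", "firefox".
        self.ua_type = crawler.settings.get("RANDOM_UA_TYPE", "random")

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler)

    def process_request(self, request, spider):
        def get_ua():
            # Equivalent to self.ua.random when ua_type is "random".
            return getattr(self.ua, self.ua_type)

        # setdefault only fills the header if nothing has set it already.
        request.headers.setdefault('User-Agent', get_ua())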
Note: add the following setting to settings.py:
RANDOM_UA_TYPE = "random"
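The text says to register the middleware in DOWNLOADER_MIDDLEWARES but does not show the entry. A minimal sketch of what that could look like, assuming the class lives in a hypothetical module myproject/middlewares.py; the priority 543 is an arbitrary choice. Because process_request uses setdefault, the built-in UserAgentMiddleware would otherwise fill in the header first, so this sketch disables it by mapping it to None.

```python
# settings.py (sketch). "myproject.middlewares" is a hypothetical module path;
# replace it with wherever RandomUserAgentMiddlware is actually defined.
DOWNLOADER_MIDDLEWARES = {
    # Disable Scrapy's built-in User-Agent middleware so it does not set
    # the header before the custom middleware runs.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    # Assumed priority; any value works once the built-in one is disabled.
    "myproject.middlewares.RandomUserAgentMiddlware": 543,
}

# Which fake_useragent attribute the middleware reads ("random", "chrome", ...).
RANDOM_UA_TYPE = "random"
```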
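Options for proxy IPs:
1. Use free proxy IPs, for example from Xici (西刺), collecting and maintaining the IP list yourself
2. The free plugin scrapy_proxies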
https://github.com/aivarsk/scrapy-proxies
3. The paid plugin scrapy-crawlera
https://github.com/scrapy-plugins/scrapy-crawlera
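For the first option (a self-collected proxy list), a downloader middleware can assign a proxy per request. The sketch below is not taken from either plugin above: the class name RandomProxyMiddleware and the PROXY_LIST setting are hypothetical names chosen for illustration. It relies on Scrapy's documented behaviour that a request is routed through the address stored in request.meta["proxy"].

```python
import random


class RandomProxyMiddleware(object):
    """Hypothetical downloader middleware: pick a random proxy per request."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is an assumed custom setting, for example
        # ["http://192.0.2.10:8080", "http://192.0.2.11:3128"].
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        if self.proxies:
            # Scrapy sends the request through the proxy named in meta["proxy"].
            request.meta["proxy"] = random.choice(self.proxies)
```

Free proxies tend to die quickly, so in practice the list would also need periodic validation and removal of dead entries, which this sketch leaves out.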
Options for handling captchas:
1. Recognize them in code (tesseract-ocr)
2. Online captcha-solving services
3. Manual captcha entry
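For the first option, one possible approach is to call the tesseract command-line tool on a saved captcha image. This is only a sketch under assumptions: the helper names build_tesseract_cmd and solve_captcha are hypothetical, the tesseract binary is assumed to be installed and on PATH, and the --psm flag spelling is the one used by Tesseract 4 and later. Real captchas usually need preprocessing (grayscale, thresholding, noise removal) before OCR gives usable results.

```python
import subprocess


def build_tesseract_cmd(image_path, psm=7, whitelist=None):
    """Build a tesseract CLI command that prints recognized text to stdout."""
    # "stdout" as the output base makes tesseract print instead of writing a file.
    # Page segmentation mode 7 treats the image as a single line of text.
    cmd = ["tesseract", image_path, "stdout", "--psm", str(psm)]
    if whitelist:
        # Restrict recognition to the given characters, e.g. digits only.
        cmd += ["-c", "tessedit_char_whitelist=" + whitelist]
    return cmd


def solve_captcha(image_path, whitelist=None):
    """Run tesseract on a saved captcha image and return the stripped text."""
    result = subprocess.run(
        build_tesseract_cmd(image_path, whitelist=whitelist),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout.strip()
```

A call such as solve_captcha("captcha.png", whitelist="0123456789") would then return whatever text tesseract recognizes, which should be checked against the expected captcha length before it is submitted.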
Original post: http://www.cnblogs.com/shhnwangjian/p/7339316.html