scrapy_redis settings

Date: 2019-06-08 15:07:32


from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule
from scrapy_redis.spiders import RedisCrawlSpider


class MyCrawler(RedisCrawlSpider):
    """Spider that reads urls from redis queue (mycrawler:start_urls)."""
    name = 'mycrawler_redis'
    redis_key = 'mycrawler:start_urls'

    rules = (
        # follow all links
        Rule(LinkExtractor(), callback='parse_page', follow=True),
    )

    def __init__(self, *args, **kwargs):
        # Dynamically define the allowed domains list.
        domain = kwargs.pop('domain', '')
        # list() is needed on Python 3, where filter() returns a lazy iterator.
        self.allowed_domains = list(filter(None, domain.split(',')))
        super(MyCrawler, self).__init__(*args, **kwargs)

    def parse_page(self, response):
        # Emit the page title and URL for each crawled page.
        return {
            'name': response.css('title::text').extract_first(),
            'url': response.url,
        }
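
The `__init__` above builds `allowed_domains` from a comma-separated `domain` argument. The sketch below isolates that parsing step so it can be tried without Scrapy or Redis installed. The helper name `parse_allowed_domains` is hypothetical and not part of scrapy_redis; it only mirrors the `filter(None, domain.split(','))` expression in the spider.

```python
def parse_allowed_domains(domain=''):
    """Split a comma-separated domain string into a list, dropping empty entries.

    Hypothetical helper mirroring the expression used in MyCrawler.__init__.
    """
    # filter(None, ...) removes falsy items, i.e. the empty strings that
    # split(',') produces for an empty argument or for doubled commas.
    return list(filter(None, domain.split(',')))


if __name__ == '__main__':
    print(parse_allowed_domains('example.com,example.org'))
    print(parse_allowed_domains('example.com,,example.org'))
    print(parse_allowed_domains(''))
```

Assuming a standard Scrapy project, the `domain` value would typically be passed as a spider argument, for example `scrapy crawl mycrawler_redis -a domain=example.com,example.org`, and a start URL pushed to the key named by `redis_key`, for example `redis-cli lpush mycrawler:start_urls http://example.com`. Note that whitespace around the commas is not stripped, so `a.com, b.com` would yield an entry with a leading space.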

Original article: https://www.cnblogs.com/wangdongpython/p/10990629.html
