
scrapy-redis: custom deduplication rules

Date: 2019-03-29 19:03:01


############### xxx.py  ######

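from scrapy_redis.dupefilter import RFPDupeFilter
from scrapy_redis.connection import get_redis_from_settings
from scrapy_redis import defaults


class RedisDupeFilter(RFPDupeFilter):
    """Dupefilter that stores fingerprints under a fixed Redis key."""

    @classmethod
    def from_settings(cls, settings):
        # Build the Redis client from the REDIS_* settings.
        server = get_redis_from_settings(settings)
        # Use a fixed label instead of a timestamp, so the key is the same on every run.
        key = defaults.DUPEFILTER_KEY % {'timestamp': 'myScrapy'}
        debug = settings.getbool('DUPEFILTER_DEBUG')
        return cls(server, key=key, debug=debug)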
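The effect of replacing the timestamp with a fixed label can be sketched without Redis. This is only an illustration: the `DUPEFILTER_KEY` constant below is assumed to mirror the default template in `scrapy_redis.defaults`, and `build_key` is a hypothetical helper, not part of the library.

```python
import time

# Assumption: mirrors the default key template in scrapy_redis.defaults.
DUPEFILTER_KEY = "dupefilter:%(timestamp)s"


def build_key(label):
    """Hypothetical helper: fill the template's 'timestamp' slot with any label."""
    return DUPEFILTER_KEY % {"timestamp": label}


# A time-based label yields a different key (and an empty dedup set) on each run.
per_run_key = build_key(int(time.time()))

# A fixed label yields the same key on every run, so stored fingerprints persist.
fixed_key = build_key("myScrapy")

print(per_run_key)
print(fixed_key)  # dupefilter:myScrapy
```

With the fixed key, a restarted crawl would keep skipping requests it has already seen, as long as the Redis set is not deleted.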

Then configure it in settings.py:

# ######################### scrapy-redis connection ##############
REDIS_HOST = "129.28.96.43"          # host name
REDIS_PORT = 6379                    # port
REDIS_PARAMS = {"password": "beta"}  # extra connection parameters
REDIS_ENCODING = "utf-8"             # Redis encoding

# REDIS_URL = 'redis://user:pwd@hostname:9001'  # connection URL; takes precedence over the settings above

DUPEFILTER_KEY = 'dupefilter:%(timestamp)s'

# DUPEFILTER_CLASS = 'scrapy_redis.dupefilter.RFPDupeFilter'
DUPEFILTER_CLASS = 'myscrapy.xxx.RedisDupeFilter'
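Once the custom class is active, deduplication comes down to adding each request fingerprint to the Redis set named by the key and checking whether it was already there. The following is a rough sketch of that idea, not the library's code: `FakeRedis` is a hypothetical in-memory stand-in for a Redis client, and `fingerprint` is a simplified assumption that hashes only the URL.

```python
import hashlib


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client (sets only)."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, member):
        # Like Redis SADD: return 1 if the member is new, 0 if already present.
        members = self._sets.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1


def fingerprint(url):
    # Simplified assumption: a real fingerprint also covers method and body.
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def request_seen(server, key, url):
    """Return True if this URL's fingerprint was already stored under key."""
    added = server.sadd(key, fingerprint(url))
    return added == 0


server = FakeRedis()
key = "dupefilter:myScrapy"
first = request_seen(server, key, "http://example.com/a")
second = request_seen(server, key, "http://example.com/a")
other = request_seen(server, key, "http://example.com/b")
print(first, second, other)  # False True False
```

Because every run shares the same key, a URL recorded in one run would be reported as seen in the next.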

 


Original article: https://www.cnblogs.com/erhao9767/p/10623210.html
