Pitfalls I ran into when using scrapy-redis for distributed crawling:

Date: 2017-06-22 18:37:33

My setup: Ubuntu installed on Win7 (Win7 acts as the slave, Ubuntu as the master).

Edit the configuration file redis.conf

1) Open the configuration file and comment out the following line:

# bind 127.0.0.1

2) By default Redis does not run as a daemon. This option controls that behavior; set it to no:

daemonize no

3) Protected mode: set it to no:

protected-mode no
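
The three settings above are easy to get wrong when editing a long redis.conf by hand. The following is a minimal sketch of a checker, not part of the original post: the function name check_redis_conf is hypothetical, it assumes that the last occurrence of a directive wins, and it assumes the Redis 3.2+ defaults (daemonize no, protected-mode yes) when a directive is absent.

```python
LOOPBACK = {"127.0.0.1", "::1"}


def check_redis_conf(text):
    """Return a list of differences from the three settings in this post."""
    active = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out directives
        key, *rest = line.split(None, 1)
        active[key.lower()] = rest[0].strip().lower() if rest else ""

    problems = []
    # 1) An active bind line that lists only loopback addresses blocks remote clients.
    bind = active.get("bind", "").split()
    if bind and all(addr.lstrip("-") in LOOPBACK for addr in bind):
        problems.append("bind restricts Redis to loopback")
    # 2) The post runs Redis in the foreground (assumed default: no).
    if active.get("daemonize", "no") != "no":
        problems.append("daemonize is not 'no'")
    # 3) Protected mode must be off (assumed default when absent: yes).
    if active.get("protected-mode", "yes") != "no":
        problems.append("protected-mode is not 'no'")
    return problems
```

Calling check_redis_conf(open("redis.conf").read()) on the edited file should return an empty list if all three changes were applied.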

The key step: when restarting the Redis service from the Ubuntu terminal, pass the configuration file explicitly, as follows:

redis-server redis.conf

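On Win7, install RedisDesktopManager to view the Redis database on Ubuntu. (How to connect: once steps 1, 2 and 3 above are done, the connection works. Note: the Ubuntu network adapter must be set to bridged mode.)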
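
Reachability can also be checked from the slave machine without a GUI client. The sketch below uses only the Python standard library and sends an inline PING command over a plain TCP socket. The helper name redis_ping is hypothetical, and the sketch assumes the server has no password set; a server that still has protected mode on, or that requires authentication, answers with an error instead of +PONG, so the function returns False in those cases as well.

```python
import socket


def redis_ping(host, port=6379, timeout=3.0):
    """Return True if a Redis server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts inline commands: a plain text line ending in CRLF.
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
    return reply.startswith(b"+PONG")
```

On the Win7 side, redis_ping('x.x.x.x') with the Ubuntu IP address in place of the placeholder should return True once the configuration and bridged networking are correct.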

When writing the spider:

I found that defining the allowed domain range this way causes a problem (after pushing a URL in Ubuntu, the spider did not crawl any data):

# Obtain the allowed domain range dynamically
def __init__(self, *args, **kwargs):
    # Dynamically define the allowed domains list.
    domain = kwargs.pop('domain', '')
    self.allowed_domains = filter(None, domain.split(','))
    super(MySpider, self).__init__(*args, **kwargs)

whereas writing it like this works:

allowed_domains = ["xxx.com"]
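
One plausible explanation, assuming the spider runs on Python 3 (the post does not state the version): there, filter() returns a lazy iterator rather than a list, so any code that reads allowed_domains more than once sees nothing on the second pass. The sketch below demonstrates that behavior and a possible fix. parse_domains is a hypothetical helper, and example.org is a made-up second domain used only for illustration.

```python
def parse_domains(domain):
    """Split a comma-separated string into a concrete list of domains."""
    return [d.strip() for d in domain.split(",") if d.strip()]


# A filter object can be consumed only once.
lazy = filter(None, "xxx.com,,example.org".split(","))
first_pass = list(lazy)   # ['xxx.com', 'example.org']
second_pass = list(lazy)  # [] because the iterator is already exhausted

# A real list can be read any number of times.
domains = parse_domains("xxx.com,,example.org")

# Inside the spider's __init__ this would become:
#     self.allowed_domains = parse_domains(kwargs.pop('domain', ''))
```

With this change the dynamic form behaves like the hard-coded list, provided the domain argument is actually passed to the spider when it is started.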

In the spider's settings.py, specify the Redis host address and port number

For example:

REDIS_HOST = 'x.x.x.x'  # host address (the Ubuntu machine's IP address)
REDIS_PORT = 6379
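
For context, a scrapy-redis project normally needs a few more lines in settings.py so that scheduling and duplicate filtering go through Redis. The post only shows REDIS_HOST and REDIS_PORT, so the rest of this fragment is an assumption about the author's project. The class paths are the ones given in the scrapy-redis README, and 'x.x.x.x' remains a placeholder for the Ubuntu IP address.

```python
# settings.py (sketch): every slave points at the Redis instance on the master.

# Route scheduling through Redis so all nodes share one request queue.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate requests across all nodes using a shared Redis set.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Address of the Redis server (placeholder, as in the original post).
REDIS_HOST = 'x.x.x.x'
REDIS_PORT = 6379
```

If these two class settings are missing, each node falls back to Scrapy's default in-memory scheduler and crawls independently.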

Original article: http://www.cnblogs.com/huwei934/p/7066223.html
