Using Scrapy

Date: 2019-12-26 11:37:41

Tags: app org raw false webkit tar spider storage yourself

Building a Scrapy crawler takes four steps:

Create a project (scrapy startproject xxx): start a new crawler project.
Define the target (write items.py): specify the data you want to scrape.
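Write the spider (spiders/xxspider.py): build the spider that crawls the web pages.
1. Generate the spider in the spiders directory.
2. Edit the spider file.
3. Remember to modify settings.py: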
  1. Set a browser user agent:
     # Crawl responsibly by identifying yourself (and your website) on the user-agent
     USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.26 Safari/537.36 Core/1.63.5514.400 QQBrowser/10.1.1660.400'

  2. Stop obeying robots.txt:
     # Obey robots.txt rules
     # Do not follow the robots rules: crawl even the pages the site tells crawlers to avoid
     ROBOTSTXT_OBEY = False

  3. Enable the item pipeline:
     # See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
     # The number (customarily 0-1000) sets the run order: lower values run first
     ITEM_PIPELINES = {
         'cls.pipelines.ClsPipeline': 300,
     }
Store the content (pipelines.py): design a pipeline that stores the scraped data.
1. Save it to a MongoDB database.
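A sketch of a pipelines.py that writes items to MongoDB, modelled on the MongoDB pipeline example in the Scrapy item-pipeline documentation and assuming the pymongo driver is installed. The class name ClsPipeline matches the ITEM_PIPELINES entry in settings.py; the MONGO_URI and MONGO_DATABASE settings, their default values and the collection name are hypothetical and would have to be added to settings.py yourself.

```python
class ClsPipeline:
    # Hypothetical collection name for the scraped items
    collection_name = "items"

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Read the hypothetical custom settings MONGO_URI and MONGO_DATABASE
        return cls(
            mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
            mongo_db=crawler.settings.get("MONGO_DATABASE", "cls"),
        )

    def open_spider(self, spider):
        import pymongo  # assumes the pymongo driver is installed

        # Connect once when the spider starts
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.collection = self.client[self.mongo_db][self.collection_name]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Insert a plain-dict copy: insert_one adds an "_id" key to the dict it is given
        self.collection.insert_one(dict(item))
        return item  # pass the item on to any later pipelines
```

It would pair with settings.py lines such as MONGO_URI = 'mongodb://localhost:27017' and MONGO_DATABASE = 'cls' (both hypothetical values). Opening the client in open_spider and closing it in close_spider keeps one connection for the whole crawl instead of one per item.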

Original article: https://www.cnblogs.com/zhibin123/p/12100922.html
