A Brief Overview of the Scrapy Framework

Date: 2018-11-26 00:13:47

Tags: html item ... inux win path com pip3 linux

- Scrapy framework
Introduction: a large, full-featured web crawling framework.

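Installation:
           - Windows:
               Download the Twisted wheel from: http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted
               (pick the file matching your Python version and architecture; the one below is for CPython 3.6 on 64-bit Windows)

               pip3 install wheel
               pip3 install Twisted-18.4.0-cp36-cp36m-win_amd64.whl

               pip3 install pywin32

               pip3 install scrapy
           - Linux:
               pip3 install scrapy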
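
After running the install commands, it can be useful to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is made up for illustration and is not part of Scrapy.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Twisted is installed before Scrapy in the Windows steps above, so check both.
for name in ("Twisted", "Scrapy"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If either line prints "not installed", the corresponding `pip3 install` step most likely failed or ran against a different Python interpreter.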


Usage (Django commands shown for comparison):
           Django:
               # create a project
               django-admin startproject mysite

               cd mysite

               # create apps
               python manage.py startapp app01
               python manage.py startapp app02

               # start the project
               python manage.py runserver

           Scrapy:
               # create a project
               scrapy startproject xdb

               cd xdb

               # create spiders
               scrapy genspider chouti chouti.com
               scrapy genspider cnblogs cnblogs.com

               # run a spider
               scrapy crawl chouti
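
The `scrapy crawl` command can also be launched from a small Python script, which is convenient when starting a spider from an IDE. This is only a sketch: the helper names `build_crawl_command` and `run_crawl` are hypothetical, and it assumes the `scrapy` executable is on the PATH.

```python
import subprocess
from typing import List


def build_crawl_command(spider: str, nolog: bool = False) -> List[str]:
    """Build the argument list for `scrapy crawl <spider>`."""
    command = ["scrapy", "crawl", spider]
    if nolog:
        command.append("--nolog")  # suppress Scrapy's log output
    return command


def run_crawl(project_dir: str, spider: str, nolog: bool = False) -> int:
    """Run the spider from inside the project directory and return the exit code."""
    # Scrapy locates the project through scrapy.cfg, so the command is
    # executed with the project directory as the working directory.
    completed = subprocess.run(build_crawl_command(spider, nolog), cwd=project_dir)
    return completed.returncode
```

For the project created above, calling `run_crawl("xdb", "chouti")` from the directory that contains `xdb` would be equivalent to `cd xdb` followed by `scrapy crawl chouti`.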




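           1. Create a project
               scrapy startproject <project_name>

               <project_name>/
                   <project_name>/
                       - spiders/              # spider files
                           - chouti.py
                           - cnblogs.py
                           ....
                       - items.py              # item definitions (the data to persist)
                       - pipelines.py          # item pipelines (persistence)
                       - middlewares.py        # middlewares
                       - settings.py           # configuration (crawler)
                   scrapy.cfg                  # configuration (deployment)

           2. Create spiders
               cd <project_name>

               scrapy genspider chouti chouti.com
               scrapy genspider cnblogs cnblogs.com

           3. Run a spider
               scrapy crawl chouti
               scrapy crawl chouti --nolog    # suppress log output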
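
The project layout marks pipelines.py as the place for persistence. As a rough illustration of what such a pipeline might look like, here is a sketch that writes each item as one JSON line. The class name `JsonLinesPipeline` and the output file name `items.jl` are assumptions for this example; the method names `open_spider`, `process_item` and `close_spider` are the hooks Scrapy calls on a pipeline.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline that writes every scraped item to a JSON Lines file."""

    def __init__(self, path="items.jl"):
        self.path = path  # output file name is an assumption
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # Called for every item the spider yields; the item is returned
        # so that any later pipeline can process it as well.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()
```

A pipeline only runs once it is enabled in settings.py, for example with `ITEM_PIPELINES = {"xdb.pipelines.JsonLinesPipeline": 300}` if the class lives in the `xdb` project used earlier.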

Summary:
           - HTML parsing: XPath
           - Issuing follow-up requests: yield a Request object


Original article: https://www.cnblogs.com/l-jie-n/p/10017560.html
