Dependencies
pip install wheel
# install Twisted from a pre-built wheel file (xxxxxxxx stands for the wheel's file name)
pip install xxxxxxxx.whl
pip install pywin32
pip install scrapy
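To confirm the packages above installed correctly, a small check can look each one up by its import name without importing it. This is a sketch under assumptions: the pip-name-to-module mapping below is illustrative, and pywin32 is checked through its win32api module and only on Windows.

```python
import importlib.util
import sys

# pip package name -> module it installs (mapping assumed for illustration)
REQUIRED = {"wheel": "wheel", "Twisted": "twisted", "scrapy": "scrapy"}
if sys.platform == "win32":
    REQUIRED["pywin32"] = "win32api"  # pywin32 ships the win32api module


def missing_packages(required=None):
    """Return the pip names whose modules cannot be found on sys.path."""
    required = REQUIRED if required is None else required
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

find_spec only locates the module, so the check is cheap and does not trigger any package's import-time side effects.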
# Create a project
scrapy startproject pro_name
cd pro_name
# Create spiders
scrapy genspider chouti chouti.com
scrapy genspider cnblogs cnblogs.com
# Run a spider
scrapy crawl chouti
1. Create a project
scrapy startproject project_name
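This generates the following files automatically:
project_name/
    project_name/
        - spiders/ (contains __init__.py)  # spider files go here
        - items.py        # item definitions: the structured data to be persisted
        - pipelines.py    # item pipelines: persistence (saving items)
        - middlewares.py  # middlewares
        - settings.py     # configuration file (crawler settings)
    scrapy.cfg            # configuration file (deployment)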
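Since pipelines.py is where persistence happens, here is a minimal sketch of an item pipeline that appends every scraped item to a JSON Lines file. The class name JsonLinesPipeline and the output file items.jl are hypothetical; the open_spider, process_item and close_spider hooks are the methods Scrapy calls on a pipeline.

```python
import json


class JsonLinesPipeline:
    """Hypothetical pipeline: writes every scraped item as one JSON line."""

    def __init__(self, path="items.jl"):
        self.path = path  # output file name is an assumption
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        # Called once when the spider finishes: release the file.
        self.file.close()

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and scrapy.Item objects alike;
        # ensure_ascii=False keeps Chinese text readable in the file.
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item  # hand the item on to any later pipelines
```

To enable it, settings.py would need an entry such as ITEM_PIPELINES = {"project_name.pipelines.JsonLinesPipeline": 300}, where project_name is your project's package name; the number (0 to 1000) sets the order in which pipelines run, lower first.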
2. Create spiders
cd project_name
scrapy genspider chouti chouti.com
scrapy genspider cnblogs cnblogs.com
3. Run a spider
scrapy crawl chouti
Command line (running scrapy with no arguments lists the available commands)
Available commands:
bench Run quick benchmark test
check Check spider contracts
crawl Run a spider
edit Edit spider
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
list List available spiders
parse Parse URL (using its spider) and print the results
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
Use "scrapy <command> -h" to see more info about a command
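Encoding issues (for example, errors when printing Chinese text to a Windows console)
import io, sys
# Re-wrap stdout so printed text is encoded as GB18030, a superset of GBK
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')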
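Why GB18030 rather than GBK: a GBK-encoded stream raises UnicodeEncodeError on characters GBK cannot represent (emoji, for example), while GB18030 can encode any Unicode character. The sketch below shows that difference and writes through a TextIOWrapper the same way the wrapped sys.stdout does, using an in-memory buffer instead of the console. The helper names are hypothetical; note that characters outside GBK may still display incorrectly in a GBK console even though no exception is raised.

```python
import io


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def encode_via_wrapper(text, encoding):
    """Write text through a TextIOWrapper (like the re-wrapped stdout)."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding=encoding)
    wrapper.write(text)
    wrapper.flush()
    wrapper.detach()  # detach so the BytesIO stays open after the wrapper
    return raw.getvalue()


sample = "抽屉新热榜 😀"
print(can_encode(sample, "gbk"))      # False: GBK has no mapping for the emoji
print(can_encode(sample, "gb18030"))  # True
print(encode_via_wrapper(sample, "gb18030").decode("gb18030") == sample)  # True
```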
Original article: https://www.cnblogs.com/yanxiatingyu/p/9715432.html