
Scrapy

Date: 2018-09-27 22:00:22


Dependencies
  pip install wheel
  pip install Twisted xxxxxxxx.whl
  pip install pywin32
  pip install scrapy


# Create a project
scrapy startproject pro_name
cd pro_name

# Create spiders
scrapy genspider chouti chouti.com
scrapy genspider cnblogs cnblogs.com


# Run a spider
scrapy crawl chouti







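1. Create a project
  scrapy startproject <project_name>
    This generates the following files automatically:
    <project_name>
      <project_name>/
        - spiders/ (folder with __init__.py)   # spider files
        - items.py         # item definitions (persistence)
        - pipelines.py     # item pipelines (persistence)
        - middlewares.py   # middlewares
        - settings.py      # configuration (crawler)
      scrapy.cfg           # configuration (deployment)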
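
For orientation, here is a rough sketch of the kind of entries a freshly generated settings.py usually starts with. It assumes the project was created as `pro_name` (the name used in the commands above). The real file contains many more options, most of them commented out, and the exact defaults may differ between Scrapy versions.

```python
# settings.py (excerpt, assuming the project is named "pro_name")

# Name of the bot; the project template sets it to the project name.
BOT_NAME = "pro_name"

# Packages where Scrapy looks for spiders, and where genspider puts new ones.
SPIDER_MODULES = ["pro_name.spiders"]
NEWSPIDER_MODULE = "pro_name.spiders"

# Whether the crawler honours robots.txt; the project template enables it.
ROBOTSTXT_OBEY = True
```

The current value of any of these can be printed from inside the project directory with `scrapy settings --get BOT_NAME`.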

2. Create spiders
  cd <project_name>
  scrapy genspider chouti chouti.com
  scrapy genspider cnblogs cnblogs.com

3. Run a spider
  scrapy crawl chouti
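
The three steps above can also be driven from a script. The sketch below is only an illustration: the helper names `scrapy_steps` and `run_steps` are made up for this example, and it assumes Scrapy is installed for the interpreter running the script (`python -m scrapy` is equivalent to calling the `scrapy` command). By default it only prints the commands without executing them.

```python
import subprocess
import sys


def scrapy_steps(project, spiders, crawl):
    """Return (argv, cwd) pairs for startproject, genspider and crawl.

    cwd is None for commands that run from the current directory.
    """
    base = [sys.executable, "-m", "scrapy"]
    steps = [(base + ["startproject", project], None)]
    for name, domain in spiders:
        # Run inside the project directory so the spider lands in spiders/.
        steps.append((base + ["genspider", name, domain], project))
    # crawl also runs inside the project so the spider can be found by name.
    steps.append((base + ["crawl", crawl], project))
    return steps


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv, cwd in steps:
        prefix = f"(in {cwd}) " if cwd else ""
        print(prefix + "scrapy " + " ".join(argv[3:]))
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


steps = scrapy_steps(
    "pro_name",
    [("chouti", "chouti.com"), ("cnblogs", "cnblogs.com")],
    crawl="chouti",
)
run_steps(steps)  # dry run: only prints the commands
```

Calling `run_steps(steps, dry_run=False)` would actually create the project and start the crawl, so it should only be run in an empty working directory.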
  

  

Command line
  

Available commands:
    bench         Run quick benchmark test
    check         Check spider contracts
    crawl         Run a spider
    edit          Edit spider
    fetch         Fetch a URL using the Scrapy downloader
    genspider     Generate new spider using pre-defined templates
    list          List available spiders
    parse         Parse URL (using its spider) and print the results
    runspider     Run a self-contained spider (without creating a project)
    settings      Get settings values
    shell         Interactive scraping console
    startproject  Create new project
    version       Print Scrapy version
    view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command




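Encoding issues
import sys, io
# Re-wrap stdout so that printed text is encoded as GB18030
# (useful when a Chinese-locale console fails to print some characters).
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')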
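
To see what this wrapper does without touching the real console, the sketch below applies the same `io.TextIOWrapper` call to an in-memory buffer. The sample string is arbitrary, and the comparison with GBK is an assumption about the typical cause of the problem (a console codec that cannot represent every character), not something stated in the original notes.

```python
import io

# Arbitrary sample: two Chinese characters plus an emoji (U+1F600).
text = "中文 \U0001F600"

# Stand-in for sys.stdout.buffer: an in-memory byte stream.
buffer = io.BytesIO()
wrapper = io.TextIOWrapper(buffer, encoding="gb18030")
wrapper.write(text)
wrapper.flush()  # push the encoded bytes into the underlying buffer

encoded = buffer.getvalue()
round_trip = encoded.decode("gb18030")

# GBK covers fewer characters than GB18030, so the emoji cannot be encoded.
try:
    text.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

print("round trip matches:", round_trip == text)
print("encodable as gbk:", gbk_ok)
```

GB18030 can encode every Unicode character, which is why wrapping stdout with it avoids `UnicodeEncodeError` when scraped pages contain characters outside the console's default codec.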









 


Original article: https://www.cnblogs.com/yanxiatingyu/p/9715432.html
