Web crawling (plus7): Scrapy1 commands

Posted: 2017-10-07 14:49:54


global commands:

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

scrapy fetch [options] <url>

Fetch a URL using the Scrapy downloader and print its content to stdout. You
may want to use --nolog to disable logging

Options
=======
--help, -h              show this help message and exit
--spider=SPIDER         use this spider
--headers               print response HTTP headers instead of body
--no-redirect           do not handle HTTP 3xx status codes and print response
                        as-is

Global Options
--------------
--logfile=FILE          log file. if omitted stderr will be used
--loglevel=LEVEL, -L LEVEL
                        log level (default: DEBUG)
--nolog                 disable logging completely
--profile=FILE          write python cProfile stats to FILE
--pidfile=FILE          write process ID to FILE
--set=NAME=VALUE, -s NAME=VALUE
                        set/override setting (may be repeated)
--pdb                   enable pdb on failure
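
To script the fetch options above, here is a minimal sketch in Python that only assembles the command line. The helper name `build_fetch_command` and its parameters are made up for illustration; the flags are the ones from the help text, and `USER_AGENT` is just an example setting name.

```python
import shlex


def build_fetch_command(url, nolog=True, headers=False, no_redirect=False, settings=None):
    """Build the argv list for `scrapy fetch` (hypothetical helper, not part of Scrapy)."""
    cmd = ["scrapy", "fetch"]
    if nolog:
        cmd.append("--nolog")        # disable logging completely
    if headers:
        cmd.append("--headers")      # print response HTTP headers instead of body
    if no_redirect:
        cmd.append("--no-redirect")  # print 3xx responses as-is
    for name, value in (settings or {}).items():
        cmd.extend(["-s", f"{name}={value}"])  # -s NAME=VALUE may be repeated
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    cmd = build_fetch_command(
        "http://www.baidu.com",
        headers=True,
        settings={"USER_AGENT": "Mozilla/5.0"},
    )
    # Print the command; pass `cmd` to subprocess.run() to execute it
    # (that needs Scrapy installed and on PATH).
    print(shlex.join(cmd))
```

Keeping the command as a list rather than one string avoids quoting problems when a URL or setting value contains spaces or `&`.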

scrapy runspider <spider_file.py>
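
runspider takes a single Python file that defines a spider. As a rough sketch, the script below writes a tiny spider to a temporary file and prints the command that would run it. The spider (`TitleSpider`), the file name `title_spider.py`, and the start URL are all assumptions for illustration; the script only writes the file, so running the spider itself still needs Scrapy installed.

```python
import tempfile
from pathlib import Path

# Source of a self-contained spider (hypothetical example, not from the original post).
SPIDER_SOURCE = '''\
import scrapy


class TitleSpider(scrapy.Spider):
    name = "title"
    start_urls = ["http://www.baidu.com"]

    def parse(self, response):
        # Yield one item holding the page URL and its <title> text.
        yield {"url": response.url, "title": response.css("title::text").get()}
'''


def write_spider(directory):
    """Write the spider to <directory>/title_spider.py and return its path."""
    path = Path(directory) / "title_spider.py"
    path.write_text(SPIDER_SOURCE, encoding="utf-8")
    return path


if __name__ == "__main__":
    spider_path = write_spider(tempfile.mkdtemp())
    # No project is needed: runspider loads the spider straight from the file.
    print(f"scrapy runspider {spider_path}")
```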

scrapy shell <url> --nolog (opens an interactive console; with IPython installed it shows the prompt below)

In [1]:

scrapy startproject project_name

scrapy version

scrapy view <url> (download a page and open it in the browser, as seen by Scrapy)

project commands:

Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  crawl         Run a spider
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

E:\m\f1>scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed

E:\m\f1>scrapy genspider -t basic spider baidu.com
Created spider 'spider' using template 'basic' in module:
  f1.spiders.spider

E:\m\f1>scrapy check spider

----------------------------------------------------------------------
Ran 0 contracts in 0.000s

E:\m\f1>scrapy crawl spider

E:\m\f1>scrapy list
spider

E:\m\f1>scrapy edit spider (this one works fine on Linux)

E:\m\f1>scrapy parse http://www.baidu.com
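
The project commands only show up when scrapy is run from inside a project directory. The sketch below lists the commands from the session above together with the directory each one should run in. The helper name `project_commands` is hypothetical, and the values `f1`, `spider`, and `baidu.com` are taken from the transcript.

```python
from pathlib import Path


def project_commands(project, spider, domain, base_dir="."):
    """Return (argv, cwd) pairs: startproject first, then the project commands."""
    base = Path(base_dir)
    project_dir = base / project  # directory created by startproject
    return [
        # Global command: run from the parent directory.
        (["scrapy", "startproject", project], base),
        # Project commands: run from inside the project directory.
        (["scrapy", "genspider", "-t", "basic", spider, domain], project_dir),
        (["scrapy", "check", spider], project_dir),
        (["scrapy", "crawl", spider], project_dir),
        (["scrapy", "list"], project_dir),
    ]


if __name__ == "__main__":
    for argv, cwd in project_commands("f1", "spider", "baidu.com"):
        # Dry run: print only. To execute for real, call
        # subprocess.run(argv, cwd=cwd, check=True) with Scrapy installed.
        print(f"[{cwd}] {' '.join(argv)}")
```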


Original article: http://www.cnblogs.com/rabbittail/p/7633241.html
