scrapy startproject xxx
import scrapy
from scrapy.spiders import CrawlSpider  # was scrapy.contrib.spiders before Scrapy 1.0
from scrapy.http import Request
from scrapy.selector import Selector
xxx = selector.xpath(xxxxx).extract()  # xxxxx stands for an XPath expression string
A project contains:
Item objects are simple containers used to collect the scraped data. They provide a dictionary-like API with a convenient syntax for declaring their available fields. (Scrapy official documentation)
items.py defines the data that needs to be scraped and processed later.
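The Scrapy settings allows you to customize the behaviour of all Scrapy components, including the core, extensions, pipelines and spiders themselves. (Scrapy official documentation)
settings.py configures Scrapy: it is where you change the user-agent, set the delay between requests, set up a proxy, configure the various middlewares, and so on.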
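A settings.py fragment covering these options might look like the sketch below. The setting names are standard Scrapy settings, but every value, the project name myproject, and the ProxyMiddleware and CleanTitlePipeline classes are assumptions for illustration.

```python
# settings.py (fragment); all values are illustrative assumptions.

# User-agent string sent with every request.
USER_AGENT = "Mozilla/5.0 (compatible; examplebot/0.1)"

# Seconds to wait between consecutive requests to the same site.
DOWNLOAD_DELAY = 2

# Enable a hypothetical custom middleware that would set a proxy on requests.
# The number controls its position in the middleware chain.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.ProxyMiddleware": 543,
}

# Enable a hypothetical item pipeline; lower numbers run first.
ITEM_PIPELINES = {
    "myproject.pipelines.CleanTitlePipeline": 300,
}
```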
After an item has been scraped by a spider, it is sent to the Item Pipeline which processes it through several components that are executed sequentially. (Scrapy official documentation)
pipelines.py holds the post-processing logic, which keeps data scraping and data processing separate.
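A pipeline component is a plain class with a process_item(item, spider) method. The sketch below, with the hypothetical name CleanTitlePipeline, normalizes whitespace in a title field. It is called directly here with a dict for demonstration; in a real project Scrapy would call it once the class is listed in ITEM_PIPELINES.

```python
class CleanTitlePipeline:
    """Hypothetical pipeline that collapses whitespace in the 'title' field."""

    def process_item(self, item, spider):
        title = item.get("title")
        if title is not None:
            # split() with no arguments drops leading, trailing and repeated whitespace.
            item["title"] = " ".join(title.split())
        # Returning the item passes it on to the next pipeline component.
        return item


# Direct call for demonstration; the spider argument is unused in this sketch.
cleaned = CleanTitlePipeline().process_item({"title": "  Hello   Scrapy \n"}, spider=None)
```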
Original article: http://www.cnblogs.com/XBlack/p/5002748.html