Basic Concepts of the Scrapy Framework

Date: 2015-11-28 16:31:27

Basic concepts of crawling web pages with Scrapy

How do you create a project with Scrapy?

scrapy startproject xxx

How do you crawl web pages with Scrapy?

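import scrapy
from scrapy.spiders import CrawlSpider  # scrapy.contrib.spiders is the deprecated pre-1.0 path
from scrapy.http import Request
from scrapy.selector import Selector

# Inside a spider callback that receives `response`;
# xxxxx stands for an XPath expression string.
selector = Selector(response)
xxx = selector.xpath(xxxxx).extract()  # list of matched strings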

Scrapy file structure

A project contains, among other files:

items.py
settings.py
pipelines.py

1. items.py

Item objects are simple containers used to collect the scraped data. They provide a dictionary-like API with a convenient syntax for declaring their available fields. (Scrapy official documentation)

items.py defines the data that is to be scraped and later post-processed.

2. settings.py

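The Scrapy settings allows you to customize the behaviour of all Scrapy components, including the core, extensions, pipelines and spiders themselves. (Scrapy official documentation)

settings.py configures Scrapy: for example, it can change the user agent, set the delay between requests, set up a proxy, and enable or configure the various middlewares.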
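The options mentioned above could be set along these lines. This is a sketch of a settings.py excerpt, not a complete file; the package name `myproject`, the contact URL, and the two dotted class paths are placeholders.

```python
# settings.py (excerpt). "myproject" is a placeholder package name.
BOT_NAME = "myproject"

# User agent string sent with every request.
USER_AGENT = "myproject (+http://example.com/contact)"

# Delay, in seconds, between consecutive requests to the same site.
DOWNLOAD_DELAY = 2

# Enable a custom downloader middleware, e.g. one that assigns a proxy
# to each request. ProxyMiddleware is hypothetical and would have to be
# written in myproject/middlewares.py.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.ProxyMiddleware": 740,
}

# Enable an item pipeline; pipelines with lower numbers run first.
# CleanTextPipeline is a hypothetical class in myproject/pipelines.py.
ITEM_PIPELINES = {
    "myproject.pipelines.CleanTextPipeline": 300,
}
```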

3. pipelines.py

After an item has been scraped by a spider, it is sent to the Item Pipeline which processes it through several components that are executed sequentially. (Scrapy official documentation)

pipelines.py holds the code that post-processes the scraped data, which keeps scraping and data processing separate.
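A pipeline component is a plain Python class with a `process_item(self, item, spider)` method. The sketch below shows one hypothetical component, `CleanTextPipeline`, which strips surrounding whitespace from string fields; the class name and the sample data are assumptions made for illustration.

```python
class CleanTextPipeline:
    """Hypothetical pipeline that strips whitespace from string fields."""

    def process_item(self, item, spider):
        # Iterate over a copy of the pairs so values can be reassigned safely.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        # Returning the item passes it on to the next pipeline component.
        return item


# Scrapy calls process_item itself; calling it by hand with a plain dict
# is done here only to show the effect.
pipeline = CleanTextPipeline()
cleaned = pipeline.process_item({"title": "  Example  ", "views": 149}, spider=None)
```

For Scrapy to run the component, it also has to be listed in the `ITEM_PIPELINES` setting in settings.py.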

Original article: http://www.cnblogs.com/XBlack/p/5002748.html
