CrawlSpider

Date: 2019-10-29 09:17:06


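CrawlSpider is meant for sites whose pages link to one another (for example through "previous page" and "next page" links) and share the same structure, so that one set of extraction rules can be used to crawl all of them.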

  Setup commands:

    1. Create the project:

        scrapy startproject <project_name>

    2. cd into the project directory.

    3. Generate the spider:

        scrapy genspider -t crawl <spider_name> <domain>

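  What each statement does:

    1. link = LinkExtractor(allow=r'Items/')

       link is a link extractor: it pulls out of each response the links that match the given rules.

       allow: the pattern a link must match in order to be extracted, written as a regular expression.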
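To make the effect of allow concrete, here is a minimal sketch that uses only Python's re module rather than Scrapy itself. It assumes that the pattern is applied as a case-sensitive regex search against the full URL, which is how LinkExtractor is understood to behave; the example.com URLs are made up for illustration.

```python
import re

# The same pattern as in LinkExtractor(allow=r'Items/').
allow = re.compile(r"Items/")

# Hypothetical URLs, used only for illustration.
urls = [
    "https://example.com/Items/1.html",
    "https://example.com/shop/Items/",
    "https://example.com/items/1.html",  # lowercase "items": no match
    "https://example.com/about",
]

# search() matches anywhere in the URL, not only at the start.
matched = [u for u in urls if allow.search(u)]
print(matched)
```

Because the pattern is a regular expression, characters such as "." or "?" have their regex meaning and need escaping when they should be matched literally.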

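    2.

       rules = (
           Rule(link, callback='parse_item', follow=False),
       )

       Rule is the rule parser: each link returned by the link extractor is requested, and the resulting page is parsed in the way the rule specifies.

       follow: whether the link extractor is applied again to the pages reached through the links it extracted.
           False: do not apply it again
           True:  apply it again

       callback: the callback (given here as the name of a spider method) that parses the response for each extracted link.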
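The difference between follow=False and follow=True can be illustrated with a small simulation. This is not Scrapy's implementation, only a sketch over a hypothetical in-memory site (the SITE dictionary and the crawl function are invented for this example); it assumes that links are always extracted from the start page and that follow decides whether they are extracted again from the pages the rule reached.

```python
import re
from collections import deque

# Hypothetical site: each page maps to the links found on it.
SITE = {
    "/page/1": ["/page/2", "/about"],
    "/page/2": ["/page/3"],
    "/page/3": [],
    "/about": [],
}


def crawl(start, allow, follow):
    """Return the pages that would be handed to the callback."""
    pattern = re.compile(allow)
    seen = {start}
    parsed = []
    # Each queue entry is (url, extract_links_from_this_page).
    queue = deque([(start, True)])
    while queue:
        url, extract = queue.popleft()
        if not extract:
            continue
        for link in SITE.get(url, []):
            if pattern.search(link) and link not in seen:
                seen.add(link)
                parsed.append(link)
                # Pages reached through the rule are mined for links
                # again only when follow is True.
                queue.append((link, follow))
    return parsed


print(crawl("/page/1", r"/page/\d+", follow=False))
print(crawl("/page/1", r"/page/\d+", follow=True))
```

With follow=False only the page linked directly from the start page is parsed; with follow=True the crawl continues through the pagination chain.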

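  Everything else:
    The rest works as in ordinary Scrapy spiders: CrawlSpider is a subclass of Spider, so most of its features behave the same way. Once you are comfortable with Spider, CrawlSpider is easy to pick up.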

Original article: https://www.cnblogs.com/ifiwant/p/11756727.html
