Install Python. Choose Python 3; the reasons are explained here: https://wiki.python.org/moin/Python2orPython3
Create a virtual environment (venv)
# Create the virtual environment in the current directory
python -m venv .
# Activate the virtual environment (Windows PowerShell)
.\Scripts\Activate.ps1
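To confirm that activation worked, one option is to ask the interpreter itself. The sketch below uses only the standard library; the helper name `in_virtualenv` is made up for illustration and is not part of any tool used in this post.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

Run it with `python` after activating; if it reports that no environment is active, the activation script probably did not run in the current shell.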
Upgrade pip
# Upgrade pip to the latest version
# -i specifies the PyPI index to use (here, the Tsinghua mirror)
python -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
Install Scrapy
pip install Scrapy -i https://pypi.tuna.tsinghua.edu.cn/simple
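A quick way to check that the install landed in the active environment is to look up the package metadata. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("Scrapy")
    print("Scrapy version:", found if found else "not installed")
```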
Write the script quotes_spider.py
import scrapy


class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        # Each quote on the page sits in a <div class="quote"> element
        for quote in response.css('div.quote'):
            yield {
                'author': quote.xpath('span/small/text()').get(),
                'text': quote.css('span.text::text').get(),
            }

        # Follow the "next" link, if there is one, and parse that page too
        next_page = response.css('li.next a::attr("href")').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
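The `href` extracted for the next page is normally a relative link, and `response.follow` accepts it as-is because it resolves the link against the URL of the current page. The standard-library sketch below illustrates that resolution step only; the function name `resolve_next_page` and the sample `href` value are assumptions for illustration, not taken from Scrapy.

```python
from urllib.parse import urljoin


def resolve_next_page(page_url: str, href: str) -> str:
    """Resolve a (possibly relative) link against the page it was found on."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    current = "http://quotes.toscrape.com/tag/humor/"
    # Assumed example value of the "next" link on that page
    print(resolve_next_page(current, "/tag/humor/page/2/"))
```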
Run the script
scrapy runspider quotes_spider.py -o quotes.json
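Once the run finishes, quotes.json should hold a JSON array of the dictionaries yielded by `parse`, each with an `author` and a `text` key. As a rough sketch of how that output might be inspected, the code below loads the file and counts quotes per author; the helper names `load_quotes` and `count_by_author` are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_quotes(path: str) -> List[Dict[str, str]]:
    """Load the list of scraped items from a JSON file."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def count_by_author(quotes: List[Dict[str, str]]) -> Dict[str, int]:
    """Count how many quotes each author contributed."""
    counts: Dict[str, int] = {}
    for quote in quotes:
        # Items with a missing or empty author are grouped under "unknown"
        author = quote.get("author") or "unknown"
        counts[author] = counts.get(author, 0) + 1
    return counts


if __name__ == "__main__":
    quotes = load_quotes("quotes.json")
    for author, total in sorted(count_by_author(quotes).items()):
        print(f"{author}: {total}")
```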
Original article: https://www.cnblogs.com/nehcdahc/p/12527121.html