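Environment:
1. Windows 7 x64 SP1
2. Anaconda3 + Python 3.7.3 (bundled with Anaconda, no separate installation needed)
3. Scrapy 1.6.0
Start the Scrapy shell against the sample page: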
scrapy shell http://doc.scrapy.org/en/latest/_static/selectors-sample1.html
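Once the shell has started, the fetched page is available as the response object, which the commands below query.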
Select all a elements with XPath:
result = response.xpath('//a')
Result:
[<Selector xpath='//a' data='<a href="image1.html">Name: My image 1 <'>, <Selector xpath='//a' data='<a href="image2.html">Name: My image 2 <'>, <Selector xpath='//a' data='<a href="image3.html">Name: My image 3 <'>, <Selector xpath='//a' data='<a href="image4.html">Name: My image 4 <'>, <Selector xpath='//a' data='<a href="image5.html">Name: My image 5 <'>]
The same selection with a CSS selector (note that it is translated to XPath internally):
result = response.css('a')
Result:
[<Selector xpath='descendant-or-self::a' data='<a href="image1.html">Name: My image 1 <'>, <Selector xpath='descendant-or-self::a' data='<a href="image2.html">Name: My image 2 <'>, <Selector xpath='descendant-or-self::a' data='<a href="image3.html">Name: My image 3 <'>, <Selector xpath='descendant-or-self::a' data='<a href="image4.html">Name: My image 4 <'>, <Selector xpath='descendant-or-self::a' data='<a href="image5.html">Name: My image 5 <'>]
Check the type of the returned object:
type(result)
Result:
scrapy.selector.unified.SelectorList
Extract the matched elements as strings:
result.extract()
Result:
['<a href="image1.html">Name: My image 1 <br><img src="image1_thumb.jpg"></a>', '<a href="image2.html">Name: My image 2 <br><img src="image2_thumb.jpg"></a>', '<a href="image3.html">Name: My image 3 <br><img src="image3_thumb.jpg"></a>', '<a href="image4.html">Name: My image 4 <br><img src="image4_thumb.jpg"></a>', '<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']
Select the text nodes inside the a elements:
response.xpath('//a/text()')
Result:
[<Selector xpath='//a/text()' data='Name: My image 1 '>, <Selector xpath='//a/text()' data='Name: My image 2 '>, <Selector xpath='//a/text()' data='Name: My image 3 '>, <Selector xpath='//a/text()' data='Name: My image 4 '>, <Selector xpath='//a/text()' data='Name: My image 5 '>]
Extract the text content as strings:
response.xpath('//a/text()').extract()
Result:
['Name: My image 1 ', 'Name: My image 2 ', 'Name: My image 3 ', 'Name: My image 4 ', 'Name: My image 5 ']
The CSS equivalent uses the ::text pseudo-element:
response.css('a::text').extract()
Result:
['Name: My image 1 ', 'Name: My image 2 ', 'Name: My image 3 ', 'Name: My image 4 ', 'Name: My image 5 ']
Extract the href attribute values with XPath:
response.xpath('//a/@href').extract()
Result:
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
The CSS equivalent uses the ::attr() pseudo-element:
response.css('a::attr("href")').extract()
Result:
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
Select the img elements nested inside the a elements:
response.xpath('//a/img').extract()
Result:
['<img src="image1_thumb.jpg">', '<img src="image2_thumb.jpg">', '<img src="image3_thumb.jpg">', '<img src="image4_thumb.jpg">', '<img src="image5_thumb.jpg">']
The CSS equivalent uses a descendant combinator:
response.css('a img').extract()
Result:
['<img src="image1_thumb.jpg">', '<img src="image2_thumb.jpg">', '<img src="image3_thumb.jpg">', '<img src="image4_thumb.jpg">', '<img src="image5_thumb.jpg">']
Next, extract the src attribute values of these img elements, in the same way as in step 6:
response.xpath('//a/img/@src').extract()
response.css('a img::attr("src")').extract()
Original article: https://www.cnblogs.com/hester/p/11371384.html