Scrapy-Xpath Function

Date: 2019-02-18 21:40:42


Refer to: https://doc.scrapy.org/en/latest/topics/selectors.html#topics-selectors

>>> response.xpath("//a/@href").getall()
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']

****************************************************************************************

>>> response.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
['My image 1',
 'My image 2',
 'My image 3',
 'My image 4',
 'My image 5']

****************************************************************************************

>>> response.xpath('//a[contains(@href, "image")]/text()').re_first(r'Name:\s*(.*)')
'My image 1'
****************************************************************************************
get() and extract_first()

>>> response.css('a::attr(href)').get()
'image1.html'
>>> response.css('a::attr(href)').extract_first()
'image1.html'

****************************************************************************************

>>> response.css('a::attr(href)').getall()
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
>>> response.css('a::attr(href)').extract()
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']

****************************************************************************************

>>> response.css('a::attr(href)')[0].get()
'image1.html'
>>> response.css('a::attr(href)')[0].extract()
'image1.html'

****************************************************************************************
CSS fuzzy matching on class

>>> from scrapy import Selector
>>> sel = Selector(text='<div class="hero shout"><time datetime="2014-07-23 19:00">Special date</time></div>')
>>> sel.css('.shout').xpath('./time/@datetime').getall()
['2014-07-23 19:00']

****************************************************************************************

>>> from scrapy import Selector
>>> sel = Selector(text="""
...     <ul class="list">
...         <li>1</li>
...         <li>2</li>
...         <li>3</li>
...     </ul>
...     <ul class="list">
...         <li>4</li>
...         <li>5</li>
...         <li>6</li>
...     </ul>""")
>>> xp = lambda x: sel.xpath(x).getall()

>>> xp("//li[1]")
['<li>1</li>', '<li>4</li>']

>>> xp("(//li)[1]")
['<li>1</li>']

****************************************************************************************

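>>> from scrapy import Selector
>>> sel = Selector(text='<a href="#">Click here to go to the <strong>Next Page</strong></a>')
>>> sel.xpath("//a[1]").getall() # select the first node
['<a href="#">Click here to go to the <strong>Next Page</strong></a>']
>>> sel.xpath("string(//a[1])").getall() # convert it to string
['Click here to go to the Next Page']

>>> sel.xpath("//a[contains(.//text(), 'Next Page')]").getall()
[]

>>> sel.xpath("//a[contains(., 'Next Page')]").getall()
['<a href="#">Click here to go to the <strong>Next Page</strong></a>']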

****************************************************************************************


Original article: https://www.cnblogs.com/jwr810/p/10398005.html
