
Part 7: Parsing Fields with CSS Selectors

Date: 2017-10-02 10:10:09


CSS selectors serve the same purpose as XPath expressions: both locate specific elements in a page.


As an example, let's scrape the title of a blog article page:

In [20]: title = response.css(".entry-header h1")

In [21]: title
Out[21]: [<Selector xpath="descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' entry-header ')]/descendant-or-self::*/h1" data='<h1>谷歌用两年时间研究了 180 个团队,发现高效团队有这五个特征</h1>'>]

In [22]: title = response.css(".entry-header h1").extract()

In [23]: title
Out[23]: ['<h1>谷歌用两年时间研究了 180 个团队,发现高效团队有这五个特征</h1>']

In [24]: # the ::text pseudo-element returns the element's text content

In [25]: title = response.css(".entry-header h1::text").extract()

In [26]: title
Out[26]: ['谷歌用两年时间研究了 180 个团队,发现高效团队有这五个特征']

Getting the article's creation date:

In [38]: date_text = response.css(".entry-meta-hide-on-mobile").extract()

In [39]: date_text
Out[39]: ['<p class="entry-meta-hide-on-mobile">\r\n\r\n            2017/08/23 ·  <a href="http://blog.jobbole.com/category/career/" rel="category tag">职场</a>\r\n            \r\n                            · <a href="#article-comment"> 7 评论 </a>\r\n            \r\n\r\n            \r\n             ·  <a href="http://blog.jobbole.com/tag/google/">Google</a>, <a href="http://blog.jobbole.com/tag/%e5%9b%a2%e9%98%9f/">团队</a>\r\n            \r\n</p>']

In [40]: date_text = response.css(".entry-meta-hide-on-mobile::text").extract()

In [41]: date_text
Out[41]: 
['\r\n\r\n            2017/08/23 ·  ',
 '\r\n            \r\n                            · ',
 '\r\n            \r\n\r\n            \r\n             ·  ',
 ', ',
 '\r\n            \r\n']

In [42]: date_text = response.css(".entry-meta-hide-on-mobile::text").extract()[0]

In [43]: date_text
Out[43]: '\r\n\r\n            2017/08/23 ·  '

In [44]: date_text = response.css(".entry-meta-hide-on-mobile::text").extract()[0].strip()

In [45]: date_text
Out[45]: '2017/08/23 ·'

In [46]: date_text = response.css(".entry-meta-hide-on-mobile::text").extract()[0].strip().replace("·","").strip()

In [47]: date_text
Out[47]: '2017/08/23'
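
The strip/replace chain above can be wrapped in a small helper that also turns the cleaned string into a real date object. This is only a sketch: the function name parse_post_date is made up for illustration, and the YYYY/MM/DD format is assumed from the single sample value shown in the session.

```python
from datetime import date, datetime


def parse_post_date(raw: str) -> date:
    """Strip whitespace and the '·' separator, then parse YYYY/MM/DD."""
    cleaned = raw.strip().replace("·", "").strip()
    # Assumed format, based on the one sample value in the session above.
    return datetime.strptime(cleaned, "%Y/%m/%d").date()


# First text node returned by the ::text selector in the session above.
raw_text = "\r\n\r\n            2017/08/23 ·  "
post_date = parse_post_date(raw_text)
print(post_date.isoformat())
```

Parsing into a date makes malformed values fail loudly (strptime raises ValueError) instead of silently storing an unexpected string.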

Getting the comment count:

In [49]: comment_num = response.css("a[href='#article-comment']")

In [50]: comment_num
Out[50]: 
[<Selector xpath="descendant-or-self::a[@href = '#article-comment']" data='<a href="#article-comment"> 7 评论 </a>'>,
 <Selector xpath="descendant-or-self::a[@href = '#article-comment']" data='<a href="#article-comment"><span class="'>]

In [51]: comment_num = response.css("a[href='#article-comment'] span::text").extract()

In [52]: comment_num
Out[52]: [' 7 评论']

In [53]: comment_num = response.css("a[href='#article-comment'] span::text").extract().strip()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-53-18ae8761867f> in <module>()
----> 1 comment_num = response.css("a[href='#article-comment'] span::text").extract().strip()

AttributeError: 'list' object has no attribute 'strip'

In [54]: comment_num = response.css("a[href='#article-comment'] span::text").extract()[0]

In [55]: comment_num
Out[55]: ' 7 评论'
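
The AttributeError occurs because extract() returns a list, so string methods such as strip() only work after picking one element with [0]. The extracted text still contains the label next to the number. A minimal sketch for turning it into an integer follows; the helper name parse_comment_count is hypothetical, and falling back to 0 when no digits are present is an assumption about how a post without comments is rendered.

```python
import re


def parse_comment_count(text: str) -> int:
    """Return the first integer found in text, or 0 if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else 0


# Value taken from the session above.
print(parse_comment_count(" 7 评论"))
# Assumed shape of the text when a post has no comments yet.
print(parse_comment_count(" 评论"))
```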

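PS: in a CSS selector, a space between two simple selectors is the descendant combinator. For example, .entry-header h1 matches any h1 nested at any depth inside an element with the class entry-header.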
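
Out[21] above shows that Scrapy translates the space into an XPath descendant step. The sketch below illustrates the difference between a descendant step and a direct-child step using the standard library's ElementTree, which supports only a limited XPath subset and is a stand-in here, not Scrapy's engine. The XML fragment is made up, and [@class='entry-header'] is an exact string comparison, unlike the CSS class selector, which matches one class name among several.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment; ElementTree needs valid XML.
doc = ET.fromstring(
    "<body>"
    '<div class="entry-header"><section><h1>Nested title</h1></section></div>'
    "<h1>Unrelated heading</h1>"
    "</body>"
)

# CSS ".entry-header h1" (space = any depth) roughly corresponds to "//".
descendants = doc.findall(".//*[@class='entry-header']//h1")

# CSS ".entry-header > h1" (direct child only) corresponds to a single "/".
children = doc.findall(".//*[@class='entry-header']/h1")

print([h.text for h in descendants])
print([h.text for h in children])
```

The h1 here sits inside a section, so only the descendant form finds it; the direct-child form returns nothing.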

 


Original article: http://www.cnblogs.com/laonicc/p/7581463.html
