
Python Web Scraping: Basic Usage of PyQuery

Date: 2017-10-16 13:51:51


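PyQuery is another powerful and flexible HTML parsing library. If you have front-end experience and have used jQuery, PyQuery is an excellent choice: it is a Python library that closely mirrors the jQuery API. Its syntax is almost identical to jQuery's, so there are few unfamiliar methods to memorize.
Official documentation: http://pyquery.readthedocs.io/en/latest/
jQuery reference: http://jquery.cuishifeng.cn/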

 

1. Initializing from a string

from pyquery import PyQuery as pq

html = '''<div>
    <ul>
         <li class="item-0">first item</li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul></div>'''

doc = pq(html)
print(doc)
print(type(doc))
print(doc('li'))
Output:
<div>
    <ul>
         <li class="item-0">first item</li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul></div>
<class 'pyquery.pyquery.PyQuery'>
<li class="item-0">first item</li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>

 

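2. Opening an HTML file

  Mind the file path: a relative path such as index.html is resolved against the current working directory.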

from pyquery import PyQuery as pq
doc = pq(filename='index.html')
print(doc)
print(doc('head'))
Output:
    <title>Title</title>
</head>
<body>
    <div>
    <ul>
         <li class="item-0">first item</li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul></div>'''
</body>
</html>
<head>
    <meta charset="UTF-8"/>
    <title>Title</title>
</head>

 

3. Opening a website

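from pyquery import PyQuery as pq

# A string starting with http:// or https:// is treated as a URL and fetched.
doc = pq('https://www.baidu.com')
# Equivalent, with the URL passed explicitly:
# doc1 = pq(url='https://www.baidu.com')
print(doc)
print(doc('head'))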

  

4. Finding elements with CSS selectors

from pyquery import PyQuery as pq

html = '''<div>
    <ul id = 'haha'>
         <li class="item-0">first item</li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul></div>'''

doc = pq(html)
print(doc)
# Select the span inside an a tag, inside an element with class item-0,
# inside the element with id haha (descendant levels are separated by spaces).
print(doc('#haha .item-0 a span'))
Output:
<div>
    <ul id="haha">
         <li class="item-0">first item</li>
         <li class="item-1"><a href="link2.html">second item</a></li>
         <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
         <li class="item-1 active"><a href="link4.html">fourth item</a></li>
         <li class="item-0"><a href="link5.html">fifth item</a></li>
     </ul></div>
<span class="bold">third item</span>

 


Original article: http://www.cnblogs.com/lei0213/p/7676254.html
