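PyQuery is a powerful and flexible web page parsing library. If regular expressions feel too tedious to write and BeautifulSoup's syntax is hard to remember, but you already know jQuery syntax, PyQuery is an excellent choice.
Installation: pip3 install pyquery
Initializing from a string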
from pyquery import PyQuery as pq
html = '''
<div>
<ul>
<li class='item-0'>first item</li>
<li class='item-1'><a href='link3.html'><span class='bold'>third item</span></a></li>
</ul>
</div>
'''
doc = pq(html)
print(doc('li'))
# Selection works like CSS selectors: prefix a class with a dot, an id with #, and a tag name with nothing
Output:
<li class="item-0">first item</li>
<li class="item-1"><a href="link3.html"><span class="bold">third item</span></a></li>
Initializing from a URL
from pyquery import PyQuery as pq
doc = pq(url='http://www.baidu.com')
print(doc('head'))
Output:
<head><meta http-equiv="content-type" content="text/html;charset=utf-8"/><meta http-equiv="X-UA-Compatible" content="IE=Edge"/><meta content="always" name="referrer"/><link rel="stylesheet" type="text/css" href="http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css"/><title>百度一下,你就知道</title></head>
When a URL is passed in, PyQuery requests that URL automatically, hands the page source to pq, and builds a PyQuery object.
Initializing from a file
from pyquery import PyQuery as pq
doc = pq(filename='1.html')
print(doc('ul'))
Output:
<ul>
<li class="item-0">first item</li>
<li class="item-1"><a href="link3.html"><span class="bold">third item</span></a></li>
</ul>
------------------------
Contents of 1.html:
<div>
<ul>
<li class='item-0'>first item</li>
<li class='item-1'><a href='link3.html'><span class='bold'>third item</span></a></li>
</ul>
</div>
Basic CSS selectors
from pyquery import PyQuery as pq
html = '''
<div id='container'>
<ul class='list'>
<li class='item-0'>first item</li>
<li class='item-1'><a href='link2.html'>second item</a></li>
<li class='item-0 active'><a href='link3.html'><span class='bold'>third item</span></a></li>
<li class='item-1 active'><a href='link4.html'>fourth item</a></li>
<li class='item-0'><a href='link5.html'>fifth item</a></li>
</ul>
</div>
'''
doc = pq(html)
print(doc('#container .list li'))
Output:
<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>
With CSS selectors, prefix an id with #, prefix a class with a dot, and write a tag name with no prefix.
Finding child elements
Finding parent elements
Original article: https://www.cnblogs.com/ronghe/p/9190630.html