PyQuery is another powerful and flexible library for parsing web pages.
Official documentation: http://pyquery.readthedocs.io/en/latest/
# Sample HTML used by the examples below
html = '''
<div class="wrap">
    <ul class="list">
        <li class="item-0">first item</li>
        <li class="item-1"><a href="link2.html">second item</a></li>
        <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>
    </ul>
</div>
'''
# Initialize from a string
from pyquery import PyQuery as pq

doc = pq(html)
print(doc('li'))

# Initialize from a URL (this sends a network request)
from pyquery import PyQuery as pq

doc = pq(url='https://cuiqingcai.com')
print(doc('title'))

# Initialize from a local file
from pyquery import PyQuery as pq

doc = pq(filename='demo.html')
print(doc('li'))
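from pyquery import PyQuery as pq

doc = pq(html)

# Child elements
items = doc('.list')
lis = items.find('li')            # all descendant <li> nodes
lis = items.children()            # direct children only
lis = items.children('.active')   # direct children matching a selector
print(lis)

# Parent elements
items = doc('.list')
container = items.parents()       # every ancestor of the selection
print(container)
parent = items.parents('.wrap')   # ancestors filtered by a selector
print(parent)

# Sibling elements
li = doc('.list .item-0.active')
print(li.siblings())
print(li.siblings('.active'))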
# Get an attribute
from pyquery import PyQuery as pq

doc = pq(html)
a = doc('.item-0.active a')
print(a)
print(a.attr.href)      # attribute-style access
print(a.attr('href'))   # call-style access, same result
# Get the text content
from pyquery import PyQuery as pq

doc = pq(html)
a = doc('.item-0.active a')
print(a)
print(a.text())
# Get the inner HTML
from pyquery import PyQuery as pq

doc = pq(html)
li = doc('.item-0.active')
print(li)
print(li.html())
Original article: https://www.cnblogs.com/Iceredtea/p/11294266.html