from lxml import etree
text = '''
<div>
<ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
'''
html = etree.HTML(text)  # Parse the HTML text into an element tree that supports XPath; lxml repairs the markup automatically
result = etree.tostring(html)  # Serialize the repaired tree; the result is of type bytes
print(result.decode('utf-8'))  # Decode the bytes as UTF-8 and print the repaired content
# Repaired content (note the added <html>/<body> wrappers and the closing </li>)
test_data = '''<html><body><div>
<ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</li></ul>
</div>
</body></html>'''
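
Once the markup has been repaired, the same tree can be queried with XPath. The following is a minimal sketch rather than part of the original post: the variable names hrefs, item1_texts, and last_text are illustrative, and the only lxml calls assumed are etree.HTML and the xpath method of the returned element.

```python
from lxml import etree

# Same fragment as in the post: the last <li> is left unclosed.
text = '''
<div>
<ul>
<li class="item-0"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul>
</div>
'''

html = etree.HTML(text)

# href attribute of every <a> that is a direct child of an <li>
hrefs = html.xpath('//li/a/@href')

# link text of every <li> whose class attribute is exactly "item-1"
item1_texts = html.xpath('//li[@class="item-1"]/a/text()')

# the unclosed fifth <li> was repaired, so its link is reachable as well
last_text = html.xpath('//li[last()]/a/text()')

print(hrefs)        # ['link1.html', 'link2.html', 'link3.html', 'link4.html', 'link5.html']
print(item1_texts)  # ['second item', 'fourth item']
print(last_text)    # ['fifth item']
```

Because etree.HTML already closed the dangling `<li>`, the queries need no special handling for the malformed input.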
Original article: https://www.cnblogs.com/liangliangzz/p/10175622.html