python小白学习记录运用lxml的xpath解析html文件

时间：2020-02-09 18:26:26 阅读：77 评论：0 收藏：0 [点我收藏+]

标签：targe XML ext print path 标签 name www 指定

 1 from lxml import etree
 2 text = "<div><p>nmsl</p><span>nmsl</span></div>"
 3 def htmlstree(text):
 4     html = etree.HTML(text)
 5     result = etree.tostring(html)
 6     print(result)
 7     return result.decode(‘utf-8‘)
 8 #解析html字符串并且会为标签自动加上<html><body></body></html>
 9 def parseetree():
10     parser = etree.HTMLParser(encoding=‘utf-8‘)
11     html = etree.parse("index111.html",parser=parser)
12     result = etree.tostring(html,encoding=‘utf-8‘).decode("utf-8")
13     print(result)
14 #解析xml 由于某写html标签会不全用普通的xml解析器会出错 如<br/>  所以要指定html解析器
15 if __name__ == ‘__main__‘:
16     parseetree()

以上为etree的使用范例

分别解析了html字符串和html文件

from lxml import etree
def parseetree():
    parser = etree.HTMLParser(encoding=‘utf-8‘)
    html = etree.parse("index111.html",parser=parser)
    trs = html.xpath("//a[@onclick][@id]")
    for tr in trs:
        result = etree.tostring(tr,encoding=‘utf-8‘).decode("utf-8")
        print(result)
def parseetree1():
    parser = etree.HTMLParser(encoding=‘utf-8‘)
    html = etree.parse("index111.html",parser=parser)
    tr = html.xpath("//a[@onclick][@id]")[3]
    result = etree.tostring(tr,encoding=‘utf-8‘).decode("utf-8")
    print(result)
if __name__ == "__main__":
    parseetree()
    print("***************")
    parseetree1()

以上为运用xpath来对html进行解析

以下是运行结果

技术图片

附：https://www.w3school.com.cn/xpath/xpath_syntax.asp xpath语法

python小白学习记录运用lxml的xpath解析html文件

标签：targe XML ext print path 标签 name www 指定

原文地址：https://www.cnblogs.com/jswf/p/12287819.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行

python小白学习记录 运用lxml的xpath解析html文件

python小白学习记录运用lxml的xpath解析html文件