Python API Automation -- Parsing HTML with lxml

Published: 2019-08-07 17:31:35

Tags: urllib zip new class type element tostring html

```python
from lxml import etree
import urllib3
import requests

# verify=False below turns off certificate checks; silence urllib3's InsecureRequestWarning
urllib3.disable_warnings()
url = "https://www.cnblogs.com/mvc/blog/news.aspx?blogApp=xiaoyujuan"

r = requests.get(url, verify=False)
# print(r.text)

# decode the raw response bytes as UTF-8, then parse them as HTML
dom = etree.HTML(r.content.decode("utf-8"))
block = dom.xpath("//*[@id='profile_block']")
t = etree.tostring(block[0], encoding="utf-8", pretty_print=True)
print(t.decode("utf-8"))

t1 = block[0].xpath("text()")  # text nodes that are direct children of the current node
print(t1)
t2 = block[0].xpath("a")  # <a> tags that are direct children of the current node
for i, j in zip(t1, t2):
    print("%s%s" % (i, j.text))
```
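
The cnblogs endpoint above may no longer return the same markup, so here is an offline sketch of the same `text()` / `a` pairing. The `SAMPLE` markup and the helper name `profile_pairs` are hypothetical; the structure (a label text node, then an `<a>`, then a `<br/>`) is an assumption about what the profile block looked like. It needs `lxml` installed, and it also contrasts `text()` with `.//text()` and `string()`.

```python
from lxml import etree

# Hypothetical stand-in for the live profile block, so the example runs offline.
SAMPLE = ('<div id="profile_block">Nickname: <a href="#">demo</a><br/>'
          'Joined: <a href="#">2 years</a><br/></div>')


def profile_pairs(html_text):
    """Pair each label text node with the text of the <a> element after it."""
    dom = etree.HTML(html_text)
    block = dom.xpath("//*[@id='profile_block']")
    if not block:  # xpath() returns an empty list when nothing matches
        return []
    labels = block[0].xpath("text()")  # only direct-child text nodes
    links = block[0].xpath("a")        # only direct-child <a> elements
    # zip() stops at the shorter list, so unmatched trailing items are dropped
    return [(str(label), link.text) for label, link in zip(labels, links)]


if __name__ == "__main__":
    for label, value in profile_pairs(SAMPLE):
        print("%s%s" % (label, value))
    block = etree.HTML(SAMPLE).xpath("//*[@id='profile_block']")[0]
    print(block.xpath(".//text()"))  # text nodes at any depth, including inside <a>
    print(block.xpath("string()"))   # all descendant text joined into one string
```

`zip()` pairs purely by position, so this only holds while every label is followed by exactly one link. If the markup is less regular, looking up each link's nearest preceding text node (for example `link.xpath("preceding-sibling::text()[1]")`) is likely more robust.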

```python
from lxml import etree

htmldemo = '''
<meta charset="UTF-8"> <!-- for HTML5 -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<html><head><title>yoyo ketang</title></head><body><b><!--Hey, this in comment!--></b>
<p class="title"><b>yoyoketang</b></p><p class="yoyo">这里是我的微信公众号：yoyoketang <br>
<a href="http://www.cnblogs.com/yoyoketang/tag/fiddler/" class="sister" id="link1">fiddler教程</a><br>
<a href="http://www.cnblogs.com/yoyoketang/tag/python/" class="sister" id="link2">python笔记</a><br>
<a href="http://www.cnblogs.com/yoyoketang/tag/selenium/" class="sister" id="link3">selenium文档</a><br>
快来关注吧！</p>
<p class="story">...</p>
'''
# parse the HTML content with etree.HTML
demo = etree.HTML(htmldemo)
# print the parsed HTML with etree.tostring
# encoding="utf-8" makes the Chinese text in the HTML come out as readable characters
# pretty_print=True prints the markup indented, with line breaks between elements
t = etree.tostring(demo, encoding='utf-8', pretty_print=True)
print(t.decode('utf-8'))
```

Original post: https://www.cnblogs.com/xiaoyujuan/p/11304355.html
