16.Python使用lxml爬虫

时间：2018-09-05 20:04:43 阅读：157 评论：0 收藏：0 [点我收藏+]

标签：int 版本字符 project 代码切换 for nbsp pythonh

1.lxml是解析库，使用时需要导入该包，直接在命令行输入：pip3 install lxml，基本上会报错。正确应该去对应的网址：https://pypi.org/project/lxml/#files，直接下载对应的lxml

（根据python版本自己去选择，笔者是python3.6，故下载：lxml-4.2.4-cp36-cp36m-win32.whl ，切换到下载的whl目录，在该目录下执行：

pip3 install lxml-4.2.4-cp36-cp36m-win32.whl ）

2.代码如下所示：

import requests
from lxml import etree

url = ‘https://www.mafengwo.cn/gonglve/ziyouxing/2033.html‘

response = requests.get(url)   #返回一个response对象
page = response.text

html = etree.HTML(page)      #返回一个Element对象，将字符串解析为HTML文档
content = html.xpath(‘//h2‘)

for i in content:
    print(i.text)

3.代码解释：

A：定义好url的路径，使用url获取到response对象如：url = ‘‘

B：需要将reponse对象转化为字符串格式，page = response.text

C：使用解析库将字符串转为为HTML文档，根据自己想要获取的内容去定义xpath路径

16.Python使用lxml爬虫

标签：int 版本字符 project 代码切换 for nbsp pythonh

原文地址：https://www.cnblogs.com/android-it/p/9593727.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行