Searching a website's HTML with Python

Posted: 2019-08-23 13:12:05

Tags: import www example output requests stat shu expression object

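import requests
import bs4  # Beautiful Soup, used here to parse HTML

# Download the HTML of the page at this URL.
res = requests.get('http://www.baidu.com')
res.raise_for_status()  # raises an exception if the download failed

# Build a BeautifulSoup object from the downloaded text. Its methods locate
# parts of the HTML document, much as a regular expression would, but they
# are easier to use for this job.
guoshun = bs4.BeautifulSoup(res.text, 'html.parser')

# select() takes a CSS selector and returns a list of every matching Tag
# object. It is always a list, even when only one element matches.
el = guoshun.select('#lg')
print(type(el))
print(len(el))  # number of matches
print(el[0])    # first matching element (IndexError if nothing matched)

# The same approach works on a local file. Without an explicit parser such as
# 'html.parser', bs4 emits a warning and uses whichever parser is installed.
with open('example.html') as examplfile:
    shunshun = bs4.BeautifulSoup(examplfile.read(), 'html.parser')
el2 = shunshun.select("#author")
print(len(el2))
print(el2)
# getText() returns the text between the element's opening and closing tags.
print(el2[0].getText())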
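
The script above depends on a live website and on an example.html file that is not shown, so here is a minimal sketch of the same select() and getText() calls on an inline HTML string. SAMPLE_HTML, its contents, and the .note and #nope selectors are made up for illustration and are not taken from the original example.html.

```python
import bs4

# Hypothetical stand-in for example.html; the real file's contents are not shown.
SAMPLE_HTML = """
<html><body>
<p>Written by <span id="author">Guo Shun</span></p>
<p class="note">first</p>
<p class="note">second</p>
</body></html>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# An id selector normally matches one element, but select() still returns a list.
author = soup.select("#author")
author_text = author[0].getText()

# A class selector can match several elements; each list item is a Tag.
notes = soup.select(".note")
note_texts = [tag.getText() for tag in notes]

# When nothing matches, select() returns an empty list, so check its length
# before indexing. select_one() returns None in the same situation.
missing = soup.select("#nope")
first_or_none = soup.select_one("#nope")

print(author_text)
print(note_texts)
print(len(missing), first_or_none)
```

Checking len() before indexing, or using select_one() and testing for None, avoids the IndexError that el[0] would raise if a page no longer contained the element being searched for.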


Original post: https://www.cnblogs.com/shunguo/p/11399342.html
