Simple web page text scraping with Python requests

Posted: 2018-06-20 21:36:19


Page to scrape:

http://www.cnblogs.com/xrq730/archive/2018/06/11/9159586.html

The goal is to extract the text content of a single blog post.

Use requests to fetch the full HTML of the page;
use Beautiful Soup to parse that HTML.
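
The parsing step can be tried on its own, without any network access. The following is a minimal sketch, assuming a small hand-written HTML string (`SAMPLE_HTML` is hypothetical and the real page markup may differ). It shows how `find_all` selects the div by class and how `.text` returns only the text, with the tags already removed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page; the real blog markup may differ.
SAMPLE_HTML = """
<html><body>
  <div id="header">Site navigation</div>
  <div class="blogpost-body">
    <p><span style="font-size: 14px;">First paragraph.</span></p>
    <p><span style="font-size: 14px;">Second paragraph.</span></p>
  </div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# find_all returns a list of every matching tag; class_ filters on the CSS class.
matches = soup.find_all('div', class_='blogpost-body')

# .text concatenates only the text nodes, so no <p> or <span> markup remains.
body_text = matches[0].text
print(body_text)
```

Because the markup is gone by the time `.text` is read, any string replacement applied to that text can only match plain characters, not HTML tags.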


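import requests
from bs4 import BeautifulSoup


if __name__ == '__main__':
    target = 'http://www.cnblogs.com/xrq730/archive/2018/06/11/9159586.html'
    # Fetch the full HTML of the page.
    req = requests.get(url=target)
    html = req.text
    # Name the parser explicitly; otherwise bs4 warns and picks whichever parser is installed.
    bf = BeautifulSoup(html, 'html.parser')
    # The post body sits in <div class="blogpost-body">.
    texts = bf.find_all('div', class_='blogpost-body')
    # print(html)
    # .text already drops the markup, so replacing an HTML tag string here would match nothing.
    print(texts[0].text)
    # Alternative: turn runs of eight non-breaking spaces into paragraph breaks.
    # print(texts[0].text.replace('\xa0' * 8, '\n\n'))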
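
The script above assumes the request succeeds and the div exists. A slightly more defensive variant might look like the sketch below. The helper names `fetch_html` and `extract_post_text` and the 10-second timeout are illustrative choices, not part of the original post.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    resp = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    resp.raise_for_status()
    return resp.text


def extract_post_text(html, css_class='blogpost-body'):
    """Return the text of the first <div> with the given class (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    body = soup.find('div', class_=css_class)
    if body is None:
        raise ValueError(f'no <div class="{css_class}"> found')
    # Join the text nodes with newlines, trimming whitespace around each one.
    return body.get_text(separator='\n', strip=True)
```

With these two helpers, `print(extract_post_text(fetch_html(target)))` would do the same job as the script, but a failed request or a changed page layout surfaces as an exception rather than an IndexError on `texts[0]`.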

Original post: https://www.cnblogs.com/xy-ju24/p/9204416.html
