Python web scraping demo

Date: 2018-09-15 12:21:57

# Extract text from HTML
from bs4 import BeautifulSoup

html_sample = '''
<html>
<body>
<h1 id = "title">Hello world</h1>
<a href = "#www.baidu.com" class = "link"> This is link1</a>
<a href = "#link2" class = "link"> This is link2</a>
</body>
</html>'''
soup = BeautifulSoup(html_sample, 'html.parser')
print(soup.text)                      # all text in the document

# Select by tag name; select() always returns a list of matches
print(soup.select('h1'))
print(soup.select('h1')[0].text)
print(soup.select('a')[0].text)
print(soup.select('a')[1].text)

# Loop over every <a> tag and print its text
for alink in soup.select('a'):
    print(alink.text)

# Select by id (#) and by class (.)
print(soup.select('#title')[0].text)
print(soup.select('.link')[0].text)

# Read an attribute of each tag with dictionary-style access
alinks = soup.select('a')
for link in alinks:
    print(link['href'])
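The demo above indexes select(...)[0] and link['href'] directly, which raises IndexError or KeyError when an element or attribute is missing. A minimal sketch of a more defensive variant follows, using select_one(), Tag.get() and get_text(strip=True) from BeautifulSoup. The third link without an href and the #no-such-id selector are hypothetical additions for illustration and are not part of the original sample.

```python
from bs4 import BeautifulSoup

html_sample = """
<html>
<body>
<h1 id="title">Hello world</h1>
<a href="#www.baidu.com" class="link"> This is link1</a>
<a href="#link2" class="link"> This is link2</a>
<a class="link">No href here</a>
</body>
</html>
"""

soup = BeautifulSoup(html_sample, "html.parser")

# select_one() returns the first match or None, so a missing element
# can be checked explicitly instead of raising IndexError.
title_tag = soup.select_one("#title")
title = title_tag.get_text(strip=True) if title_tag is not None else ""

# Hypothetical id that does not exist in the sample: the result is None.
missing = soup.select_one("#no-such-id")

# Tag.get() returns None when the attribute is absent, whereas
# a["href"] would raise KeyError for the third <a> tag.
links = [
    (a.get_text(strip=True), a.get("href"))
    for a in soup.select("a.link")
]

print(title)
print(links)
```

Whether to skip links without an href or to keep them with a None value depends on what the scraped data is used for; the sketch keeps them so the difference is visible.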

demo2:

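import pandas
import requests
from bs4 import BeautifulSoup

# Download the page and parse it
res = requests.get('http://news.qq.com/')
soup = BeautifulSoup(res.text, 'html.parser')

# Collect the title and URL of each news item.
# The '.Q-tpWrap .text' selector depends on the page's markup and may need updating.
newsary = []
for news in soup.select('.Q-tpWrap .text'):
    link = news.select('a')[0]
    newsary.append({'title': link.text, 'url': link['href']})

# Save the results to an Excel file (to_excel needs an Excel writer such as openpyxl installed)
newsdf = pandas.DataFrame(newsary)
newsdf.to_excel('news.xlsx')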
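Because the live page can change or be unreachable, it may help to separate parsing from downloading so the extraction logic can be tried on a fixed HTML string. The following is a rough sketch under that assumption: extract_news and sample_page are hypothetical names introduced here, and the sample markup only imitates the structure implied by the '.Q-tpWrap .text' selector, not the real page.

```python
from bs4 import BeautifulSoup


def extract_news(html, item_selector=".Q-tpWrap .text"):
    """Return a list of {'title', 'url'} dicts, one per matched item with a link."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select(item_selector):
        link = item.select_one("a")
        # Skip items that have no <a> tag or an <a> without an href.
        if link is None or not link.get("href"):
            continue
        records.append({"title": link.get_text(strip=True), "url": link["href"]})
    return records


# Hypothetical fragment for illustration; the real page markup may differ.
sample_page = """
<div class="Q-tpWrap"><div class="text"><a href="http://example.com/a">First story</a></div></div>
<div class="Q-tpWrap"><div class="text">No link in this item</div></div>
<div class="Q-tpWrap"><div class="text"><a href="http://example.com/b"> Second story </a></div></div>
"""

news = extract_news(sample_page)
print(news)
```

To run it against a live page, the HTML string can come from requests.get(url, timeout=10), calling raise_for_status() on the response before reading its text, and the returned list can be passed to pandas.DataFrame as in the demo.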

Tip: Jupyter Notebook is a convenient way to practice these examples.

Original article: https://www.cnblogs.com/hujianglang/p/9650329.html
