Scraping the Top 10 of Baidu's Hot Search List

Date: 2020-03-16 16:23:23

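1. Import the required libraries.

2. Identify the page to scrape: http://top.baidu.com/buzz?b=341&c=513&fr=topbuzz_b341_c513

3. Locate the content to scrape in the page source: the keyword elements (class list-title) and the search-index cells (td elements with class last).

4. Use for loops to append the extracted text to two empty lists, then build a DataFrame from them and print the top ten entries.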

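import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'http://top.baidu.com/buzz?b=341&c=513&fr=topbuzz_b341_c513'

def f(s):
    """Fetch the page at s and return it as a BeautifulSoup object, or None on failure."""
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'}
        r = requests.get(s, timeout=30, headers=headers)
        r.raise_for_status()
        # Guess the encoding from the response body so the Chinese text decodes correctly
        r.encoding = r.apparent_encoding
        return BeautifulSoup(r.text, 'lxml')
    except requests.RequestException:
        return None

soup = f(url)
if soup is None:
    raise SystemExit("Request failed; nothing to parse.")

a = []  # keywords
b = []  # search index values
for link1 in soup.find_all(class_='list-title'):
    a.append(link1.get_text())
for link2 in soup.find_all('td', class_='last'):
    b.append(link2.get_text().strip())

# Transpose so that each list becomes a column and each entry a row
data = pd.DataFrame([a, b], index=["Keyword", "Search index"]).T
print("Top 10 of Baidu's hot search list:", "\n")
print(data.iloc[0:10])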
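
The parsing step can also be tried without any network access, which helps when checking the selectors. The following is only a minimal sketch: the sample markup is invented to match the selectors used above (class list-title for keywords, td with class last for the index), the real page may be structured differently, and the name parse_hot_list is hypothetical. It uses Python's built-in html.parser backend instead of lxml so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, shaped only to match the selectors used in the script;
# the live page may look different.
SAMPLE_HTML = """
<table>
  <tr><td><a class="list-title" href="#">topic A</a></td><td class="last"> 4821 </td></tr>
  <tr><td><a class="list-title" href="#">topic B</a></td><td class="last"> 3990 </td></tr>
  <tr><td><a class="list-title" href="#">topic C</a></td><td class="last"> 1200 </td></tr>
</table>
"""

def parse_hot_list(html, limit=10):
    """Return up to `limit` (keyword, search_index) pairs found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    keywords = [tag.get_text().strip() for tag in soup.find_all(class_="list-title")]
    indexes = [td.get_text().strip() for td in soup.find_all("td", class_="last")]
    # zip stops at the shorter list, so a missing cell cannot misalign later rows
    return list(zip(keywords, indexes))[:limit]

rows = parse_hot_list(SAMPLE_HTML, limit=2)
for keyword, index in rows:
    print(keyword, index)
```

The resulting list of pairs can be passed to pd.DataFrame(rows, columns=["Keyword", "Search index"]) to get the same two-column table without the transpose.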


Original post: https://www.cnblogs.com/lzq129/p/12504595.html
