Python example: scraping news from Autohome (汽车之家)

Posted: 2019-11-17 22:19:39

Tags: requests import continue lis not gbk port url com

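```python
import re

import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.autohome.com.cn/news/")
# 1. The difference between content and text
# print(response.content)  # .content: the raw response body as bytes
response.encoding = 'gbk'  # the page is GBK-encoded; set this before reading .text
# print(response.text)     # .text: the body decoded to str using response.encoding
soup = BeautifulSoup(response.text, 'html.parser')

# tag = soup.find(id='auto-channel-lazyload-article')  # find the element with this unique id to narrow the search
# h3 = tag.find(name='h3', class_='')  # class is a Python keyword, so add an underscore (class_), or use the form below
# h3 = tag.find(name='h3', attrs={'class': ''})
# print(h3)

# Chained form: find the container, then every <li> inside it
li_list = soup.find(id='auto-channel-lazyload-article').find_all(name='li')
for li in li_list:
    title = li.find('h3')  # get the headline
    if not title:  # no <h3> in this <li>: skip to the next one
        continue
    title = title.text.strip()
    summary = li.find("p").text
    url = li.find("a").get('href')
    img = li.find("img").get('src')
    if img.startswith('//'):  # a protocol-relative src ("//host/path") has no scheme, which requests rejects
        img = 'https:' + img
    print(img)
    # Save the image, using the headline as the file name
    res = requests.get(img)
    file_name = "%s.jpg" % re.sub(r'[\\/:*?"<>|]', '_', title)  # replace characters not allowed in file names
    with open(file_name, 'wb') as f:
        f.write(res.content)  # write the raw bytes
```

> For more articles, follow [王明昌博客](https://www.wangmingchang.com) (Wang Mingchang's blog).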
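The `content` vs `text` comment at the top of the script is easier to see in isolation. The sketch below uses only the standard library and made-up input: a GBK-encoded sample headline stands in for `response.content`, and no network request is made. The `to_text` helper is hypothetical. It only approximates how requests builds `.text`, which decodes `.content` with `response.encoding` and `errors="replace"`. The sketch assumes the page is GBK-encoded, as the script's `response.encoding = 'gbk'` line does.

```python
# Made-up stand-in for response.content: the GBK-encoded bytes of a sample headline.
raw = "汽车之家新闻".encode("gbk")


def to_text(content: bytes, encoding: str) -> str:
    """Hypothetical helper: roughly what response.text does with response.encoding."""
    return content.decode(encoding, errors="replace")


good = to_text(raw, "gbk")        # encoding set correctly, as in the script
bad = to_text(raw, "iso-8859-1")  # a wrong guess decodes without error but yields mojibake

print(type(raw).__name__, len(raw))           # bytes 12 (two bytes per character in GBK)
print(good)                                   # 汽车之家新闻
print(good == "汽车之家新闻", bad == "汽车之家新闻")  # True False
```

With `response.encoding = 'gbk'` set, `response.text` corresponds to the `good` case. A wrong encoding gives the `bad` case, which raises no error but garbles every headline and therefore every saved file name.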

Original article: https://www.cnblogs.com/wmc1125/p/11878675.html
