Tags: res img list .text tin lis new url crawling
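import requests
from bs4 import BeautifulSoup

# Fetch the Autohome news page and decode it as GBK.
response = requests.get('https://www.autohome.com.cn/news/')
response.encoding = 'gbk'
soup = BeautifulSoup(response.text, "html.parser")

# The article list lives in this div, one <li> per article.
div = soup.find(name='div', id='auto-channel-lazyload-article')
li_list = div.find_all(name='li')

for li in li_list:
    h3 = li.find(name='h3')
    a = li.find(name='a')
    p = li.find(name='p')
    img = li.find(name='img')
    # Skip list items that have no headline.
    if not h3:
        continue
    print(h3.text)
    print(a.attrs['href'])
    print(p.text)

    # The src is expected to be protocol-relative ("//..."), so prepend the scheme.
    img_url = 'https:' + img.attrs['src']
    img_response = requests.get(img_url)
    # Use the last path segment as the local file name.
    file_name = img_url.rsplit('/', maxsplit=1)[1]
    with open(file_name, 'wb') as f:
        # Write the downloaded bytes (the response body, not the URL string).
        f.write(img_response.content)
    print('======================')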
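The script builds the image URL by prepending 'https:' to the src attribute and takes the file name from the last '/'-separated segment. A slightly more defensive variant of those two steps might look like the sketch below. The helper names absolute_image_url and image_file_name are hypothetical and not part of the original post, and the sketch assumes src may be protocol-relative, relative, or already absolute.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

# Page the script crawls; used as the base for resolving img src values.
PAGE_URL = "https://www.autohome.com.cn/news/"


def absolute_image_url(src, page_url=PAGE_URL):
    """Resolve an img src against the page URL (hypothetical helper).

    urljoin handles protocol-relative ("//host/x.jpg"), relative
    ("x.jpg") and already-absolute ("https://host/x.jpg") values.
    """
    return urljoin(page_url, src)


def image_file_name(img_url):
    """Return the last path segment of a URL, ignoring any query string."""
    return posixpath.basename(urlsplit(img_url).path)
```

In the loop, img_url = absolute_image_url(img.attrs['src']) and file_name = image_file_name(img_url) would replace the string concatenation and rsplit call. The rest of the script stays the same.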
Original article: https://www.cnblogs.com/zhanglin123/p/9283361.html