Python crawler: scraping People's Daily Online (人民网) news links, titles, and dates with the BeautifulSoup module

Date: 2020-03-21 18:29:37

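import requests
from bs4 import BeautifulSoup as bs

# Fetch the politics index page of People's Daily Online
res = requests.get('http://politics.people.com.cn/GB/1024/index.html')
# Decode the raw bytes as GB2312 instead of relying on a guessed encoding
content = res.content.decode('GB2312')
# The html5lib parser is a separate package (pip install html5lib)
soup = bs(content, 'html5lib')
# Collect every <li> element on the page
myList = soup.find_all('li')

for i in myList:
    a = i.find('a')
    em = i.find('em')
    # Skip <li> elements that lack a link or a date; find() returns None for them
    if a is None or em is None:
        continue
    myNews = {}
    myNews['title'] = a.get_text()
    myNews['link'] = a.get('href')
    myNews['time'] = em.get_text()
    print(myNews)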

Running the script prints:

{'title': '千方百计加快恢复和稳定就业为就业创业、灵活就业提供更多机会', 'link': '/n1/2020/0321/c1024-31642187.html', 'time': '2020-03-21'}
{'title': '在精准防控疫情的同时积极有序推进复工复产稳住和支持市场主体增强经济回升动力', 'link': '/n1/2020/0321/c1024-31642183.html', 'time': '2020-03-21'}
{'title': '李克强：在精准防控疫情的同时积极有序推进复工复产稳住和支持市场主体增强经济回升动力', 'link': '/n1/2020/0320/c1024-31642058.html', 'time': '2020-03-20'}
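
The links in this output are site-relative. A minimal sketch of turning them into absolute URLs, assuming the index page URL from the request is the right base to resolve against; `to_absolute` and `BASE_URL` are hypothetical names, not part of the original script:

```python
from urllib.parse import urljoin

# Base URL copied from the requests.get() call in the script (assumed to be the right base)
BASE_URL = 'http://politics.people.com.cn/GB/1024/index.html'

def to_absolute(link, base=BASE_URL):
    """Resolve a possibly relative href against the page it was found on."""
    return urljoin(base, link)

# A link taken from the printed results
print(to_absolute('/n1/2020/0321/c1024-31642187.html'))
```

Links that are already absolute pass through `urljoin` unchanged, so the helper can be applied to every `href` without checking first.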

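Compared with the previous post, which scraped People's Daily Online news with regular expressions, the regular-expression approach is simpler and quicker for scraping a simple page like this one.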
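
For comparison, a rough sketch of what the regular-expression approach might look like. This is not the code from the previous post. The markup in `SAMPLE` is an assumption about the shape of one list entry (an `<a>` followed by an `<em>` inside an `<li>`), inferred from the tags the BeautifulSoup script looks up, and the headlines are placeholders:

```python
import re

# Hand-written stand-in for the page; structure and headlines are assumed
SAMPLE = '''
<ul>
<li><a href="/n1/2020/0321/c1024-31642187.html">Example headline one</a> <em>2020-03-21</em></li>
<li><a href="/n1/2020/0320/c1024-31642058.html">Example headline two</a> <em>2020-03-20</em></li>
<li>entry without a link or date</li>
</ul>
'''

# One pattern captures link, title, and date from each matching <li>
PATTERN = re.compile(
    r'<li>\s*<a href="([^"]+)"[^>]*>(.*?)</a>\s*<em>(.*?)</em>',
    re.S,
)

news = [
    {'title': title, 'link': link, 'time': time}
    for link, title, time in PATTERN.findall(SAMPLE)
]
for item in news:
    print(item)
```

The pattern is short, but it is tied to this exact tag order and attribute layout, so it only stays simple while the page markup stays this regular.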


Original post: https://www.cnblogs.com/iceberg710815/p/12540424.html
