Scraping the Jokes on the First Page of Qiushibaike with Python

Posted: 2016-10-28 22:31:08

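While teaching myself web scraping in Python, I found requests somewhat easier to use than urllib, so I used requests together with BeautifulSoup to scrape the jokes on the first page of Qiushibaike.
BeautifulSoup can locate the relevant parts of the HTML with find and findAll (find_all in bs4, where findAll is kept as an older alias), optionally combined with regular expressions; select, which takes CSS selectors, is also a good choice.
Below is a provisional version of the code; I plan to keep improving it.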
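To make the find / findAll / regular-expression / select options concrete, here is a minimal sketch that runs each of them against a small inline HTML string instead of the live page. Only the `content` class comes from the original script; the `article` class, the `content-left` wrapper and the `qiushi_tag_N` ids are hypothetical stand-ins, not the real page structure.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; only the "content" class is taken from the original script.
SAMPLE_HTML = """
<div id="content-left">
  <div class="article" id="qiushi_tag_1">
    <div class="content"><span>First joke</span></div>
  </div>
  <div class="article" id="qiushi_tag_2">
    <div class="content"><span>Second joke</span></div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find(): returns only the first matching element (or None if nothing matches)
first = soup.find("div", class_="content")

# find_all() (findAll is the older alias): returns every matching element as a list
all_divs = soup.find_all("div", class_="content")

# A compiled regular expression can be used as an attribute filter
tagged = soup.find_all("div", id=re.compile(r"^qiushi_tag_\d+$"))

# select(): CSS selector syntax, also returns a list
via_css = soup.select("div.article > div.content")

print(first.get_text(strip=True))
print([d.get_text(strip=True) for d in all_divs])
print([d["id"] for d in tagged])
print([d.get_text(strip=True) for d in via_css])
```

On this sample, find() stops at the first joke, while find_all(), the regex filter and select() each return both entries as a list.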

# Python 3 syntax
import requests
from bs4 import BeautifulSoup

page = 1
url = 'http://www.qiushibaike.com/hot/page/' + str(page)
try:
    res = requests.get(url, timeout=10)
    res.raise_for_status()  # treat HTTP error codes (4xx/5xx) as failures too
    # print(res.text)  # on success, the downloaded page is stored as a string in res.text
except requests.RequestException as e:
    print('Error opening the page:', e)
    raise SystemExit(1)  # stop here: there is nothing to parse

try:
    soup = BeautifulSoup(res.text, 'html.parser')
    elms = soup.select('.content')  # select() returns a list of matching elements
    for elm in elms:
        print(elm.text)
except Exception as e:
    print('Error while parsing:', e)
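The script above downloads and parses in one place, so the parsing step cannot be checked without network access. A minimal sketch of one way to split the two, assuming Python 3 and the same URL pattern; the function names `page_url`, `extract_jokes` and `fetch_page` are hypothetical, and so is the User-Agent header (some sites reject the default requests client string, but nothing in the original says this one does).

```python
import requests
from bs4 import BeautifulSoup

# URL pattern taken from the original script; the helper names below are hypothetical.
BASE_URL = "http://www.qiushibaike.com/hot/page/"


def page_url(page):
    """Build the URL of the given hot-list page (1-based)."""
    return BASE_URL + str(page)


def extract_jokes(html):
    """Return the text of every element with class "content".

    Text fragments inside one element (e.g. lines split by <br>) are
    stripped and joined with newlines.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [elm.get_text("\n", strip=True) for elm in soup.select(".content")]


def fetch_page(page, timeout=10):
    """Download one page and return its HTML as a string."""
    headers = {"User-Agent": "Mozilla/5.0"}  # assumed header, not from the original post
    res = requests.get(page_url(page), headers=headers, timeout=timeout)
    res.raise_for_status()  # raise requests.HTTPError on 4xx/5xx responses
    return res.text
```

Used together, `for joke in extract_jokes(fetch_page(1)): print(joke)` reproduces the original script's output loop, while `extract_jokes` can be tried on any saved HTML string without touching the network.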


Original post: http://www.cnblogs.com/carpenterworm/p/6009357.html
