Python Example: Scraping Qiushibaike (糗事百科)

Posted: 2016-07-30 22:18:02

Reference: http://python.jobbole.com/81351/#comment-93968

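This is mainly based on a tutorial from Jobbole (伯乐在线), but the regular-expression part of the source code in that post seems to be broken: I tried it several times and never got it to work. Later I noticed a commenter below the post who used BeautifulSoup instead. I gave it a try and found BeautifulSoup genuinely convenient to use.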
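One plausible reason a hand-written regular expression stops working (not confirmed for the reference post) is that it hard-codes the exact markup. The sketch below uses two hypothetical renderings of the same post block and a hypothetical pattern, not the tutorial's actual one: a change in attribute order and whitespace is enough to break the match, whereas BeautifulSoup looks elements up by tag name and class regardless of such details.

```python
import re

# Two hypothetical renderings of the same post block. The second differs only in
# attribute order and whitespace, as can happen when a site updates its templates.
html_v1 = '<div class="content" id="c1">Hello</div>'
html_v2 = '<div id="c1"  class="content">\nHello\n</div>'

# A brittle pattern: it assumes class comes before id, and (.*?) cannot cross
# newlines because re.DOTALL is not set.
pattern = re.compile(r'<div class="content" id="\w+">(.*?)</div>')


def extract(html):
    """Return the captured post text, or None if the pattern does not match."""
    m = pattern.search(html)
    return m.group(1) if m else None


print(extract(html_v1))  # Hello
print(extract(html_v2))  # None
```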

```python
# -*- coding:utf-8 -*-

'''
Author:LeonWen
'''

import urllib2
# import re
from bs4 import BeautifulSoup

page = 1
url = 'http://www.qiushibaike.com/hot/page/' + str(page)
# set a browser-like User-Agent header
user_agent = 'Mozilla/4.0(compatible;MSIE 5.5;Windows NT)'
headers = {'User-Agent': user_agent}
try:
    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request)
    # name the parser explicitly so bs4 does not warn about guessing one
    object_bs = BeautifulSoup(response.read(), "html.parser")
    # print object_bs.prettify()
    # items is a list holding every post block found on the page
    items = object_bs.body.find_all("div", {"class": "article block untagged mb15"})
    # print items
    floor = 1  # running post number used in the printout
    tag = 0    # number of posts skipped because they contain an image
    for item in items:
        # a div with class="thumb" marks a post that contains an image; skip those
        if item.find("div", {"class": "thumb"}) is None:
            author = item.find("h2")
            upNum = item.find("i", {"class": "number"})
            content = item.find("div", {"class": "content"})
            # print content.prettify()
            # print content.text
            print u"===============", u"Post", floor, u"======================="
            print u"Author:", author.text
            print u"Upvotes:", upNum.text
            print u"Content:", content.get_text()
            floor += 1
        else:
            tag += 1
    print u"Posts with images skipped:", tag
except urllib2.URLError, e:
    if hasattr(e, "code"):
        print e.code
    if hasattr(e, "reason"):
        print e.reason
```
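The script above is Python 2 (`urllib2` and the `print` statement do not exist in Python 3). Below is a minimal sketch of the same logic for Python 3 using `urllib.request` and `urllib.error`, with the parsing split into its own function so it can be tried on saved HTML. It assumes the URL, User-Agent and class names from the script are still valid (the site's current markup has not been re-checked), the helper names `build_request`, `parse_posts` and `main` are hypothetical, and it needs the third-party `bs4` package.

```python
from urllib.error import URLError
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Same URL pattern, User-Agent and class names as the Python 2 script;
# whether the site still uses this markup is an assumption.
BASE_URL = "http://www.qiushibaike.com/hot/page/"
HEADERS = {"User-Agent": "Mozilla/4.0(compatible;MSIE 5.5;Windows NT)"}
POST_CLASS = "article block untagged mb15"


def build_request(page):
    """Build the request for one page of the 'hot' listing (hypothetical helper)."""
    return Request(BASE_URL + str(page), headers=HEADERS)


def parse_posts(html):
    """Return (text_posts, image_count) for one page of HTML.

    text_posts is a list of dicts with author, upvote and content text
    (whitespace stripped, unlike the original); posts containing a
    div.thumb (an image) are only counted, as in the original script.
    """
    soup = BeautifulSoup(html, "html.parser")
    posts, image_count = [], 0
    for item in soup.find_all("div", {"class": POST_CLASS}):
        if item.find("div", {"class": "thumb"}) is not None:
            image_count += 1
            continue
        posts.append({
            "author": item.find("h2").get_text(strip=True),
            "up": item.find("i", {"class": "number"}).get_text(strip=True),
            "content": item.find("div", {"class": "content"}).get_text(strip=True),
        })
    return posts, image_count


def main(page=1):
    """Fetch one page and print the text-only posts."""
    try:
        with urlopen(build_request(page)) as response:
            posts, image_count = parse_posts(response.read())
    except URLError as e:
        # HTTPError (a URLError subclass) carries .code; both carry .reason
        print(getattr(e, "code", ""), e.reason)
        return
    for floor, post in enumerate(posts, start=1):
        print("=" * 15, "Post", floor, "=" * 15)
        print("Author:", post["author"])
        print("Upvotes:", post["up"])
        print("Content:", post["content"])
    print("Posts with images skipped:", image_count)
```

Calling `main()` fetches and prints page 1. Because `parse_posts` only takes an HTML string, it can also be run on a page saved from the browser, which makes it easier to adjust the class names if the layout has changed.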

Original post: http://www.cnblogs.com/leonwen/p/5721843.html
