
Python example code for scraping Qiushibaike

Date: 2016-07-30 22:18:02


Reference: http://python.jobbole.com/81351/#comment-93968

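This is based mainly on a post from Jobbole (伯乐在线), but the regular expression in that post's source code seems to be broken: I tried it several times and never got it to work. I then noticed a commenter under the post who used BeautifulSoup instead, so I tried that approach and found BeautifulSoup very convenient to use.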
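To make the convenience concrete, here is a minimal sketch of the tag-based extraction, written for Python 3 (the script in this post targets Python 2). `SAMPLE_HTML` and `extract_text_posts` are hypothetical names introduced for illustration, and the fragment only mimics the class names the script looks for; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mimics the structure the script expects.
# It is an assumption for illustration, not a copy of the real page.
SAMPLE_HTML = """
<html><body>
<div class="article block untagged mb15">
  <h2>alice</h2>
  <div class="content">First joke</div>
  <i class="number">12</i>
</div>
<div class="article block untagged mb15">
  <h2>bob</h2>
  <div class="content">Joke with a picture</div>
  <div class="thumb"><img src="x.jpg"></div>
  <i class="number">7</i>
</div>
</body></html>
"""


def extract_text_posts(html):
    """Return (author, upvotes, content) tuples for posts without images."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    # A class string containing spaces is matched against the exact
    # value of the class attribute, as in the script in this post.
    for item in soup.find_all("div", {"class": "article block untagged mb15"}):
        if item.find("div", {"class": "thumb"}) is not None:
            continue  # skip posts that carry an image
        author = item.find("h2").get_text(strip=True)
        upvotes = item.find("i", {"class": "number"}).get_text(strip=True)
        content = item.find("div", {"class": "content"}).get_text(strip=True)
        posts.append((author, upvotes, content))
    return posts


print(extract_text_posts(SAMPLE_HTML))
```

No pattern has to be kept in sync with the surrounding markup: each field is located by its tag name and class, which is what makes this approach less fragile than a single long regular expression.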

# -*- coding:utf-8 -*-

'''
Author:LeonWen
'''

import urllib2
# import re
from bs4 import BeautifulSoup

page = 1
url = 'http://www.qiushibaike.com/hot/page/' + str(page)
# set the headers
user_agent = 'Mozilla/4.0(compatible;MSIE 5.5;Windows NT)'
headers = {'User-Agent': user_agent}
try:
    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request)
    # name the parser explicitly so the result does not depend on which parsers are installed
    object_bs = BeautifulSoup(response.read(), "html.parser")
    # print object_bs.prettify()
    # items is a list holding the matching post containers
    items = object_bs.body.find_all("div", {"class": "article block untagged mb15"})
    # print items
    floor = 1
    tag = 0
    for item in items:
        # a div with class "thumb" marks a post that contains an image
        if item.find("div", {"class": "thumb"}) is None:
            author = item.find("h2")
            upNum = item.find("i", {"class": "number"})
            content = item.find("div", {"class": "content"})
            # print content.prettify()
            # print content.text
            print u"=============== Post", floor, u"======================="
            print u"Author:", author.text
            print u"Upvotes:", upNum.text
            print u"Content:", content.get_text()
            floor += 1
        else:
            tag += 1
    print u"Posts with images:", tag
except urllib2.URLError as e:
    if hasattr(e, "code"):
        print e.code
    if hasattr(e, "reason"):
        print e.reason
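The script above is written for Python 2. In Python 3, `urllib2` was split into `urllib.request` and `urllib.error`, so the request part could be sketched roughly as follows. `build_request`, `fetch`, and the `timeout` default are names and values assumed for illustration, not part of the original post, and whether the site still serves this URL is not checked here.

```python
import urllib.error
import urllib.request

BASE_URL = "http://www.qiushibaike.com/hot/page/"
USER_AGENT = "Mozilla/4.0(compatible;MSIE 5.5;Windows NT)"


def build_request(page):
    """Build the request for one listing page with a browser-like User-Agent."""
    url = BASE_URL + str(page)
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(page, timeout=10):
    """Return the page body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(build_request(page), timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as e:
        # HTTPError is a subclass of URLError and additionally carries .code
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
        return None
```

Calling `fetch(1)` would return the raw bytes of the first page, which can then be passed to `BeautifulSoup(..., "html.parser")` exactly as in the script. Keeping `build_request` separate makes it possible to inspect the URL and headers without touching the network.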

 

Original post: http://www.cnblogs.com/leonwen/p/5721843.html
