Crawler in Practice 1: Scraping Jokes from Qiushibaike (糗事百科)

Date: 2017-08-17 13:02:49

Tags: web crawler

1. Extract all the jokes on a single page


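# -*- coding:utf-8 -*-
# Python 2 script: fetch one page of the "hot" list and print every joke on it.
import urllib2
import re

page = 1
url = 'http://www.qiushibaike.com/hot/page/' + str(page)
# Send a custom User-Agent header with the request.
user_agent = 'haha/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
try:
    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request)
    content = response.read().decode('utf-8')
    # Capture groups, in order: author, joke text, like count, comment count.
    # re.S lets "." match newlines, so one item may span several lines of HTML.
    pattern = re.compile('h2>(.*?)</h2.*?<span>(.*?)</.*?number">(.*?)</.*?number">(.*?)<', re.S)
    items = re.findall(pattern, content)
    for item in items:
        print u"----------------------------------------\nAuthor: %sContent: %sLikes: %s\tComments: %s\n" % (item[0], item[1], item[2], item[3])

except urllib2.URLError as e:
    # An HTTPError (a subclass of URLError) carries a status code;
    # a plain URLError carries a reason.
    if hasattr(e, "code"):
        print e.code
    if hasattr(e, "reason"):
        print e.reason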
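
To see what the regular expression captures without touching the network, the pattern can be tried on a small HTML fragment. The sketch below is written for Python 3, and the markup in SAMPLE_HTML is invented for illustration; it is only an assumption about what a list item might look like, not the site's actual page source. The helper name extract_items is likewise hypothetical.

```python
import re

# Same pattern as in the script; re.S (DOTALL) lets "." match newlines.
PATTERN_TEXT = r'h2>(.*?)</h2.*?<span>(.*?)</.*?number">(.*?)</.*?number">(.*?)<'
PATTERN = re.compile(PATTERN_TEXT, re.S)

# Hypothetical markup, invented for this example only.
SAMPLE_HTML = """
<div class="article">
  <h2>Alice</h2>
  <div class="content"><span>A short joke.</span></div>
  <span class="stats-vote"><i class="number">120</i> likes</span>
  <span class="stats-comments"><i class="number">8</i> comments</span>
</div>
<div class="article">
  <h2>Bob</h2>
  <div class="content"><span>Another
joke.</span></div>
  <span class="stats-vote"><i class="number">45</i> likes</span>
  <span class="stats-comments"><i class="number">3</i> comments</span>
</div>
"""


def extract_items(html):
    """Return (author, text, likes, comments) tuples with outer whitespace stripped."""
    return [
        tuple(field.strip() for field in match)
        for match in PATTERN.findall(html)
    ]


items = extract_items(SAMPLE_HTML)
for author, text, likes, comments in items:
    print("Author: %s | Content: %r | Likes: %s | Comments: %s"
          % (author, text, likes, comments))

# Without re.S the pattern cannot cross line breaks, so nothing matches here.
items_without_dotall = re.findall(PATTERN_TEXT, SAMPLE_HTML)
```

Each match yields one tuple of four strings, in the same order as the four capture groups. Because the pattern relies on lazy `.*?` runs between loosely specified anchors, it is sensitive to changes in the page layout; if the real markup differs from the assumption above, the groups may need adjusting.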

Original article: http://12129857.blog.51cto.com/12119857/1956967
