Scraping FUN jokes from the ifeng (Phoenix News) client

Posted: 2015-01-03 23:50:01

I enjoy the FUN series of articles in the ifeng (Phoenix News) client, so I wrote a Python program to collect the URLs of all of these jokes.

The program below only collects the URLs of the articles on the first page; with a small change it can crawl every article.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re

import requests

# Request headers that mimic the browser's AJAX request to the list endpoint.
headers = {
    "Host": "i.ifeng.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
    "Accept": "*/*",
    "Accept-Language": "zh-cn,zh;q=0.8,en-us;q=0.5,en;q=0.3",
    "Content-Type": "application/x-www-form-urlencoded",
    "X-Requested-With": "XMLHttpRequest",
    "Referer": "http://i.ifeng.com/news/djch/fun/dir?vt=5&cid=17899&mid=64DflZ",
}

# Fetch the first page (p=1) of the FUN list; the endpoint returns JSON.
rsp = requests.get(
    "http://i.ifeng.com/news/djch/fun/ajaxlist.php?p=1&cid=17899&htmltype=dir",
    timeout=5,
    headers=headers,
)
html = rsp.json()

# Every article link in the response carries an "aid=<id>&" query parameter.
pattern = re.compile(r"aid=(.+?)&")
aid_list = pattern.findall(str(html))
for aid in aid_list:
    # Build the full article URL from the extracted article id.
    url = "http://i.ifeng.com/news/djch/fun/news?vt=5&cid=0&aid=%s&mid=64DflZ&all=1&p=2" % aid
    print(url)
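
The change mentioned above, crawling every page instead of only the first, could look roughly like the sketch below. It is not the original author's code. It assumes that the `p` parameter of `ajaxlist.php` is a page number that can simply be incremented, and that a page past the end yields no new `aid` values; neither assumption has been checked against the live site. The helper names (`extract_article_urls`, `crawl_all_pages`, `fetch_page_with_requests`) and the `max_pages` safety limit are made up for illustration.

```python
import re

AID_PATTERN = re.compile(r"aid=(.+?)&")
# Same endpoints as the program above; only the page number is parameterised.
LIST_URL = "http://i.ifeng.com/news/djch/fun/ajaxlist.php?p=%d&cid=17899&htmltype=dir"
ARTICLE_URL = "http://i.ifeng.com/news/djch/fun/news?vt=5&cid=0&aid=%s&mid=64DflZ&all=1&p=2"


def extract_article_urls(payload):
    """Return one article URL per aid found in a decoded list page."""
    return [ARTICLE_URL % aid for aid in AID_PATTERN.findall(str(payload))]


def crawl_all_pages(fetch_page, max_pages=50):
    """Collect article URLs page by page.

    fetch_page(page_number) must return the decoded JSON of that list page.
    Stops at the first page that adds no new URL (assumed to mean the end
    of the list) or after max_pages pages, whichever comes first.
    """
    seen = set()
    urls = []
    for page in range(1, max_pages + 1):
        added = 0
        for url in extract_article_urls(fetch_page(page)):
            if url not in seen:  # skip ids repeated within or across pages
                seen.add(url)
                urls.append(url)
                added += 1
        if added == 0:
            break
    return urls


def fetch_page_with_requests(page, headers=None):
    """Download one list page with requests (assumes p=<page> selects the page)."""
    import requests  # imported lazily so the helpers above work without it

    rsp = requests.get(LIST_URL % page, timeout=5, headers=headers)
    return rsp.json()
```

To use it with the real site, pass the same `headers` dictionary as in the program above, for example `crawl_all_pages(lambda p: fetch_page_with_requests(p, headers))`, and print each returned URL. Taking the fetch function as a parameter keeps the paging logic testable without network access.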

Original article: http://www.cnblogs.com/tmyyss/p/4200058.html
