Scraping Double Color Ball (双色球) Data from Baidu Lottery with Python

Posted: 2014-08-31 17:04:01

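I have recently been working through the book Machine Learning in Action, and studying it inevitably means practicing and writing exercises of my own. The first step in those exercises is collecting data, so to write them properly I first had to learn how to gather data from the web. After learning some ways to scrape web pages with Python, I followed someone else's demo and tried it myself, scraping the historical Double Color Ball (双色球) draw data from the Baidu Lottery site. Below is a short walkthrough of my small program.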

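The general approach is:

    Find the relevant URL and its parameters.

    Locate the data you want on the page, that is, work out which tags it sits under.

    Extract the needed data from each page and save it locally in a fixed format.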
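As a rough illustration of the first step, the sketch below builds one list URL per year from the URL pattern that appears in this article's code. The helper name `build_year_url` and the constant `BASE_URL` are hypothetical, introduced only for this example; whether the site still answers at this address is not something the sketch checks.

```python
# Base address taken from the article's code; the `d` query parameter
# carries a date of the form 20YY-01-01 that selects the year to list.
BASE_URL = "http://baidu.lecai.com/lottery/draw/list/50"


def build_year_url(two_digit_year):
    """Return the draw-list URL for the year 20YY (hypothetical helper)."""
    return "{}?d=20{:0>2d}-01-01".format(BASE_URL, two_digit_year)


# Same range as the article's loop: 2002 through 2014.
urls = [build_year_url(year) for year in range(2, 15)]

print(urls[0])    # URL for 2002
print(len(urls))  # number of yearly pages to fetch
```

Keeping URL construction in its own function makes it easy to print and inspect the addresses before sending any requests.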

Required environment:

    Python

    BeautifulSoup, a library for parsing web pages
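To show how BeautifulSoup can pull values out of tags by CSS selector, here is a minimal sketch that runs on an inline HTML string rather than the live site. The markup in `SAMPLE_HTML` is an assumption, shaped only to match the selectors used in this article's code (`#draw_list`, `.td1`, `.td3 .result span`); the real page may be structured differently. The date and numbers are made-up placeholders, and `extract_rows` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with placeholder values; not copied from the real page.
SAMPLE_HTML = """
<table id="draw_list"><tbody>
  <tr>
    <td class="td1">2014-01-01</td>
    <td class="td3"><span class="result"><span>01</span><span>02</span><span>03</span></span></td>
  </tr>
</tbody></table>
"""


def extract_rows(html):
    """Return a list of (date, [ball strings]) tuples found in the table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("#draw_list tbody tr"):
        # First cell with class td1 holds the draw date.
        date = tr.select(".td1")[0].get_text(strip=True)
        # Each span nested inside the .result element holds one number.
        balls = [s.get_text(strip=True) for s in tr.select(".td3 .result span")]
        rows.append((date, balls))
    return rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Testing selectors against a small saved snippet like this is a convenient way to check them before fetching every page.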

The code is as follows:

# -*- coding: utf-8 -*-

#######################################
# Fetch historical Double Color Ball (双色球) data from Baidu Lottery
#######################################
from urllib.request import urlopen  # Python 3 (this module was urllib2 in Python 2)
from bs4 import BeautifulSoup


def fetchLottery():
    # Create/open a file to hold the data; append mode keeps earlier runs
    with open("caipiao.txt", "a", encoding="utf-8") as f:
        for i in range(2, 15):  # two-digit years 02..14, i.e. 2002 through 2014
            print("Fetching data for 20{:0>2d}".format(i))
            url = "http://baidu.lecai.com/lottery/draw/list/50?d=20{:0>2d}-01-01".format(i)
            page = urlopen(url)                        # open the target URL
            soup = BeautifulSoup(page, "html.parser")  # parse the page into a tag tree
            for curTr in soup.select("#draw_list tbody tr"):
                date = curTr.select(".td1")[0].string  # draw date
                # lottery numbers, joined with commas
                balls = [ball.contents[0].string for ball in curTr.select(".td3 .result span")]
                f.write(date + "\t" + ",".join(balls) + "\n")
    print("Data fetch complete")


fetchLottery()
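Since the point of collecting the data is to feed later exercises, it may help to see how the saved file could be read back. The sketch below assumes the format written by the program above: one draw per line, the date, a tab, then comma-separated numbers. The helpers `parse_line` and `load_draws` are hypothetical names, and the sample line uses made-up placeholder values rather than a real draw.

```python
def parse_line(line):
    """Split one 'date<TAB>n1,n2,...' line into (date, [int numbers])."""
    date, ball_str = line.rstrip("\n").split("\t")
    return date, [int(n) for n in ball_str.split(",")]


def load_draws(path="caipiao.txt"):
    """Read every non-empty line of the data file into a list of draws."""
    with open(path, encoding="utf-8") as f:
        return [parse_line(line) for line in f if line.strip()]


# Placeholder line in the assumed format, not an actual draw result.
sample = "2014-01-01\t01,02,03,04,05,06,07\n"
print(parse_line(sample))  # ('2014-01-01', [1, 2, 3, 4, 5, 6, 7])
```

Converting the numbers to integers at load time keeps later numeric processing simple; `load_draws()` can be called once `caipiao.txt` exists.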


Original article: http://www.cnblogs.com/difeng/p/3947732.html
