
Randomly Reading N Lines of Text Data in Python

Posted: 2014-10-13 17:10:39


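At work I needed to check whether the URLs in a text file are reachable, and then randomly pick N of the reachable ones. My approach: read the file line by line, request each URL with urlopen, and append every URL that returns status code 200 to a list. Then take the length of that list, use random to generate a random index into it, and fetch the entry at that index. The implementation is as follows: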

import random
import urllib2

good_list = []  # URLs that answered with HTTP 200
bad_list = []   # URLs that failed or answered with another status

def get_responses(url):
    # Prepend a scheme only when the URL does not already carry one.
    if url.startswith(("http://", "https://")):
        http_url = url
    else:
        http_url = "http://" + url
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 5.1; rv:10.0.1) Gecko/20100101 Firefox/10.0.1"}
    try:
        request = urllib2.Request(http_url, headers=headers)
        resp = urllib2.urlopen(request)
        print url
    except urllib2.URLError, e:
        # HTTPError is a subclass of URLError, so 4xx/5xx responses land here too.
        print e
        bad_list.append(url)
        return 0

    retcode = resp.getcode()
    resp.close()
    if retcode == 200:
        good_list.append(url)
        return 1
    else:
        bad_list.append(url)
        return 0

def readFile():
    try:
        urllist = open(r"C:\Users\888\Desktop\urls.txt", "r")
    except IOError:
        print "file does not exist.\n"
        return
    for item in urllist:
        item = item.strip()
        if item:  # skip blank lines
            get_responses(item)

    urllist.close()
    print "Total URLs: %d, Good URLs: %d, Bad URLs: %d." % ((len(good_list) + len(bad_list)), len(good_list), len(bad_list))

def writeFile(linenum):
    result = []
    linelen = len(good_list)
    # Never ask for more distinct URLs than exist, otherwise the loop cannot end.
    target = min(int(linenum), len(set(good_list)))
    while len(result) < target:
        s = random.randint(0, linelen - 1)
        # Keep only URLs that have not been picked yet.
        if good_list[s] not in result:
            result.append(good_list[s])

    # Put the good URLs in the goodurl.txt file
    try:
        goodurl = open(r"C:\Users\888\Desktop\goodurl.txt", "w")
    except IOError:
        print "cannot open goodurl.txt for writing.\n"
        return

    for item in result:
        goodurl.write(item + "\n")
    goodurl.close()

    print "The mission is done. Please check the goodurl.txt file"

if __name__ == "__main__":
    readFile()
    writeFile(100)
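
The random-selection step can also be written without the index loop. The following is only a minimal sketch, not part of the original script: it swaps the repeated random-index draws for the standard library's random.sample, which returns distinct elements in a single call. The function name pick_random_lines and the sample data are hypothetical, and the code is written to run unchanged on Python 2.7 and Python 3.

```python
import random


def pick_random_lines(lines, n, rng=random):
    """Return up to n distinct entries from lines, chosen at random.

    pick_random_lines is a hypothetical helper, not a name from the post.
    """
    # Drop duplicates while keeping first-seen order, so a URL that
    # appears several times is not more likely to be chosen.
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    # random.sample raises ValueError when asked for more items than
    # exist, so cap the request at the number of distinct entries.
    return rng.sample(unique, min(n, len(unique)))


# Hypothetical data standing in for good_list.
sample_urls = ["a.example", "b.example", "a.example", "c.example"]
picked = pick_random_lines(sample_urls, 2)
```

Because the request is capped at the number of distinct entries, asking for 100 URLs when only 30 are reachable simply returns all 30 in random order instead of looping forever.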

 


Original article: http://www.cnblogs.com/goodhacker/p/4022189.html
