Below are the problem statement and requirements for a Python data-processing exercise:
The attachment is a log file that shows the running status of a set-top box. Each line in the file follows the format “LineNumber + Time + ProcessName + (ProcessID) + Logs”, and the logs are currently displayed in time order. Please write a script in Python that supports the following features:
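This is the set-top box's runtime log, a plain-text file. A partial screenshot of the opened file is shown below:
At first glance it looks messy, but it really should not be opened with Microsoft Notepad. Opened in Notepad++ instead, the structure is much clearer. A partial screenshot is shown below: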
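To make the line format from the problem statement concrete, here is a minimal Python 3 sketch that splits one line into its five fields with a single regular expression. The real log is not reproduced in this post, so the sample line is invented, and the whitespace separators, the `LINE_RE` pattern, and the `parse_line` helper are all assumptions rather than part of the exercise.

```python
import re

# Assumed layout: "<line number> <HH:MM:SS.mmm> <process name> (<pid>) <message>".
# The separators are a guess based on the format named in the problem statement.
LINE_RE = re.compile(
    r'^\s*(?P<lineno>\d+)\s+'
    r'(?P<time>\d\d:\d\d:\d\d\.\d{3})\s+'
    r'(?P<process>[A-Za-z.\-]+)\s*'
    r'\((?P<pid>\d+)\)\s*'
    r'(?P<message>.*)$'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not fit the assumed layout."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None

# Hypothetical sample line, invented for illustration.
sample = '17 12:03:45.678 player (1234) buffering started\n'
fields = parse_line(sample)
print(fields['process'], fields['pid'])
```

Returning `None` for lines that do not fit keeps malformed or wrapped log lines from raising an exception, so the caller can decide whether to skip or report them.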
The code is given below.
Code for task 1:
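# coding=utf-8
# Written for Python 3.
import re

# Read all log lines; the with block closes the file afterwards.
with open('stblog.txt', 'r') as f1:
    list1 = f1.readlines()

list_process = []  # process name(s) matched on each line
# Last digit of the line number, one separator, HH:MM:SS.mmm, whitespace, process name
res = r'\d\D\d\d:\d\d:\d\d\.\d{3}\s([a-z]+)'
for i in range(len(list1)):
    list_process.append(re.findall(res, str(list1[i])))

for i in range(len(list_process)):  # check that the regex matches at most once per line
    if len(list_process[i]) > 1:
        print('regex check failed')
#print(len(list_process))
#print(len(list1))
#print(list_process[141])
#print(list1[141])

for m in range(len(list1)):  # bubble-style exchange sort by process name
    for n in range(m + 1, len(list1)):
        if list_process[m] > list_process[n]:
            # Swap the keys and the lines together so they stay aligned.
            list_process[m], list_process[n] = list_process[n], list_process[m]
            list1[m], list1[n] = list1[n], list1[m]

with open('cc1.txt', 'w') as f2:
    f2.writelines(list1)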
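The nested loops above compare every pair of lines, which takes O(n²) time, and the swaps do not guarantee that lines of the same process keep their original time order. As a possible alternative, the Python 3 sketch below uses the built-in `sorted`, which is stable and runs in O(n log n). The helper names `process_name` and `sort_by_process` and the sample lines are hypothetical; the pattern is the one from the task 1 listing.

```python
import re

# Same pattern as the task 1 listing: timestamp followed by a lowercase process name.
PROCESS_RE = re.compile(r'\d\D\d\d:\d\d:\d\d\.\d{3}\s([a-z]+)')

def process_name(line):
    """Return the process name on a line, or '' when the pattern does not match."""
    match = PROCESS_RE.search(line)
    return match.group(1) if match else ''

def sort_by_process(lines):
    """Sort lines by process name; sorted() is stable, so ties keep their input order."""
    return sorted(lines, key=process_name)

# Hypothetical sample lines, invented for illustration.
sample = [
    '1 00:00:01.000 zapper (10) start\n',
    '2 00:00:02.000 audio (11) init\n',
    '3 00:00:03.000 zapper (10) tune\n',
    '4 00:00:04.000 audio (11) play\n',
]
result = sort_by_process(sample)
for line in result:
    print(line, end='')
```

Because the input is already in time order, a stable sort leaves each process's lines chronological within its group. Lines that do not match the pattern get the empty key and sort to the front.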
Code for tasks 2 and 3:
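# coding=utf-8
# Written for Python 3.
import re

# Read all log lines; the with block closes the file afterwards.
with open('stblog.txt', 'r') as f1:
    list1 = f1.readlines()

list_process = []  # process name(s) matched on each line
list2 = []         # lines that belong to the requested process
count = 0          # number of matching lines
# Process names may also contain dots and hyphens here
res = r'\d\D\d\d:\d\d:\d\d\.\d{3}\s([a-z\.\-]+)'
for i in range(len(list1)):
    list_process.append(re.findall(res, str(list1[i])))

for i in range(len(list_process)):  # check that the regex matches at most once per line
    if len(list_process[i]) > 1:
        print('regex check failed')

s = input("Please enter the process name you are interested in: ")
for i in range(len(list_process)):
    # findall returns a list, so compare against the input split into a list
    if list_process[i] == s.split():
        list2.append(list1[i])  # collect the matching line for cc2.txt
        count += 1
print(count)

with open('cc2.txt', 'w') as f2:
    f2.writelines(list2)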
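The filtering and counting above can also be wrapped in a small function, which makes the logic testable without a log file or keyboard input. This is a minimal Python 3 sketch: `filter_by_process` and the sample lines are hypothetical, and the pattern is copied from the listing for tasks 2 and 3.

```python
import re

# Same pattern as the listing for tasks 2 and 3 (names may contain '.' and '-').
PROCESS_RE = re.compile(r'\d\D\d\d:\d\d:\d\d\.\d{3}\s([a-z\.\-]+)')

def filter_by_process(lines, name):
    """Return the lines whose process name equals `name` exactly."""
    selected = []
    for line in lines:
        match = PROCESS_RE.search(line)
        if match and match.group(1) == name:
            selected.append(line)
    return selected

# Hypothetical sample lines, invented for illustration.
sample = [
    '1 00:00:01.000 zapper (10) start\n',
    '2 00:00:02.000 net-mgr (12) link up\n',
    '3 00:00:03.000 zapper (10) tune\n',
]
matches = filter_by_process(sample, 'zapper')
print(len(matches))
```

The count is simply the length of the returned list, so no separate counter is needed. Unlike the listing above, an empty name never selects the lines on which the pattern fails to match.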
Original article: http://blog.csdn.net/sjtu_chenchen/article/details/46386893