Tags: clean adl split scores files unique line jaf uniq
The coach is a good friend of yours. He has recorded the running times of four athletes, nine times each, in four files:
james.txt, julie.txt, mikey.txt, sarah.txt
2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22
2.59,2.11,2:11,2:23,3-10,2-23,3:10,3.21,3-21
2:22,3.01,3:01,3.02,3:02,3.02,3:22,2.49,2:38
2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55
Write a script that builds a list from each of the four files and prints the lists to the screen.
vim chapter5-1.py
#!/usr/bin/python
# -*- coding:utf-8 -*-

with open('james.txt') as jaf:       # open the file
    data = jaf.readline()            # read the first line
james = data.strip().split(',')      # strip whitespace, split on commas, store the list in james

with open('julie.txt') as juf:
    data = juf.readline()
julie = data.strip().split(',')

with open('mikey.txt') as mif:
    data = mif.readline()
mikey = data.strip().split(',')

with open('sarah.txt') as saf:
    data = saf.readline()
sarah = data.strip().split(',')

print(james)
print(julie)
print(mikey)
print(sarah)
Run it. The output is:
[root@VPN chapter5]# python chapter5-1.py
['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
['2.59', '2.11', '2:11', '2:23', '3-10', '2-23', '3:10', '3.21', '3-21']
['2:22', '3.01', '3:01', '3.02', '3:02', '3.02', '3:22', '2.49', '2:38']
['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']
Next, we want to sort the times in ascending order.
Python offers two ways to sort:
1. In-place sorting
>>> data = [6,3,1,2,4,5]
>>> data.sort()
>>> data
[1, 2, 3, 4, 5, 6]
2. Copied sorting
>>> data
[1, 2, 3, 4, 5, 6]
>>> data = [6,3,1,2,4,5]
>>> data
[6, 3, 1, 2, 4, 5]
>>> data2 = sorted(data)
>>> data
[6, 3, 1, 2, 4, 5]
>>> data2
[1, 2, 3, 4, 5, 6]
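One practical difference between the two approaches is worth spelling out. The short sketch below (variable names are illustrative only) shows that sorted() returns a new list, while list.sort() reorders the list itself and returns None.

```python
data = [6, 3, 1, 2, 4, 5]

# sorted() builds a new list; the original is left untouched.
copy_sorted = sorted(data)
print(copy_sorted)
print(data)

# list.sort() reorders the list in place and returns None,
# so assigning its result to a variable is a common mistake.
result = data.sort()
print(data)
print(result)
```

Use sorted() when the original order still matters, and sort() when it does not.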
Modify the script above to sort the lists:
vim chapter5-2.py
#!/usr/bin/python
# -*- coding:utf-8 -*-

with open('james.txt') as jaf:       # open the file
    data = jaf.readline()            # read the first line
james = data.strip().split(',')      # strip whitespace, split on commas, store the list in james

with open('julie.txt') as juf:
    data = juf.readline()
julie = data.strip().split(',')

with open('mikey.txt') as mif:
    data = mif.readline()
mikey = data.strip().split(',')

with open('sarah.txt') as saf:
    data = saf.readline()
sarah = data.strip().split(',')

'''
print(james)
print(julie)
print(mikey)
print(sarah)
'''

print(sorted(james))
print(sorted(julie))
print(sorted(mikey))
print(sorted(sarah))
Run it:
[root@VPN chapter5]# python chapter5-2.py
['2-22', '2-34', '2.34', '2.45', '2:01', '2:01', '3.01', '3:10', '3:21']
['2-23', '2.11', '2.59', '2:11', '2:23', '3-10', '3-21', '3.21', '3:10']
['2.49', '2:22', '2:38', '3.01', '3.02', '3.02', '3:01', '3:02', '3:22']
['2-25', '2-55', '2.18', '2.58', '2:39', '2:54', '2:55', '2:55', '2:58']
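Look at the fourth line of output: 2-25 is sorted ahead of 2.18, which is not what we want.
The cause is the inconsistent data format: the minutes and seconds are separated by three different characters, so the separator has to be unified.
After splitting, every time is stored as a string, and Python sorts strings character by character. The hyphen sorts before the period, and the period sorts before the colon, so the inconsistency in the coach's data is what breaks the sort.
Next, create a function named sanitize(). It takes one time string from a runner's list as input and normalizes it: a hyphen or colon separator is converted to a period, and a string that already uses a period is returned unchanged.
Define the function, then use it to process every string in each list: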
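The ordering of the three separators can be checked directly. The sketch below prints their character codes with ord() and then sorts a small hand-picked list of times (taken from sarah's data) to show the effect.

```python
# Character codes decide the order when strings are compared.
for ch in ['-', '.', ':']:
    print(repr(ch), ord(ch))

# With identical leading digits, the separator decides the position.
times = ['2:58', '2.58', '2-25', '2.18']
ordered = sorted(times)
print(ordered)
```

Because '-' has the lowest code of the three, '2-25' lands ahead of '2.18' even though 2 minutes 18 seconds is the faster time.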
vim chapter5-3.py
#!/usr/bin/python
# -*- coding:utf-8 -*-

def sanitize(time_string):           # define the function
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'               # check whether the string contains '-' or ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)   # split the string into minutes and seconds
    return mins + '.' + secs

with open('james.txt') as jaf:       # open the file
    data = jaf.readline()            # read the first line
james = data.strip().split(',')      # strip whitespace, split on commas, store the list in james

with open('julie.txt') as juf:
    data = juf.readline()
julie = data.strip().split(',')

with open('mikey.txt') as mif:
    data = mif.readline()
mikey = data.strip().split(',')

with open('sarah.txt') as saf:
    data = saf.readline()
sarah = data.strip().split(',')

# new lists to hold the sanitized times
clean_james = []
clean_julie = []
clean_mikey = []
clean_sarah = []

for each_t in james:
    clean_james.append(sanitize(each_t))
for each_t in julie:
    clean_julie.append(sanitize(each_t))
for each_t in mikey:
    clean_mikey.append(sanitize(each_t))
for each_t in sarah:
    clean_sarah.append(sanitize(each_t))

print(sorted(clean_james))
print(sorted(clean_julie))
print(sorted(clean_mikey))
print(sorted(clean_sarah))
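For comparison, the same normalization can be sketched with str.replace(). The name sanitize_replace is hypothetical and is not part of the original scripts. It is an alternative to sanitize(), not a drop-in copy of it.

```python
def sanitize_replace(time_string):
    """Return time_string with every '-' and ':' replaced by '.'."""
    return time_string.replace('-', '.').replace(':', '.')

print(sanitize_replace('2-34'))
print(sanitize_replace('3:21'))
print(sanitize_replace('2.45'))
```

The two versions differ on malformed input. sanitize() unpacks the split result into exactly two names, so a string with two separators such as '1:02:03' raises a ValueError. The replace-based sketch would quietly return '1.02.03' instead.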
Run it:
[root@VPN chapter5]# python chapter5-3.py
['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
['2.11', '2.11', '2.23', '2.23', '2.59', '3.10', '3.10', '3.21', '3.21']
['2.22', '2.38', '2.49', '3.01', '3.01', '3.02', '3.02', '3.02', '3.22']
['2.18', '2.25', '2.39', '2.54', '2.55', '2.55', '2.55', '2.58', '2.58']
The times are now sorted correctly.
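The sanitized values are still strings. Sorting them works here because every time in the coach's data has a single-digit minute and two-digit seconds. As a hedged sketch of what could be done if that stopped being true, the hypothetical helper to_seconds() below converts a sanitized time to whole seconds and is used as a sort key. The value '10.05' is made up for illustration and does not appear in the coach's files.

```python
def to_seconds(clean_time):
    """Convert a sanitized 'm.ss' string to a number of seconds."""
    mins, secs = clean_time.split('.')
    return int(mins) * 60 + int(secs)

times = ['10.05', '2.01', '3.10']    # '10.05' is invented sample data

string_order = sorted(times)                    # compares character by character
numeric_order = sorted(times, key=to_seconds)   # compares actual durations

print(string_order)
print(numeric_order)
```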
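However, chapter5-3.py uses too many lists and loops, and the code repeats itself. Python provides a tool for transforming one list into another: the list comprehension.
Here are two examples: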
>>> mins = [1,2,3]
>>> secs = [m*60 for m in mins]
>>> secs
[60, 120, 180]
This converts minutes to seconds.
>>> lower = ["I","am","liuyueming"]
>>> upper = [s.upper() for s in lower]
>>> upper
['I', 'AM', 'LIUYUEMING']
This converts every string to uppercase.
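To make the connection with the earlier loops explicit, the sketch below writes the minutes-to-seconds conversion both ways. The names secs_loop and secs_comp are illustrative only.

```python
mins = [1, 2, 3]

# Explicit loop: create an empty list, then append each converted value.
secs_loop = []
for m in mins:
    secs_loop.append(m * 60)

# List comprehension: the same transformation as a single expression.
secs_comp = [m * 60 for m in mins]

print(secs_loop)
print(secs_comp)
```

Both produce the same list. The comprehension removes the need to create the empty list and call append() by hand.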
Rewrite the script to use list comprehensions:
vim chapter5-4.py
#!/usr/bin/python
# -*- coding:utf-8 -*-

def sanitize(time_string):           # define the function
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'               # check whether the string contains '-' or ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)   # split the string into minutes and seconds
    return mins + '.' + secs

with open('james.txt') as jaf:       # open the file
    data = jaf.readline()            # read the first line
james = data.strip().split(',')      # strip whitespace, split on commas, store the list in james

with open('julie.txt') as juf:
    data = juf.readline()
julie = data.strip().split(',')

with open('mikey.txt') as mif:
    data = mif.readline()
mikey = data.strip().split(',')

with open('sarah.txt') as saf:
    data = saf.readline()
sarah = data.strip().split(',')

# build the new lists with list comprehensions
clean_james = [sanitize(t) for t in james]
clean_julie = [sanitize(t) for t in julie]
clean_mikey = [sanitize(t) for t in mikey]
clean_sarah = [sanitize(t) for t in sarah]

print(sorted(clean_james))
print(sorted(clean_julie))
print(sorted(clean_mikey))
print(sorted(clean_sarah))
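The output is the same, but the code is considerably shorter.
However, what the coach actually wants is to remove duplicate times and then take the three fastest: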
vim chapter5-5.py
#!/usr/bin/python
# -*- coding:utf-8 -*-

def sanitize(time_string):           # define the function
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'               # check whether the string contains '-' or ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)   # split the string into minutes and seconds
    return mins + '.' + secs

with open('james.txt') as jaf:       # open the file
    data = jaf.readline()            # read the first line
james = data.strip().split(',')      # strip whitespace, split on commas, store the list in james

with open('julie.txt') as juf:
    data = juf.readline()
julie = data.strip().split(',')

with open('mikey.txt') as mif:
    data = mif.readline()
mikey = data.strip().split(',')

with open('sarah.txt') as saf:
    data = saf.readline()
sarah = data.strip().split(',')

# build the sanitized, sorted lists with list comprehensions
clean_james = sorted([sanitize(t) for t in james])
clean_julie = sorted([sanitize(t) for t in julie])
clean_mikey = sorted([sanitize(t) for t in mikey])
clean_sarah = sorted([sanitize(t) for t in sarah])

unique_james = []                    # new list to hold the de-duplicated times
for each_t in clean_james:
    if each_t not in unique_james:
        unique_james.append(each_t)  # append only if not already in the list
print(unique_james[0:3])             # print the top three

unique_julie = []
for each_t in clean_julie:
    if each_t not in unique_julie:
        unique_julie.append(each_t)
print(unique_julie[0:3])

unique_mikey = []
for each_t in clean_mikey:
    if each_t not in unique_mikey:
        unique_mikey.append(each_t)
print(unique_mikey[0:3])

unique_sarah = []
for each_t in clean_sarah:
    if each_t not in unique_sarah:
        unique_sarah.append(each_t)
print(unique_sarah[0:3])
[root@VPN chapter5]# python chapter5-5.py
['2.01', '2.22', '2.34']
['2.11', '2.23', '2.59']
['2.22', '2.38', '2.49']
['2.18', '2.25', '2.39']
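The duplicates are removed from the sorted data and the three fastest times are printed.
This version builds a new list by hand to hold the de-duplicated data. Is there a way to remove duplicates directly?
Python provides a built-in type for this, the set, created with set():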
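A small sketch of how a set behaves, using a few of james's sanitized times as sample input. A set keeps one copy of each value but has no defined order, so sorted() has to be applied after set(), not before.

```python
times = ['2.01', '2.01', '2.22', '2.34', '2.34', '2.45']

# set() drops the repeated values.
unique = set(times)
print(len(times), len(unique))

# A set is unordered, so sort it before slicing off the first three.
top3 = sorted(unique)[0:3]
print(top3)
```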
vim chapter5-6.py
#!/usr/bin/python
# -*- coding:utf-8 -*-

def sanitize(time_string):           # define the function
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'               # check whether the string contains '-' or ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)   # split the string into minutes and seconds
    return mins + '.' + secs

with open('james.txt') as jaf:       # open the file
    data = jaf.readline()            # read the first line
james = data.strip().split(',')      # strip whitespace, split on commas, store the list in james

with open('julie.txt') as juf:
    data = juf.readline()
julie = data.strip().split(',')

with open('mikey.txt') as mif:
    data = mif.readline()
mikey = data.strip().split(',')

with open('sarah.txt') as saf:
    data = saf.readline()
sarah = data.strip().split(',')

print(sorted(set([sanitize(t) for t in james]))[0:3])   # print the top three
print(sorted(set([sanitize(t) for t in julie]))[0:3])   # print the top three
print(sorted(set([sanitize(t) for t in mikey]))[0:3])   # print the top three
print(sorted(set([sanitize(t) for t in sarah]))[0:3])   # print the top three
[root@VPN chapter5]# python chapter5-6.py
['2.01', '2.22', '2.34']
['2.11', '2.23', '2.59']
['2.22', '2.38', '2.49']
['2.18', '2.25', '2.39']
The output is the same, and the code is shorter again.
Next, replace the repeated with blocks with a function:
vim chapter5-7.py
#!/usr/bin/python
# -*- coding:utf-8 -*-

def sanitize(time_string):           # define the function
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'               # check whether the string contains '-' or ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)   # split the string into minutes and seconds
    return mins + '.' + secs

def get_coach_data(filename):
    try:
        with open(filename) as f:
            data = f.readline()
        return data.strip().split(',')
    except IOError as ioerr:
        print('File error:' + str(ioerr))
        return None

james = get_coach_data('james.txt')
julie = get_coach_data('julie.txt')
mikey = get_coach_data('mikey.txt')
sarah = get_coach_data('sarah.txt')

print(sorted(set([sanitize(t) for t in james]))[0:3])   # print the top three
print(sorted(set([sanitize(t) for t in julie]))[0:3])   # print the top three
print(sorted(set([sanitize(t) for t in mikey]))[0:3])   # print the top three
print(sorted(set([sanitize(t) for t in sarah]))[0:3])   # print the top three
The output is the same, and the code is shorter again:
[root@VPN chapter5]# python chapter5-7.py
['2.01', '2.22', '2.34']
['2.11', '2.23', '2.59']
['2.22', '2.38', '2.49']
['2.18', '2.25', '2.39']
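One possible next step, sketched under assumptions: get_coach_data() returns None when a file cannot be read, and iterating over None would raise a TypeError. The hypothetical helper top_three() below guards against that and is called in a loop over the file names. So the sketch runs on its own, it assumes Python 3 and writes two of the coach's data lines into a temporary directory. The file name nobody.txt is invented to exercise the error path.

```python
import os
import shutil
import tempfile

def sanitize(time_string):
    """Normalize a 'm-ss' or 'm:ss' time string to 'm.ss'."""
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

def get_coach_data(filename):
    """Return the first line of the file as a list, or None on error."""
    try:
        with open(filename) as f:
            data = f.readline()
        return data.strip().split(',')
    except IOError as ioerr:
        print('File error: ' + str(ioerr))
        return None

def top_three(filename):
    """Hypothetical helper: three fastest unique times, [] if unreadable."""
    times = get_coach_data(filename)
    if times is None:
        return []
    return sorted(set(sanitize(t) for t in times))[0:3]

# Write sample data (james's and sarah's lines) to a temporary directory.
workdir = tempfile.mkdtemp()
samples = {
    'james.txt': '2-34,3:21,2.34,2.45,3.01,2:01,2:01,3:10,2-22',
    'sarah.txt': '2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55',
}
for name, line in samples.items():
    with open(os.path.join(workdir, name), 'w') as f:
        f.write(line + '\n')

# 'nobody.txt' does not exist, so it exercises the error path.
results = {}
for name in ['james.txt', 'sarah.txt', 'nobody.txt']:
    results[name] = top_three(os.path.join(workdir, name))
    print(name, results[name])

shutil.rmtree(workdir)   # remove the temporary sample files
```

Looping over file names keeps the script the same length no matter how many runners the coach adds.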
Original article: http://www.cnblogs.com/minseo/p/6754547.html