Tags: code items blog int log numbers sort img png
1. Read in the string to analyze
# read the whole file into one string; the with block closes the file afterwards
with open('test.txt', 'r') as fo:
    str = fo.read()
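2. Split the text into words
for i in ',.?!"\n--':  # replace each punctuation mark and newline with a space
    str = str.replace(i, ' ')
words = str.split(' ')  # split on single spaces (repeated spaces yield empty strings)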
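The replace-and-split approach above leaves an empty string wherever two separators are adjacent (which is why '' appears in the exclusion set in step 4), and as shown it treats "The" and "the" as different words. A minimal sketch of an alternative, assuming case-insensitive counting is wanted; the helper name tokenize and the sample sentence are illustrative and not part of the original post:

```python
import re

def tokenize(text):
    """Lowercase the text and return its words as a list."""
    # [a-z']+ matches maximal runs of letters and apostrophes, so punctuation,
    # digits and whitespace all act as separators and no empty strings appear.
    return re.findall(r"[a-z']+", text.lower())

sample = 'Hope is a good thing -- maybe the best of things.\n"Get busy living!"'
print(tokenize(sample))
```

Because the pattern only keeps ASCII letters and apostrophes, it would need adjusting for text containing accented or non-Latin characters.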
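3. Build the count dictionary
# keys is not defined in the original snippet; set(words) - ecp is assumed here.
# dict and ecp come from the step 4 snippet, which has to run before this loop.
keys = set(words) - ecp
for i in keys:  # count how often each remaining word occurs
    dict[i] = words.count(i)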
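Calling words.count(i) once per distinct word rescans the whole word list each time. As a sketch of an alternative, collections.Counter from the standard library tallies every word in a single pass. The helper name count_words and the sample data are illustrative, not taken from the original post:

```python
from collections import Counter

def count_words(words, excluded=frozenset()):
    """Return a Counter of the words, skipping anything in `excluded`."""
    return Counter(w for w in words if w not in excluded)

# Hypothetical sample data; '' stands for the empty strings produced by split(' ').
counts = count_words(['hope', 'is', 'a', 'good', 'thing', 'hope', ''],
                     excluded={'', 'a', 'is'})
print(counts)
```

A Counter behaves like a dictionary, so the sorting step in step 5 works on it unchanged.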
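4. Exclude grammatical (function) words
dict = {}  # count dictionary; create it before the counting loop in step 3 runs
# words to leave out of the count (the empty string comes from splitting on repeated spaces)
ecp = set(['', 'a', 'an', 'the', 'and', 'to', 'in', 'on', 'of', 'for', 'i', 'our', 'us', 'into', 'her', 'we',
           'when', 'their', 'my', 'from', 'them', 'with', 'after', 'would', 'was', 'had', 'that', 'while',
           'his', 'she', 'up', 'it', 'they', 'so', 'by'])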
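5. Sort
items = list(dict.items())  # (word, count) pairs; this line is assumed, the original does not show it
items.sort(key=lambda x: x[1], reverse=True)  # highest count first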
6. Print the top 20
print('Top 20:')
for i in range(min(20, len(items))):  # guard against fewer than 20 distinct words
    print(items[i])
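The six steps above are shown as separate fragments, and they only run if the definitions from step 4 come before the loop in step 3. The sketch below puts them in one possible working order. It assumes case-insensitive counting and swaps in collections.Counter for the manual dictionary; the function name top_words and the sample sentence are illustrative and not from the original post:

```python
from collections import Counter

# Exclusion set copied from step 4.
ECP = {'', 'a', 'an', 'the', 'and', 'to', 'in', 'on', 'of', 'for', 'i', 'our',
       'us', 'into', 'her', 'we', 'when', 'their', 'my', 'from', 'them', 'with',
       'after', 'would', 'was', 'had', 'that', 'while', 'his', 'she', 'up', 'it',
       'they', 'so', 'by'}

def top_words(text, n=20):
    """Return up to n (word, count) pairs, most frequent first."""
    text = text.lower()  # assumption: 'Hope' and 'hope' count as one word
    for ch in ',.?!"\n-':  # same separators as step 2
        text = text.replace(ch, ' ')
    words = [w for w in text.split(' ') if w not in ECP]
    return Counter(words).most_common(n)

sample = "Hope is a good thing, maybe the best of things. Hope never dies!"
for word, count in top_words(sample, n=3):
    print(word, count)
```

To analyze a file as in step 1, the text can be read with open('test.txt') and passed to top_words in place of the sample string.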
The Shawshank Redemption
Original article: http://www.cnblogs.com/1031353319qq/p/7603082.html