
A Complete File-Based English Word Frequency Count Example

Date: 2017-09-26 22:18:55


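1. Read in the string to be analyzed

2. Split the text and extract the words

3. Count the words with a dictionary

4. Exclude grammatical (function) words

5. Sort by count

6. Output the top 20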

# Write the sample lyrics to a file
with open('dead romance.txt', 'w') as fo:
    fo.write('''in a rainy night
can you hear me
in a rainy night
can you help me
man,what are you thinking of
man,what do you need
man,nobody tell you what to do
man ,you need somebody to hurt
in a rainy night
can you hear me
in a rainy night
can you help me
man,you feel so lonely
man,can you hear the message come from the sky
man,you are driving into the rain
man,you know it's time to find the prey
in a rainy night
can you hear me
in a rainy night
can you help me''')

# Read the file back in as the string to analyze
with open('dead romance.txt', 'r') as fo:
    A = fo.read()

# Grammatical (function) words to exclude from the count
exc = {'the', 'and', 'to', 'of', 'in', 'a', 'for', 'with'}
# Replace punctuation and newlines with spaces
for i in ',.?!\n"':
    A = A.replace(i, ' ')
A = A.lower()
A = A.split()  # split on any run of whitespace, so no empty strings remain
dic = {}
keys = set(A)  # set of words that appear; these become the dictionary keys
keys = keys - exc
for i in keys:
    dic[i] = A.count(i)
w = list(dic.items())
w.sort(key=lambda x: x[1], reverse=True)
# Print the top 20; slicing avoids an IndexError if there are fewer than 20 words
for item in w[:20]:
    print(item)
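
As a possible alternative to the manual dictionary and sort above, the same steps can be sketched with collections.Counter from the standard library. This is only a minimal sketch, not the original author's method: the function name top_words, the constant STOP_WORDS, and the regular expression used to extract words are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word set, mirroring the exclusion set used above
STOP_WORDS = {"the", "and", "to", "of", "in", "a", "for", "with"}


def top_words(text, n=20, stop_words=STOP_WORDS):
    """Return up to n (word, count) pairs, most frequent first."""
    # Lowercase, then extract runs of letters; an apostrophe is kept
    # only when it sits inside a word, as in "it's"
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Count every token that is not a stop word
    counts = Counter(t for t in tokens if t not in stop_words)
    # most_common returns at most n items, so short texts are safe
    return counts.most_common(n)


sample = "in a rainy night can you hear me in a rainy night can you help me"
for word, count in top_words(sample, n=3):
    print(word, count)
```

Counter makes a single pass over the word list, whereas calling A.count(i) for every key rescans the whole list each time. For a text as short as these lyrics the difference does not matter, but it may for larger files.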

Original article: http://www.cnblogs.com/lianghaohui123/p/7598950.html
