
Word Frequency Statistics

Date: 2018-03-27 18:47:22

Tags: print   new   inf   ide   little   ems   log   use   read

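Download the lyrics of an English song or an English article

Replace all separators such as , . ? ! ' : with spaces

Convert all uppercase letters to lowercase

Generate the word list

Generate the word frequency counts

Sort

Exclude grammatical words: pronouns, articles, conjunctions

Output the top 20 words by frequency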
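The separator, lowercase, and word-list steps above can be sketched as a single helper. This is only an illustration under assumptions: the names `SEPARATORS` and `tokenize` are made up here, and the separator set is the ASCII reading of the characters listed above plus the double quote.

```python
# Sketch only: SEPARATORS and tokenize are illustrative names, not from the post.
SEPARATORS = ",.?!':\""

def tokenize(text):
    """Replace each separator with a space, lowercase the text, split into words."""
    table = str.maketrans({ch: " " for ch in SEPARATORS})
    return text.translate(table).lower().split()

print(tokenize("Hello, World! Don't stop: go."))
# ['hello', 'world', 'don', 't', 'stop', 'go']
```

Note that treating the apostrophe as a separator splits contractions such as "Don't" into two tokens, which may or may not be what you want.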

Save the text to be analyzed as a UTF-8 encoded file, and obtain the content for the word frequency analysis by reading that file.

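# Read the text to analyze from a UTF-8 encoded file.
f = open('new.txt', 'r', encoding='utf-8')
str0 = f.read()
f.close()

# Replace each separator with a space (the result must be assigned back to
# str0, otherwise the replacements are lost), then lowercase and split.
str1 = ''',.'?!":'''
for i in str1:
    str0 = str0.replace(i, ' ')
list1 = str0.lower().split()

# Grammatical words to exclude.
gath2 = {'in', 'to', 'your', 'you', 'and', 'the', 'for'}

gath = set(list1) - gath2
print(gath)

# Dictionary mapping each remaining word to its number of occurrences.
# (Named counts rather than dict to avoid shadowing the built-in.)
counts = {}
for w in gath:
    counts[w] = list1.count(w)

# Sort by count, highest first.
list1 = list(counts.items())
list1.sort(key=lambda x: x[1], reverse=True)
print(list1)

# Append the top 20 entries to the output file; slicing avoids an
# IndexError when there are fewer than 20 distinct words.
f = open('newscount.txt', 'a', encoding='utf-8')
for word, count in list1[:20]:
    f.write(word + ' ' + str(count) + '\n')
f.close()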
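Calling `list.count` once per distinct word rescans the whole word list every time. As an alternative, here is a minimal sketch using `collections.Counter` from the standard library, which counts in a single pass. The names `STOP_WORDS` and `top_words` are hypothetical; the stop-word set mirrors the one in the code above.

```python
from collections import Counter

# Hypothetical names; the stop-word set mirrors the one used in the post's code.
STOP_WORDS = {'in', 'to', 'your', 'you', 'and', 'the', 'for'}

def top_words(words, stop_words=STOP_WORDS, n=20):
    """Return the n most frequent words that are not stop words."""
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

sample = "you and i wanna be one be one be".split()
print(top_words(sample, n=2))
# [('be', 3), ('one', 2)]
```

`most_common(n)` returns at most `n` pairs, so it also behaves sensibly when the text has fewer than `n` distinct words.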

Result of the run (the full sorted list printed by print(list1)):

[('be', 8), ('one', 8), ('a', 8), ('wanna', 6), ('not', 6), ('or', 5), ('can', 5), ('just', 5), ('that', 4), ('president', 3), ('know', 3), ('actor', 3), ('we', 3), ('lawyer', 3), ('all', 3), ('singerwhy', 3), ('dreameryou', 3), ('could', 2), ('manyou', 2), ('bepolice', 2), ('manwhy', 2), ('old', 2), ('fighter', 2), ('like', 2), ('this', 2), ('got', 2), ('something', 2), ('what', 2), ('really', 2), ('man', 2), ('life', 2), ('post', 2), ('fire', 2), ('on', 2), ('foryou', 1), ('it', 1), ('matterwe', 1), ('matterluxury', 1), ('live', 1), ('real', 1), ('cars', 1), ('caught', 1), ('ways', 1), ('reach', 1), ('lifefocus', 1), ('team', 1), ('nice', 1), ('does', 1), ("that's", 1), ("thingthat's", 1), ('of', 1), ('little', 1), ('play', 1), ("doesn't", 1), ('medoctor', 1), ('topmake', 1), ('dream', 1), ('fori', 1), ('may', 1), ("bring'cause", 1), ('foreverjust', 1), ('nothing', 1), ('every', 1), ('steam', 1), ('lasts', 1), ('bedoctor', 1), ('up', 1), ('bei', 1), ('hold', 1), ('bewe', 1), ('doctor', 1), ('never', 1), ('thingthat', 1), ('different', 1), ('have', 1), ('stopbe', 1), ('sure', 1), ('benow', 1), ('share', 1), ('thinkbut', 1), ('with', 1), ('bling', 1), ("won't", 1), ('sing', 1), ('togetherwe', 1)]


Original article: https://www.cnblogs.com/lgy520/p/8658631.html
