
Comprehensive Exercise: Word Frequency Count

Time: 2018-03-28 23:53:33


1. English word frequency count

Download the lyrics of an English song, or an English article.

news = '''

Passion is sweet
Love makes weak
You said you cherised freedom so
You refused to let it go
Follow your faith
Love and hate
never failed to seize the day
Don't give yourself away
Oh when the night falls
And your all alone
In your deepest sleep
What are you dreeeming of
My skin's still burning from your touch
Oh I just can't get enough
I said I wouldn't ask for much
But your eyes are dangerous
So the tought keeps spinning in my head
Can we drop this masquerade
I can't predict where it ends
If you're the rock I'll crush against
Trapped in a crowd
Music's loud
I said I loved my freedom too
Now im not so sure i do
All eyes on you
Wings so true
Better quit while your ahead
Now im not so sure i am
Oh when the night falls
And your all alone
In your deepest sleep
What are you dreaming of
My skin's still burning from your touch
Oh I just can't get enough
I said I wouldn't ask for much
But your eyes are dangerous
So the thought keeps spinning in my head
Can we drop this masquerade
I can't predict where it ends
If you're the rock I'll crush against
My soul, my heart
If your near or if your far
My life, my love
You can have it all
Oh when the night falls
And your all alone
In your deepest sleep
What are you dreaming of
My skin's still burning from your touch
Oh I just can't get enough
I said I wouldn't ask for much
But your eyes are dangerous
So the thought keeps spinning in my head
Can we drop this masquerade
I can't predict where it ends
If you're the rock I'll crush against
If you're the rock i'll crush against
'''
 
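Replace all separators (, . ? ! ' : and so on) with spaces

# Apostrophes are not in this set, so contractions such as "can't" stay as single words
sep = ''':.,?!'''
for i in sep:
    news = news.replace(i, ' ')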
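
The same replacement can also be done in a single pass with `str.translate`. This is a minimal sketch, assuming the same separator set as the loop version; the names `sample`, `sep_table`, and `cleaned` are illustrative and not part of the original exercise.

```python
# Hypothetical sample text; the exercise applies this to the full `news` string.
sample = "Oh, when the night falls: what are you dreaming of?"

# Same separators as in the loop version.
sep = ":.,?!"

# str.maketrans maps each separator character to a space;
# str.translate then applies the whole mapping in one pass.
sep_table = str.maketrans(sep, " " * len(sep))
cleaned = sample.translate(sep_table)

print(cleaned.split())
```

For a handful of separators the loop is just as readable; the translation table mainly helps when the separator set grows.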

  Convert all uppercase letters to lowercase

news = news.lower()

  Generate the word list

# split() with no arguments splits on any run of whitespace, including newlines
news_list = news.split()
print(news_list)

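  Build the word frequency count

# Stop words to leave out of the count; defined here because both approaches use the set
exclude = {'the', 'to', 'is', 'and'}

# Approach 1: count each distinct word with list.count()
news_dict = {}
news_set = set(news_list) - exclude
for w in news_set:
    news_dict[w] = news_list.count(w)
for w in news_dict:
    print(w, news_dict[w])

# Approach 2: accumulate the counts in a single pass with dict.get()
news_dict = {}
for w in news_list:
    news_dict[w] = news_dict.get(w, 0) + 1
for w in exclude:
    news_dict.pop(w, None)  # pop with a default avoids KeyError if the word is absent
for w in news_dict:
    print(w, news_dict[w])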
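
The counting loops above can also be expressed with `collections.Counter` from the standard library. The following is only a sketch: the short word list and the names `words`, `stop_words`, and `counts` are made up for illustration and stand in for `news_list`, `exclude`, and `news_dict`.

```python
from collections import Counter

# Hypothetical short word list standing in for `news_list`.
words = "oh when the night falls and you are all alone the night is long".split()

# Stop words to skip, mirroring the `exclude` set used in this exercise.
stop_words = {"the", "to", "is", "and"}

# Counter accepts any iterable of hashable items; filtering first
# means the stop words never enter the counts.
counts = Counter(w for w in words if w not in stop_words)

print(counts["night"])
print(counts["the"])  # a missing key reads as 0 instead of raising KeyError
```

Filtering before counting also removes the need for a separate deletion loop afterwards.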

  Sort

# Sort the (word, count) pairs by count, highest first
dictList = list(news_dict.items())
dictList.sort(key=lambda x: x[1], reverse=True)
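
Because `list.sort` is stable, words with equal counts stay in the order the dictionary produced them. If a fully deterministic ranking is wanted, one option is a compound sort key. A small sketch with invented counts; `counts` and `ranked` are illustrative names only.

```python
# Hypothetical counts standing in for `news_dict`.
counts = {"your": 3, "night": 3, "eyes": 4, "rock": 1}

# Negating the count sorts from high to low,
# and the word itself breaks ties alphabetically.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

print(ranked)
```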

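  Exclude grammatical words: pronouns, articles, conjunctions

exclude = {'the', 'to', 'is', 'and'}
for w in exclude:
    news_dict.pop(w, None)  # pop with a default avoids KeyError if the word is absent
# Drop the excluded words from the already sorted list as well
dictList = [item for item in dictList if item[0] not in exclude]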

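  Print the TOP 20 most frequent words

# Slicing avoids an IndexError when there are fewer than 20 distinct words
for item in dictList[:20]:
    print(item)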
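
When only the top few entries are needed, `heapq.nlargest` from the standard library can pick them without sorting the whole list first. This is an alternative sketch, not the method used in the exercise; the counts and the names `counts` and `top3` are invented for illustration.

```python
import heapq

# Hypothetical counts standing in for `news_dict`.
counts = {"your": 3, "night": 3, "eyes": 4, "rock": 1, "soul": 1}

# Take the 3 pairs with the largest count; the key picks the count out of each pair.
top3 = heapq.nlargest(3, counts.items(), key=lambda item: item[1])

for word, n in top3:
    print(word, n)
```

For a text as short as a song lyric the difference is negligible, so the full sort followed by a slice remains a perfectly reasonable choice.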

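  Save the text to be analyzed as a UTF-8 encoded file, then obtain the content for word frequency analysis by reading the file

# The with statement closes the file automatically, even if reading fails
with open("test.txt", "r", encoding='utf-8') as file:
    news = file.read()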
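
Putting the steps together, the whole exercise can be wrapped in one function that reads a file and returns the counts. The sketch below is an assumption about how the pieces might be combined: `count_words` is a hypothetical helper, and the sample file is written to a temporary directory only so the example runs on its own.

```python
import os
import tempfile
from collections import Counter


def count_words(path, stop_words=frozenset({"the", "to", "is", "and"}), separators=":.,?!"):
    """Read a UTF-8 text file and return word counts (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    # Replace each separator with a space, then lowercase and split into words.
    for ch in separators:
        text = text.replace(ch, " ")
    words = text.lower().split()
    # Count everything except the stop words.
    return Counter(w for w in words if w not in stop_words)


# Write a small sample file so the sketch is self-contained;
# the exercise would point at its own test.txt instead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Follow your faith.\nLove and hate, love and fate!\n")
    counts = count_words(path)

# Print the three most frequent words with their counts.
for word, n in counts.most_common(3):
    print(word, n)
```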

  


Original article: https://www.cnblogs.com/xjh602545141/p/8666633.html
