
Comprehensive Exercise: English Word Frequency Statistics

Date: 2018-03-25 21:46:06


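  1. Preprocessing for word frequency statistics
  2. Download the lyrics of an English song or an English article
  3. Replace all separators such as , . ? ! ’ : with spaces
  4. Convert all uppercase letters to lowercase
  5. Build the word list
  6. Compute the word frequencies
  7. Sort
  8. Exclude grammatical words: pronouns, articles, conjunctions
  9. Output the top 10 most frequent words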
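Steps 3 to 5 above can be tried on a short sample before running them on the full lyrics. The following is a minimal sketch, not the original author's code: the helper name `tokenize`, the `SEPARATORS` constant, and the sample string are assumptions made for illustration. It uses `str.maketrans` and `str.translate` to map every separator to a space in one pass.

```python
# Hypothetical helper (not from the original post): steps 3-5 in one function.
SEPARATORS = ",.?!:"

def tokenize(text):
    """Replace separators with spaces, lowercase the text, and split into words."""
    # Build a table that maps each separator character to a space.
    table = str.maketrans(SEPARATORS, " " * len(SEPARATORS))
    return text.translate(table).lower().split()

sample = "Look at me, what a tiny helpless me"
print(tokenize(sample))
# ['look', 'at', 'me', 'what', 'a', 'tiny', 'helpless', 'me']
```

Calling `split()` without arguments splits on any run of whitespace, so the extra spaces left behind by the replaced separators do not produce empty words.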
song = '''
If you say you’re the firework at the bay
I wish I could be a wave
after the rain, you light up the gray
far away you’re the galaxy from space
with the stars you kiss my face
I’ll go everywhere after your trace
when I’m lonely I will learn to embrace
I’ll follow you along the way
like shadow chasing down the flame
I’ll wait for you right on your way
come and stay with me if you may
I’ll raise my head and look your way
tears dropping down and feeling free
Some love comes by like hurricane
as if I play your losing game
If you’re like firefly in summer haze
Children laugh around your grace
Then I’ll be there, trying to say out your name
Look at me, what a tiny helpless me
Only dream when you smile at me
Maybe you wouldn’t stop just for me
Far behind let me stand there singing
I’ll follow you along the way
like shadow chasing down the flame
I’ll wait for you right on your way
come and stay with me if you may
I’ll raise my head and look your way
tears dropping down and feeling free
Some love comes by like hurricane
but rainbows rise
I’ll follow you along the way
like shadow chasing down the flame
I’ll wait for you right on your way
come and stay with me if you may
I’ll raise my head and look your way
tears dropping down and feeling free
Some love comes by like hurricane
but rainbows rise after the pain
'''

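# Replace all separators with spaces, convert to lowercase, and split into words.
# Apostrophes are kept (typographic ones are normalized to ') so that
# contractions such as i'll match the stop-word list below.
text = song.replace('\u2019', "'").replace('\u2018', "'").lower()
for sep in ',.?!:':
    text = text.replace(sep, ' ')
s1 = text.split()

# Count how many times each word occurs
c = {}
for i in s1:
    c[i] = c.get(i, 0) + 1

# Remove words that carry little meaning (pronouns, articles, conjunctions, ...)
word = '''
i you you're the by up a but my and would when some i'll i'm with on could come
from maybe only out me in at for if your down
'''
s3 = word.split()
for i in s3:
    if i in c:
        del c[i]

# Sort the words by number of occurrences, most frequent first
count = sorted(c.items(), key=lambda items: items[1], reverse=True)

# Print the 10 most frequent words
for i in count[:10]:
    print(i)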
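For comparison, the counting, filtering, and sorting steps can also be written with `collections.Counter` from the Python standard library, whose `most_common(n)` method returns the `n` most frequent items. The sketch below is an alternative under stated assumptions, not the original author's code: the function name `top_words`, the sample word list, and the small stop-word set are made up for illustration.

```python
from collections import Counter

def top_words(words, stop_words, n=10):
    """Return the n most frequent words that are not stop words."""
    # Skip stop words while counting instead of deleting them afterwards.
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

# Illustrative inputs (assumptions, not the full lyrics).
words = "like shadow chasing down the flame some love comes by like hurricane".split()
stop_words = {"the", "by", "some", "down"}
print(top_words(words, stop_words, n=3))
```

With the variables from the post, the equivalent call would presumably be `top_words(s1, set(s3))`, where `s1` is the lowercased word list and `s3` the stop-word list.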

 


Original article: https://www.cnblogs.com/wumeiying/p/8647006.html
