
Chinese Word Frequency Statistics and Word Cloud Creation

Date: 2017-09-25 22:00:49


1. Download a long Chinese novel and convert it to UTF-8 encoding
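The conversion step could be sketched roughly as follows. This is a minimal sketch, not the author's method: the function name `convert_to_utf8` is hypothetical, and the default source encoding of GBK is an assumption (it is common for Chinese text files, but the downloaded file may use something else).

```python
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding="gbk"):
    """Re-encode a text file as UTF-8 and return the number of characters written.

    src_encoding is an assumption; pass the real encoding of the downloaded file.
    """
    # Decode the file with its original encoding.
    text = Path(src).read_text(encoding=src_encoding)
    # Write the same text back out as UTF-8.
    Path(dst).write_text(text, encoding="utf-8")
    return len(text)
```

For example, `convert_to_utf8("novel_gbk.txt", "jianai.txt")` would produce a UTF-8 file that can be opened with `encoding="utf-8"`; both file names here are placeholders.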
# Write a short English lyric to test.txt as sample text.
with open('test.txt', 'w') as fo:
    fo.write('''spend all your time waiting for that second chance
for the break that will make it ok
there's always some reason to feel not good enough
and it's hard at the end of the day
i need some distraction or a beautiful release
memories seep from my veins
let me be empty or weightless and maybe
i'll find some peace tonight
in the arms of the angel far away from here
from this dark cold hotel room and the endlessness that you feel
you are pulled from the wreckage of your silent reverie
you are in the arms of the angel, may you find some comfort here''')

# Read the text back and normalize it to lower case.
with open('test.txt', 'r') as fo:
    news = fo.read()
news = news.lower()

# Replace punctuation with spaces so it does not stick to words.
for i in '.,"':
    news = news.replace(i, ' ')

# split() with no argument splits on any whitespace, including newlines.
word = news.split()

# Stop words to exclude from the ranking.
exp = {'the', 'and', 'to', 'on', 'of', 's', 'a', 'me', 'is'}
keys = set(word) - exp
# print(keys)

# Count the occurrences of each remaining word.
dic = {}
for i in keys:
    dic[i] = word.count(i)
# print(dic)

# Sort by count, highest first.
a = list(dic.items())
a.sort(key=lambda x: x[1], reverse=True)
# print(a)

# Print the 10 most frequent words with their counts.
for i in range(10):
    print(a[i])
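The counting and sorting steps above can also be written with `collections.Counter` from the standard library. The sketch below is one possible alternative, not the author's code: the helper name `top_words` and the regular expression used for tokenizing are assumptions, while the stop-word set is taken from the listing above.

```python
import re
from collections import Counter

# Stop words copied from the listing above.
STOP_WORDS = {"the", "and", "to", "on", "of", "s", "a", "me", "is"}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lower-case the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter tallies every word in a single pass over the list.
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)
```

Calling `top_words(news)` on the lyric text would give a ranking comparable to the loop above, without rescanning the whole word list once per distinct word.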

 

2. Use the jieba library to compute Chinese word frequencies and output the top 20 words with their counts.

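import jieba

# Read the novel (UTF-8 encoded) into memory.
with open('jianai.txt', 'r', encoding='utf-8') as txt:
    jianai = txt.read()

# Replace punctuation with spaces before segmentation.
for i in ',.""!?':
    jianai = jianai.replace(i, ' ')

# Segment the text into words with jieba.
jianai = list(jieba.cut(jianai))

# Words to exclude (the source listing also had six entries that are illegible here).
ll = {'离开', '认为', '这儿', '即使', '这样', '等等'}
keys = set(jianai) - ll

# Count each remaining word, then sort by count, highest first.
dic = {}
for i in keys:
    dic[i] = jianai.count(i)
items = list(dic.items())
items.sort(key=lambda x: x[1], reverse=True)

# Print the top 20 words and their counts.
for i in range(20):
    print(items[i])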
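For a full-length novel, calling `count` once per distinct word gets slow, so a single-pass tally may be preferable. The following is a rough sketch under stated assumptions: the helper names `top_tokens` and `top_words_in_file` are hypothetical, and dropping tokens shorter than two characters (which removes punctuation, whitespace and single-character words) is a simplification that is not in the original. `jieba.lcut` is the list-returning variant of `jieba.cut`.

```python
from collections import Counter


def top_tokens(tokens, n=20, stop_words=frozenset()):
    """Return the n most frequent tokens as (token, count) pairs."""
    counts = Counter(
        t for t in tokens
        # Assumption: skip whitespace, punctuation and single-character tokens.
        if len(t.strip()) > 1 and t not in stop_words
    )
    return counts.most_common(n)


def top_words_in_file(path, n=20, stop_words=frozenset()):
    """Segment a UTF-8 text file with jieba and rank its words."""
    import jieba  # third-party package; imported here so top_tokens works without it

    with open(path, encoding="utf-8") as f:
        text = f.read()
    return top_tokens(jieba.lcut(text), n, stop_words)
```

A call such as `top_words_in_file('jianai.txt', 20, {'离开', '认为', '这儿', '即使', '这样', '等等'})` would then return the top 20 words and their counts as a list of pairs.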

 


Original article: http://www.cnblogs.com/liulingyuan/p/7590848.html
