Classroom Exercise (Word Frequency Statistics)

Date: 2017-09-25 16:05:39

What I hope Mr. Zeng will cover

I have no particular suggestions. I would like to hear about job prospects in big data and the salary levels to expect.

Word frequency statistics for a novel


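import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Raw string so the backslash in the Windows path is never read as an escape
book = r"F:\最强升级系统.txt"
with open(book, "r", encoding="GBK") as f:
    txt = f.read()

# Words to leave out of the ranking
ex = {'神仙', '系统', '狂暴', '玩家', '提示', '龙飞'}

ls = []                      # every token, in order, for the dump file
words = jieba.lcut(txt)      # segment the text into a list of words
counts = {}
for word in words:
    ls.append(word)
    if len(word) == 1:       # skip single characters and punctuation
        continue
    counts[word] = counts.get(word, 0) + 1

for word in ex:
    counts.pop(word, None)   # del counts[word] raises KeyError if the word is absent

items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for word, count in items[:10]:   # slicing is safe with fewer than 10 entries
    print("{:<10}{:>5}".format(word, count))

# Save the full token list; the with-block closes the file
with open('lk.txt', 'w', encoding='utf-8') as lk:
    lk.write(str(ls))

# Word cloud from the raw text. Chinese glyphs typically need
# WordCloud(font_path=...) pointing at a CJK font; the original call is kept.
wzhz = WordCloud().generate(txt)
plt.imshow(wzhz)
plt.show()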
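The counting loop above can also be written with collections.Counter from the standard library. The following is only a minimal sketch, not the method used in this exercise: the function name top_words and its parameters are hypothetical, and the short token list stands in for the result of jieba.lcut(txt). Excluded words are filtered out before counting, so a missing word cannot raise an error.

```python
from collections import Counter


def top_words(tokens, exclude=(), n=10, min_len=2):
    """Return the n most frequent tokens as (word, count) pairs.

    Tokens shorter than min_len and tokens listed in exclude are skipped.
    """
    excluded = set(exclude)
    counts = Counter(
        tok for tok in tokens
        if len(tok) >= min_len and tok not in excluded
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical token list standing in for jieba.lcut(txt)
    sample = ["系统", "修炼", "的", "修炼", "战士", "系统", "修炼"]
    for word, count in top_words(sample, exclude={"系统"}):
        print("{:<10}{:>5}".format(word, count))
```

Filtering first and then calling most_common gives the same ranking as counting everything and deleting the excluded keys afterwards, with one less step to get wrong.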
================ RESTART: C:/Users/Administrator/Desktop/1.py ================
Building prefix dict from the default dictionary ...
Dumping model to file cache C:\Users\ADMINI~1\AppData\Local\Temp\jieba.cache
Loading model cost 0.814 seconds.
Prefix dict has been built succesfully.
没有           41
乔乔           31
恭喜           26
战士           23
李三           22
修炼           21
一个           20
废物           18
蛤蟆功          18
妖兽           17
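In the output above, the row for 蛤蟆功 has one space fewer than the two-character rows, because "{:<10}" pads by character count rather than by display width. A rough sketch of width-aware padding follows. It assumes the terminal draws each wide CJK character in two columns, which may not hold for every font or for IDLE, and the helper names display_width and pad_to_width are hypothetical.

```python
import unicodedata


def display_width(text):
    """Approximate display width: wide/fullwidth characters count as 2."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def pad_to_width(text, width):
    """Left-align text in a field of `width` display columns."""
    return text + " " * max(width - display_width(text), 0)


if __name__ == "__main__":
    # Two rows taken from the output above
    rows = [("没有", 41), ("蛤蟆功", 18)]
    for word, count in rows:
        print("{}{:>5}".format(pad_to_width(word, 12), count))
```

With this padding, a three-character word gets two fewer spaces than a two-character word, so the count column starts at the same display position in both rows.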

  

  


Original article: http://www.cnblogs.com/55lsk/p/7591974.html
