Python: Word Frequency Counting with the jieba Library

Posted: 2018-07-12 23:47:12

# Count how often each character appears in 《三国志》 (threekingdoms.txt)

import jieba

with open('threekingdoms.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Frequent words that are not character names and should be dropped from the ranking
excludes = {'将军', '却说', '二人', '不能', '如此', '荆州', '不可', '商议', '如何', '军士',
            '左右', '主公', '引兵', '次日', '大喜', '军马', '天下', '东吴', '于是'}
# jieba.lcut returns the segmentation result as a list
words = jieba.lcut(text)
# Count occurrences with a dictionary that maps each word to its count
counts = {}
for word in words:
    if len(word) == 1:
        # Skip single-character tokens (punctuation, particles)
        continue
    elif word == '孔明曰' or word == '孔明':
        rword = '诸葛亮'
    elif word == '关公' or word == '云长':
        rword = '关羽'
    elif word == '玄德' or word == '玄德曰':
        rword = '刘备'
    elif word == '孟德' or word == '丞相':
        rword = '曹操'
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    # pop with a default avoids a KeyError when an excluded word never occurred
    counts.pop(word, None)
items = list(counts.items())
# Sort by count, from largest to smallest
items.sort(key=lambda x: x[1], reverse=True)
# Slicing avoids an IndexError when fewer than five entries remain
for word, count in items[:5]:
    print('{0:<10}{1:>5}'.format(word, count))
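
The alias merging and dictionary counting in the script above can also be expressed with a lookup table and `collections.Counter` from the standard library. The following is only a minimal sketch under a few assumptions: the function name `count_names` and the `sample` token list are hypothetical and not part of the original post, and the tokens are taken to be already segmented (for example by `jieba.lcut`), so the sketch runs without jieba or the text file.

```python
from collections import Counter

# Alias -> canonical name, copied from the if/elif chain in the script above.
ALIASES = {
    '孔明曰': '诸葛亮', '孔明': '诸葛亮',
    '关公': '关羽', '云长': '关羽',
    '玄德': '刘备', '玄德曰': '刘备',
    '孟德': '曹操', '丞相': '曹操',
}

def count_names(words, excludes=frozenset(), top_n=5):
    """Hypothetical helper: merge aliases, count tokens, return the top_n pairs."""
    counts = Counter(
        ALIASES.get(w, w)      # replace an alias with its canonical name
        for w in words
        if len(w) > 1          # skip single-character tokens
    )
    for w in excludes:
        counts.pop(w, None)    # ignore excluded words that never occurred
    return counts.most_common(top_n)

# Hand-written sample tokens standing in for the output of jieba.lcut(text).
sample = ['孔明', '曰', '玄德', '将军', '孔明曰', '曹操', '丞相', '诸葛亮']
print(count_names(sample, excludes={'将军'}))
# prints [('诸葛亮', 3), ('曹操', 2), ('刘备', 1)]
```

Keeping the aliases in a dictionary means a new nickname needs only one more entry rather than another `elif` branch, and `most_common` replaces the manual sort and slice.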

Original article: https://www.cnblogs.com/sineik/p/9302218.html
