[Python] Word Frequency Count for Romance of the Three Kingdoms

Posted: 2018-05-03 15:24:07

Tags: des txt cut exclude file-encoding int coding == use

import jieba

# The text file must be saved as UTF-8 beforehand.
with open('C:/Users/eternal/Desktop/threekingdoms.txt', 'r', encoding='UTF-8') as f:
    txt = f.read()
# Frequent words that are not character names
excludes = {'将军', '却说', '荆州', '二人', '不可', '不能', '如此'}
words = jieba.lcut(txt)
print(words)
counts = {}
for word in words:
    # Skip single-character tokens, then merge aliases into one name.
    if len(word) == 1:
        continue
    elif word == '诸葛亮' or word == '孔明曰':
        rword = '孔明'
    elif word == '关公' or word == '云长':
        rword = '关羽'
    elif word == '玄德' or word == '玄德曰':
        rword = '刘备'
    elif word == '孟德' or word == '丞相':
        rword = '曹操'
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
for word in excludes:
    counts.pop(word, None)  # no KeyError if the word never occurred
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
print(items)
# Print the ten most frequent entries (fewer if the text is short).
for word, count in items[:10]:
    print('{0:<10}{1:>5}'.format(word, count))
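
The counting step above can also be written more compactly with `collections.Counter` and an alias table. The following is only a minimal sketch, not the original author's code: the function name `count_names`, the `ALIASES` and `EXCLUDES` constants, and the small `sample` list are assumptions made for illustration. The input is assumed to be an already tokenized list, such as the one returned by `jieba.lcut`, so the sketch itself does not need jieba.

```python
from collections import Counter

# Hypothetical alias table: maps alternate forms to one canonical name.
ALIASES = {
    '诸葛亮': '孔明', '孔明曰': '孔明',
    '关公': '关羽', '云长': '关羽',
    '玄德': '刘备', '玄德曰': '刘备',
    '孟德': '曹操', '丞相': '曹操',
}
# Frequent words that are not character names (same set as the script).
EXCLUDES = {'将军', '却说', '荆州', '二人', '不可', '不能', '如此'}


def count_names(words, aliases=ALIASES, excludes=EXCLUDES, top_n=10):
    """Return up to top_n (word, count) pairs from a list of tokens."""
    # Skip single-character tokens and merge aliases while counting.
    counts = Counter(
        aliases.get(word, word)
        for word in words
        if len(word) > 1
    )
    # Drop excluded words; pop() with a default never raises KeyError.
    for word in excludes:
        counts.pop(word, None)
    return counts.most_common(top_n)


# Tiny made-up token list standing in for jieba.lcut(txt).
sample = ['玄德', '曰', '云长', '关公', '将军', '玄德曰', '孔明']
for word, count in count_names(sample):
    print('{0:<10}{1:>5}'.format(word, count))
```

Keeping the aliases in a dictionary means a new alias is a one-line data change rather than another `elif` branch, and `most_common` replaces the manual sort and slice.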

Original article: https://www.cnblogs.com/naraka/p/8985134.html
