【Python】Using the jieba text package

Date: 2021-06-06 18:55:44

I followed a tutorial: https://www.cnblogs.com/wkfvawl/p/9487165.html
and looked up the parts I didn't understand on my own.

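Adding key-value pairs to count how many times each identical token appears in a file: counts = {}, counts.get(word, 0)
This is a common way to count word frequencies: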

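txt = "a b c d a b c a b a e"
words = txt.split()  # split the string on whitespace into a list of tokens
print(words)  # ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'a', 'b', 'a', 'e']
counts = {}  # create an empty dictionary
for word in words:
    counts[word] = counts.get(word, 0) + 1  # get() returns the key's value, or 0 if the key is not there yet
    print(list(counts.items()))  # show the running counts after each word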
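As a quick check of how counts.get(word, 0) behaves, here is a minimal sketch using only a built-in dict. get() returns the default when the key is absent and, unlike counts[key], it neither raises KeyError nor inserts the key.

```python
counts = {}
print(counts.get("a", 0))  # 0: key absent, so the default is returned
counts["a"] = counts.get("a", 0) + 1  # 0 + 1
counts["a"] = counts.get("a", 0) + 1  # 1 + 1
print(counts)  # {'a': 2}

print(counts.get("z", 0))  # 0
print("z" in counts)  # False: get() only reads, it never inserts the key
```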

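Python Dictionary items() method
A dictionary method that returns the dictionary's (key, value) pairs as tuples you can iterate over. In Python 3 it returns a dict_items view, not a list.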

s = "双儿 洪七公 赵敏 赵敏 逍遥子 鳌拜 殷天正 金轮法王 乔峰"
ls = s.split()
counts = {}
for word in ls:
    counts[word] = counts.get(word, 0) + 1
item = counts.items()
print("dict:", counts)
print("items view:", item)

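The result is not a list (it is a dict_items view), so convert it explicitly with list() when you need a list, for example to call .sort() on it.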
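A small sketch of the difference, reusing two names from the example above. The items() view has no sort() method, but the converted list does. The view also stays live and reflects later changes to the dict, whereas the list is a snapshot.

```python
counts = {"赵敏": 2, "乔峰": 1}
view = counts.items()
print(type(view))  # <class 'dict_items'>
print(hasattr(view, "sort"))  # False: a view cannot be sorted in place

items = list(view)  # explicit conversion to a list of (key, value) tuples
items.sort(key=lambda x: x[1], reverse=True)
print(items)  # [('赵敏', 2), ('乔峰', 1)]

# The view tracks the dict; the list does not
counts["双儿"] = 5
print(len(view), len(items))  # 3 2
```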

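Sorting the elements: items.sort(key=lambda x: x[1], reverse=True)
In key=lambda x: x[1], lambda defines an anonymous function. You don't need to dig into it here; just remember the pattern.
The x: x[1] part tells sort() to order each element x, a (key, value) tuple, by its second item, i.e. the value. reverse=True sorts in descending order.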
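Applying this to the "a b c d a b c a b a e" example gives the ranking below. The sketch also shows operator.itemgetter(1) and sorted() as equivalent alternatives to the lambda and the in-place sort. These are standard-library substitutes and were not part of the original tutorial.

```python
from operator import itemgetter

txt = "a b c d a b c a b a e"
counts = {}
for word in txt.split():
    counts[word] = counts.get(word, 0) + 1

items = list(counts.items())
# Sort by x[1] (the count); reverse=True gives descending order.
# Python's sort is stable, so 'd' and 'e' (both 1) keep their insertion order.
items.sort(key=lambda x: x[1], reverse=True)
print(items)  # [('a', 4), ('b', 3), ('c', 2), ('d', 1), ('e', 1)]

# itemgetter(1) is an equivalent key function; sorted() returns a new list
same = sorted(counts.items(), key=itemgetter(1), reverse=True)
print(same == items)  # True
```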
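Formatted output: print("{0:<5}{1:>5}".format(word, count))
print("{0:<10}{1:>5}".format(word, count))
These use the format specification of the str.format() method: {0:<5} places argument 0 left-aligned in a field 5 characters wide, and {1:>5} places argument 1 right-aligned in a field 5 characters wide.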
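A minimal sketch of those two fields, using names from the earlier example. One caveat: the width counts characters, not screen columns, so Chinese words, which terminals usually render two columns wide, may not line up perfectly.

```python
# {0:<5}: argument 0, left-aligned, padded to width 5
# {1:>5}: argument 1, right-aligned, padded to width 5
counts = {"赵敏": 2, "乔峰": 1}
lines = ["{0:<5}{1:>5}".format(word, count) for word, count in counts.items()]
for line in lines:
    print(repr(line))  # repr() makes the padding spaces visible
```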

{<argument index>:<fill><align><width><,><.precision><type>}
< : left-align
> : right-align
^ : center
, : thousands separator for numbers
Integer types: b, c, d, o, x, X
Floating-point types: e, E, f, %
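The cases below exercise each part of the specification above. The values are arbitrary examples chosen for illustration.

```python
cases = [
    ("{:*^9}", "hi", "***hi****"),        # fill '*', center (^), width 9
    ("{:>6}", "ab", "    ab"),            # right-align (>), width 6
    ("{:,}", 1234567, "1,234,567"),       # ',' thousands separator
    ("{:,.2f}", 1234567.891, "1,234,567.89"),  # separator + precision 2, type f
    ("{:b}", 10, "1010"),                 # b: binary
    ("{:c}", 65, "A"),                    # c: character for the code point
    ("{:d}", 42, "42"),                   # d: decimal integer
    ("{:o}", 8, "10"),                    # o: octal
    ("{:x}", 255, "ff"),                  # x: lowercase hexadecimal
    ("{:X}", 255, "FF"),                  # X: uppercase hexadecimal
    ("{:e}", 12345.678, "1.234568e+04"),  # e: scientific notation
    ("{:.1%}", 0.25, "25.0%"),            # %: multiply by 100 and append '%'
]
for spec, value, expected in cases:
    result = spec.format(value)
    print(f"{spec:<10} -> {result!r}")
```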

>>> "{} {}".format("hello", "world")    # no positions given: arguments are used in order
'hello world'

>>> "{0} {1}".format("hello", "world")  # explicit positions
'hello world'

>>> "{1} {0} {1}".format("hello", "world")  # explicit positions; an argument can be reused
'world hello world'

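import jieba

# 三国演义.txt: the novel "Romance of the Three Kingdoms", saved as UTF-8
with open("三国演义.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)  # segment the text into a list of words
counts = {}  # store each word and its number of occurrences as key-value pairs

for word in words:
    if len(word) == 1:  # skip single-character tokens
        continue
    counts[word] = counts.get(word, 0) + 1  # current count if word exists (else 0); +1 accumulates
# Reference: https://blog.csdn.net/weixin_42800007/article/details/82024108

items = list(counts.items())
print(type(items[5]))  # each element is a (word, count) tuple: <class 'tuple'>
items.sort(key=lambda x: x[1], reverse=True)
# key=lambda is an anonymous function; x: x[1] sorts by the second item (the count)

for word, count in items[:15]:  # slicing avoids IndexError if there are fewer than 15 words
    print("{0:<5}{1:>5}".format(word, count))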
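To try the counting and ranking steps without the novel file or jieba installed, the sketch below wraps them in a hypothetical helper, top_words(), and feeds it a hand-made token list that stands in for the output of jieba.lcut(txt). It also shows collections.Counter.most_common() as a standard-library alternative to the manual dict-and-sort approach.

```python
from collections import Counter


def top_words(words, n=15):
    """Hypothetical helper: count multi-character tokens, return the n most frequent (word, count) pairs."""
    counts = {}
    for word in words:
        if len(word) == 1:  # skip single-character tokens, as in the script above
            continue
        counts[word] = counts.get(word, 0) + 1
    items = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    return items[:n]  # slicing avoids IndexError when fewer than n words exist


# Hypothetical pre-segmented tokens standing in for jieba.lcut(txt)
words = ["曹操", "曰", "，", "孔明", "曹操", "玄德", "。", "孔明", "曹操"]
for word, count in top_words(words, 3):
    print("{0:<5}{1:>5}".format(word, count))

# Equivalent ranking with collections.Counter (a swapped-in stdlib alternative)
alt = Counter(w for w in words if len(w) > 1).most_common(3)
print(alt)
```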

Original post: https://www.cnblogs.com/kinologic/p/14853799.html
