Counting word frequencies in a Word document with Python

Posted: 2020-03-10 23:29:42

Tags: list, call, main, return, code, name, continue, document, lam

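How do you count word frequencies in a Word document? First use the docx module (from the python-docx package) to write the document's text to a txt file, then segment that text with the jieba module and count how often each word occurs. Simple, isn't it?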

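# 2020-03-10
# Elizabeth
from docx import Document  # provided by the python-docx package
import jieba  # Chinese word segmentation module

# Output path for the extracted text, shared by the writing and reading steps
TXT_PATH = '/Users/fangluping/Desktop/数据分析笔试试题/词频统计.txt'

# Helper function: write the paragraphs of a Word document to a txt file
def to_txt(path, txt_path=TXT_PATH):
    document = Document(path)
    # Write as UTF-8 so the file can be read back with the same encoding
    with open(txt_path, 'w', encoding='utf-8') as txt:
        for paragraph in document.paragraphs:
            # One paragraph per line, so words do not merge across paragraphs
            txt.write(paragraph.text + '\n')
    # Return the path rather than the (already closed) file object
    return txt_path

if __name__ == '__main__':
    path0 = '/Users/fangluping/Desktop/数据分析笔试试题/笔试题目-V1.0.docx'
    txt_path = to_txt(path0)  # call the function that writes the txt file

    # Word segmentation: read back the same file that to_txt just wrote
    with open(txt_path, 'r', encoding='utf-8') as f:
        txt = f.read()
    words = jieba.lcut(txt)
    counts = {}
    for word in words:
        # Skip single-character tokens (punctuation, line breaks, particles)
        if len(word) == 1:
            continue
        counts[word] = counts.get(word, 0) + 1
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)

    # Print the 10 most frequent words; slicing avoids an IndexError
    # when the document contains fewer than 10 distinct words
    for word, count in items[:10]:
        print("{0:<10}{1:>5}".format(word, count))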
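
The counting and sorting steps can also be written with collections.Counter from the standard library. The following is only a minimal sketch, not the author's code: the function name top_words and its parameters n and min_len are made up for illustration, and it takes an already segmented token list (for example, the output of jieba.lcut) so that it runs without jieba installed.

```python
from collections import Counter


def top_words(tokens, n=10, min_len=2):
    """Return the n most frequent tokens with at least min_len characters.

    top_words, n and min_len are hypothetical names used for illustration.
    """
    # Drop short tokens, mirroring the len(word) == 1 check in the script
    counts = Counter(tok for tok in tokens if len(tok) >= min_len)
    # most_common returns at most n (word, count) pairs, so a short
    # document yields a shorter list instead of raising IndexError
    return counts.most_common(n)


if __name__ == "__main__":
    # Stand-in for jieba.lcut(txt); these tokens are made-up sample data
    sample = ["数据", "分析", "的", "数据", "，", "笔试", "数据", "分析"]
    for word, count in top_words(sample):
        print("{0:<10}{1:>5}".format(word, count))
```

Taking a token list instead of a file path keeps the counting step independent of how the text was extracted, so it can be checked on a small hand-written list before running it on a real document.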


Original article: https://blog.51cto.com/14534896/2477002
