Learning Python: Day 3

Posted: 2020-02-03 15:22:32

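Counting word frequencies in Python is fairly simple, but you need to know whether the text is English or Chinese, because the two are handled slightly differently. English words are separated by spaces, while Chinese text has no word delimiters and must be segmented first.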

Counting words in English text:

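def getText():
    # Read the whole file; the with block closes it automatically.
    with open("word.txt", "r") as f:
        txt = f.read()
    # Lowercase everything so "The" and "the" count as the same word.
    txt = txt.lower()
    # Replace punctuation with spaces so it does not stick to the words.
    for ch in '`~!"@$%^&*()_+-=|\\:;<>?,./':
        txt = txt.replace(ch, " ")
    return txt


txt = getText()
words = txt.split()  # split on any run of whitespace
counts = {}
for word in words:
    # get(word, 0) returns 0 the first time a word is seen.
    counts[word] = counts.get(word, 0) + 1
for word in counts:
    print(word + " " + str(counts[word]))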
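
The loop above prints words in the order they first appear. As a minimal sketch of an alternative, the standard library's collections.Counter can do the counting and also sort by frequency. The function name count_words and the sample sentence are made up for illustration; they are not part of the original code.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercase word to its frequency."""
    # Map every ASCII punctuation character to a space in a single pass.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return Counter(text.lower().translate(table).split())


sample = "The cat saw the dog. The dog ran!"  # hypothetical sample text
counts = count_words(sample)
# most_common(n) lists the n most frequent words, highest count first.
for word, freq in counts.most_common(2):
    print(word, freq)
```

Note that string.punctuation includes the apostrophe, so a word like "don't" would be split into two tokens under this approach.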

Counting words in Chinese text:

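import jieba  # third-party word-segmentation library: pip install jieba

# Chinese text has no spaces between words, so segment it with jieba first.
with open("word.txt", "r", encoding="utf-8") as f:
    txt = f.read()
words = jieba.lcut(txt)  # lcut returns the segmented words as a list
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
for word in counts:
    print(word + " " + str(counts[word]))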
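
The token list returned by jieba.lcut also contains punctuation and whitespace, so the raw counts tend to be dominated by those. The sketch below shows one possible cleanup step: drop one-character tokens and list the most frequent words that remain. The function top_words and the hard-coded token list are hypothetical; in practice the list would come from jieba.lcut(txt). Dropping single characters is a rough heuristic and also discards legitimate one-character words.

```python
from collections import Counter


def top_words(tokens, n=10):
    """Return the n most frequent tokens, skipping one-character ones."""
    # Single-character tokens are mostly punctuation, whitespace, or
    # particles such as "的"; dropping them is a heuristic, not a rule.
    kept = [t for t in tokens if len(t.strip()) > 1]
    return Counter(kept).most_common(n)


# Hypothetical token list, standing in for the output of jieba.lcut(txt).
tokens = ["我", "喜欢", "学习", "，", "学习", "使", "我", "快乐", "。", "\n"]
for word, freq in top_words(tokens, 3):
    print(word, freq)
```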

File operations:

(1) Opening a file:

rt = open("word.txt", "r")

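'r'   Read-only mode (the default). Raises FileNotFoundError if the file does not exist.
'w'   Overwrite mode. Creates the file if it does not exist; truncates it completely if it does.
'x'   Exclusive-creation mode. Creates the file if it does not exist; raises FileExistsError if it does.
'a'   Append mode. Creates the file if it does not exist; otherwise writes at the end of the file.
'b'   Binary mode.
't'   Text mode (the default).
'+'   Combined with r/w/x/a; adds simultaneous read and write access to the base mode.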
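
To see how these modes behave, here is a small sketch that runs through several of them on a scratch file. The temporary directory and the file name demo.txt are assumptions made for the demonstration, chosen so that no real file is touched.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")  # hypothetical file name

# "r" on a missing file raises FileNotFoundError.
try:
    open(path, "r")
    missing_raised = False
except FileNotFoundError:
    missing_raised = True

# "w" creates the file, or truncates it if it already exists.
f = open(path, "w", encoding="utf-8")
f.write("first\n")
f.close()

# "a" keeps the existing content and writes at the end.
f = open(path, "a", encoding="utf-8")
f.write("second\n")
f.close()

# "x" refuses to overwrite: it raises FileExistsError if the file exists.
try:
    open(path, "x")
    exists_raised = False
except FileExistsError:
    exists_raised = True

# "r+" opens an existing file for both reading and writing.
f = open(path, "r+", encoding="utf-8")
content = f.read()
f.close()

print(missing_raised, exists_raised, repr(content))
```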

(2) Closing a file:

rt.close()
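
Calling close() by hand is easy to forget, and it is skipped if an exception is raised first. A common alternative is the with statement, which closes the file when the block ends. The sketch below uses a hypothetical scratch file rather than word.txt so that it runs anywhere.

```python
import os
import tempfile

# Hypothetical scratch file so the example does not depend on word.txt.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("hello")
    inside = f.closed   # False: the file is still open inside the block

after = f.closed        # True: leaving the block closed the file

print(inside, after)
```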


Original article: https://www.cnblogs.com/SwiftAC/p/12255550.html
