Python Character Encoding

Posted: 2017-08-22 00:34:07


Using the standard library's codecs module

codecs.open(filename, mode='r', encoding=None, errors='strict', buffering=1)
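The errors parameter in this signature decides what happens when the file contains bytes that are not valid in the declared encoding. The sketch below is written for Python 3, and the temporary file with its GBK contents is made up purely for illustration. It writes GBK bytes to disk and then opens them as UTF-8: with the default errors='strict' the read raises UnicodeDecodeError, while errors='replace' substitutes U+FFFD for the bad bytes instead of failing.

```python
import codecs
import os
import tempfile

# Hypothetical sample file: GBK-encoded bytes, which are NOT valid UTF-8.
path = os.path.join(tempfile.mkdtemp(), "gbk.txt")
with open(path, "wb") as raw:
    raw.write("汉字".encode("gbk"))

# errors='strict' (the default) raises as soon as an invalid byte is met.
try:
    with codecs.open(path, encoding="utf-8") as f:
        f.read()
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' inserts U+FFFD for undecodable bytes instead of raising.
with codecs.open(path, encoding="utf-8", errors="replace") as f:
    replaced = f.read()

print(strict_failed)
print("\ufffd" in replaced)
```

This is why the file's real encoding has to be known in advance: 'strict' surfaces the mismatch immediately, whereas 'replace' silently loses data.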

import codecs
filename = 'test.txt'  # hypothetical path to a file known to be UTF-8 encoded
f = codecs.open(filename, encoding='utf-8')

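A UTF-8 file opened this way is automatically decoded to unicode, but you must already know that the file really is UTF-8. If the file contains Chinese characters, the reader does not decode it one byte at a time: it collects all the bytes belonging to a character and only then converts them to unicode. This follows from UTF-8 being a variable-length encoding, in which a common Chinese character occupies three bytes.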
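A small Python 3 sketch of the behaviour described above. The sample text and the temporary file are assumptions for illustration. It shows that "汉" takes three bytes in UTF-8, that codecs.open returns decoded text, and that an incremental UTF-8 decoder fed one byte at a time emits nothing until the third byte of the character arrives.

```python
import codecs
import os
import tempfile

text = "汉字 abc"  # hypothetical sample content

# A common Chinese character such as "汉" is 3 bytes long in UTF-8.
print(len("汉".encode("utf-8")))

# Write the sample as UTF-8 bytes, then read it back through codecs.open.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as raw:
    raw.write(text.encode("utf-8"))

with codecs.open(path, encoding="utf-8") as f:
    decoded = f.read()
print(decoded == text)

# The incremental decoder buffers an incomplete multi-byte sequence and
# only produces the character once all of its bytes have been seen.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [decoder.decode(bytes([b])) for b in "汉".encode("utf-8")]
print(pieces)  # ['', '', '汉']
```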

The code below shows another way to read and write with codecs:

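#coding=utf-8
import codecs

# Open both files in binary mode; the codecs reader/writer do the decoding/encoding.
fin = open("test.txt", 'rb')
fout = open("utf8.txt", 'wb')

reader = codecs.getreader('gbk')(fin)     # decode the GBK bytes of test.txt
writer = codecs.getwriter('utf-8')(fout)  # encode as UTF-8, matching the output file name

data = reader.read(10)
# 10 is the approximate maximum number of bytes to read per call; the default -1
# reads as much as possible. Reading in chunks avoids decoding a large file in one step.
while data:
    writer.write(data)
    data = reader.read(10)

fin.close()
fout.close()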
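For comparison, here is a minimal Python 3 sketch of the same GBK-to-UTF-8 copy that uses only the built-in open() with its encoding argument instead of codecs. The function name transcode, its chunk_chars parameter and the sample text are hypothetical; the file names reuse test.txt and utf8.txt from the example above but are created in a temporary directory.

```python
import os
import tempfile


def transcode(src, dst, src_encoding="gbk", dst_encoding="utf-8", chunk_chars=1024):
    """Hypothetical helper: copy src to dst, decoding src_encoding and writing dst_encoding."""
    # newline="" disables newline translation, so line endings are copied unchanged.
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding=dst_encoding, newline="") as fout:
        while True:
            data = fin.read(chunk_chars)  # at most chunk_chars characters per call
            if not data:                  # empty string only at end of file
                break
            fout.write(data)


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "test.txt")
dst = os.path.join(workdir, "utf8.txt")

original = "编码测试\nencoding test\n"  # hypothetical sample text
with open(src, "wb") as f:
    f.write(original.encode("gbk"))

transcode(src, dst, chunk_chars=3)

with open(dst, "rb") as f:
    result = f.read()
print(result.decode("utf-8") == original)
```

Text-mode files already use incremental decoders internally, so a multi-byte character is never split between two chunks, just as with codecs.getreader.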


Original article: http://www.cnblogs.com/p0yz/p/7407244.html
