On Encodings in Python 3

Posted: 2017-08-07 17:38:34

1. Converting between str and bytes in Python 3: The bytes/str dichotomy in Python 3
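As a minimal sketch of the conversion in both directions: `str.encode` produces bytes and `bytes.decode` turns them back into text, provided the same codec is used each way. The sample string and the `try_decode` helper are illustrative assumptions, not taken from the linked article.

```python
text = "Python3 编码"  # sample text (assumption for illustration)

# str -> bytes: encode with an explicit codec
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# bytes -> str: decode with the same codec that produced the bytes
assert utf8_bytes.decode("utf-8") == text
assert gbk_bytes.decode("gbk") == text


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded str, or None on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# The same text has a different byte length in each encoding
print(len(text), len(utf8_bytes), len(gbk_bytes))

# GBK bytes are not valid UTF-8 here, so decoding fails
print(try_decode(gbk_bytes, "utf-8"))
```

Decoding with a codec other than the one used for encoding either raises `UnicodeDecodeError` or silently yields wrong characters, so it helps to keep track of which encoding produced a given bytes object.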

2. The UTF-8 BOM (under Python 3).

>>> import codecs
>>> codecs.BOM_UTF8
b'\xef\xbb\xbf'
>>> len(b'\xef\xbb\xbf')
3
>>> codecs.BOM_UTF8.decode('utf8')
'\ufeff'
>>> len('\ufeff')
1
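Building on the session above, the following sketch shows one way to deal with the BOM in practice: the standard `utf-8-sig` codec strips a leading BOM when decoding and writes one when encoding. The sample payload is an assumption for illustration.

```python
import codecs

# Bytes as they might come from a file saved "with BOM" (sample data)
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Plain utf-8 keeps the BOM as the character U+FEFF at the start of the text
with_bom = raw.decode("utf-8")

# utf-8-sig drops a leading BOM if present (and is harmless if absent)
without_bom = raw.decode("utf-8-sig")
no_bom_input = b"hello".decode("utf-8-sig")

# Encoding with utf-8-sig writes the BOM back out
round_trip = "hello".encode("utf-8-sig")

print(repr(with_bom), repr(without_bom), round_trip == raw)
```

The same codec name can be passed to `open(path, encoding="utf-8-sig")` when reading text files that may or may not start with a BOM.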

3. Encodings available in Python 3: Standard Encodings, Python Specific Encodings.

4. Printing the encodings and their aliases. (Get a list of all the encodings Python can encode to)

>>> from encodings.aliases import aliases
>>> for k in aliases:
	print('%s: %s' % (k, aliases[k]))
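The flat alias listing can be long, so here is a sketch that inverts the table and groups the aliases under the codec they point to. The `by_codec` name is an assumption for illustration. Note that the table maps aliases only, so it is not necessarily a complete list of every codec Python ships.

```python
from collections import defaultdict
from encodings.aliases import aliases

# Invert the alias table: codec module name -> list of aliases
by_codec = defaultdict(list)
for alias, codec in aliases.items():
    by_codec[codec].append(alias)

# Print one line per codec with its aliases in sorted order
for codec in sorted(by_codec):
    print("%s: %s" % (codec, ", ".join(sorted(by_codec[codec]))))
```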

5. Checking whether an encoding name is valid.

>>> import codecs

>>> codecs.lookup('utf8')    # valid
<codecs.CodecInfo object for encoding utf-8 at 0x13fb4f50828>

>>> codecs.lookup('utf-;8')    # valid (the name is normalized before lookup)
<codecs.CodecInfo object for encoding utf-8 at 0x13fb4f50a08>

>>> codecs.lookup('utf88')    # invalid
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    codecs.lookup('utf88')
LookupError: unknown encoding: utf88
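The lookup above can be wrapped into a small predicate. This is only a sketch: `is_valid_encoding` and the candidate list are hypothetical names introduced here, built on the fact that `codecs.lookup` raises `LookupError` for unknown names.

```python
import codecs


def is_valid_encoding(name):
    """Hypothetical helper: True if Python can find a codec for this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Filter a list of user-supplied names down to the usable ones
candidates = ["utf8", "UTF-8", "gbk", "utf88"]
valid = [name for name in candidates if is_valid_encoding(name)]
print(valid)
```

Checking names up front like this avoids discovering a typo only when the first `encode` or `decode` call fails.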

6. Normalizing an encoding name.

>>> import encodings
>>> encodings.normalize_encoding('utf-;8')
'utf_8'

The corresponding C code is the _Py_normalize_encoding function in unicodeobject.c.
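One point worth illustrating: `encodings.normalize_encoding` only rewrites the spelling of a name and does not check that a codec exists. To get the canonical name of a real codec, the `name` attribute of the `CodecInfo` returned by `codecs.lookup` can be used instead. The `canonical_name` helper below is a hypothetical name for this sketch.

```python
import codecs
import encodings


def canonical_name(label):
    """Hypothetical helper: canonical codec name, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


# Normalization is purely syntactic: a nonexistent name passes through
print(encodings.normalize_encoding("utf88"))

# Lookup resolves aliases and rejects unknown names
print(canonical_name("UTF8"))
print(canonical_name("utf88"))
```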

7. Encoding-related functions in the sys and locale modules. (A Detailed Explanation of Python Character Encoding)

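>>> import sys
>>> import locale

# The default string encoding used by the Unicode implementation
# (always 'utf-8' in Python 3)
>>> sys.getdefaultencoding()
'utf-8'

# The encoding used to convert between Unicode filenames and OS filenames
>>> sys.getfilesystemencoding()
'utf-8'

# Get the default locale settings as a tuple (language code, encoding)
# Note: deprecated since Python 3.11
>>> locale.getdefaultlocale()
('zh_CN', 'cp936')

# Return the encoding the user prefers for text data
# The documentation notes that this function only returns a guess
>>> locale.getpreferredencoding()
'cp936'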
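The calls above can be collected into one report, which is handy when debugging encoding problems across machines. This is a sketch under assumptions: `encoding_report` is a hypothetical helper, and `sys.stdout.encoding` is an extra value not covered in the session above. The values printed depend on the platform and locale.

```python
import locale
import sys


def encoding_report():
    """Hypothetical helper: collect the interpreter's encoding settings."""
    return {
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        # False: do not call setlocale(), just read the current settings
        "preferred": locale.getpreferredencoding(False),
        # stdout may be replaced by an object without an encoding attribute
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for key, value in encoding_report().items():
    print("%s: %s" % (key, value))
```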

Related reading: Unicode Tips

*** walker ***

This article comes from the "walker的流水账" blog. Please keep this attribution: http://walkerqt.blog.51cto.com/1310630/1954215

