Python内置了很多编码的字符集处理,有些是使用C语言实现,有些是使用字典映射方式实现。下表按名称排序的字符集表,有些名称是可以别的名称的,比如utf-8也可以使用名称utf_8来表查找。CPython实现与其它实现有一些差别,针对一些编码字符集作了优化,如果使用这些字符集之外的字符集可能速度比较慢。优化的字符集:utf-8, utf8, latin-1, latin1, iso-8859-1, mbcs (Windows only), ascii, utf-16, and utf-32。有些字符集支持不同的语言,也有一些独立的字符集。
Codec | Aliases | Languages |
ascii | 646, us-ascii | English |
big5 | big5-tw, csbig5 | Traditional Chinese |
big5hkscs | big5-hkscs, hkscs | Traditional Chinese |
cp037 | IBM037, IBM039 | English |
cp273 | 273, IBM273, csIBM273 | German New in version 3.4. |
cp424 | EBCDIC-CP-HE, IBM424 | Hebrew |
cp437 | 437, IBM437 | English |
cp500 | EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500 | Western Europe |
cp720 |
| Arabic |
cp737 |
| Greek |
cp775 | IBM775 | Baltic languages |
cp850 | 850, IBM850 | Western Europe |
cp852 | 852, IBM852 | Central and Eastern Europe |
cp855 | 855, IBM855 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
cp856 |
| Hebrew |
cp857 | 857, IBM857 | Turkish |
cp858 | 858, IBM858 | Western Europe |
cp860 | 860, IBM860 | Portuguese |
cp861 | 861, CP-IS, IBM861 | Icelandic |
cp862 | 862, IBM862 | Hebrew |
cp863 | 863, IBM863 | Canadian |
cp864 | IBM864 | Arabic |
cp865 | 865, IBM865 | Danish, Norwegian |
cp866 | 866, IBM866 | Russian |
cp869 | 869, CP-GR, IBM869 | Greek |
cp874 |
| Thai |
cp875 |
| Greek |
cp932 | 932, ms932, mskanji, ms-kanji | Japanese |
cp949 | 949, ms949, uhc | Korean |
cp950 | 950, ms950 | Traditional Chinese |
cp1006 |
| Urdu |
cp1026 | ibm1026 | Turkish |
cp1125 | 1125, ibm1125, cp866u, ruscii | Ukrainian New in version 3.4. |
cp1140 | ibm1140 | Western Europe |
cp1250 | windows-1250 | Central and Eastern Europe |
cp1251 | windows-1251 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
cp1252 | windows-1252 | Western Europe |
cp1253 | windows-1253 | Greek |
cp1254 | windows-1254 | Turkish |
cp1255 | windows-1255 | Hebrew |
cp1256 | windows-1256 | Arabic |
cp1257 | windows-1257 | Baltic languages |
cp1258 | windows-1258 | Vietnamese |
cp65001 |
| Windows only: Windows UTF-8 (CP_UTF8) New in version 3.3. |
euc_jp | eucjp, ujis, u-jis | Japanese |
euc_jis_2004 | jisx0213, eucjis2004 | Japanese |
euc_jisx0213 | eucjisx0213 | Japanese |
euc_kr | euckr, korean, ksc5601, ks_c-5601, ks_c-5601-1987, ksx1001, ks_x-1001 | Korean |
gb2312 | chinese, csiso58gb231280, euc- cn, euccn, eucgb2312-cn, gb2312-1980, gb2312-80, iso- ir-58 | Simplified Chinese |
gbk | 936, cp936, ms936 | Unified Chinese |
gb18030 | gb18030-2000 | Unified Chinese |
hz | hzgb, hz-gb, hz-gb-2312 | Simplified Chinese |
iso2022_jp | csiso2022jp, iso2022jp, iso-2022-jp | Japanese |
iso2022_jp_1 | iso2022jp-1, iso-2022-jp-1 | Japanese |
iso2022_jp_2 | iso2022jp-2, iso-2022-jp-2 | Japanese, Korean, Simplified Chinese, Western Europe, Greek |
iso2022_jp_2004 | iso2022jp-2004, iso-2022-jp-2004 | Japanese |
iso2022_jp_3 | iso2022jp-3, iso-2022-jp-3 | Japanese |
iso2022_jp_ext | iso2022jp-ext, iso-2022-jp-ext | Japanese |
iso2022_kr | csiso2022kr, iso2022kr, iso-2022-kr | Korean |
latin_1 | iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1 | West Europe |
iso8859_2 | iso-8859-2, latin2, L2 | Central and Eastern Europe |
iso8859_3 | iso-8859-3, latin3, L3 | Esperanto, Maltese |
iso8859_4 | iso-8859-4, latin4, L4 | Baltic languages |
iso8859_5 | iso-8859-5, cyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
iso8859_6 | iso-8859-6, arabic | Arabic |
iso8859_7 | iso-8859-7, greek, greek8 | Greek |
iso8859_8 | iso-8859-8, hebrew | Hebrew |
iso8859_9 | iso-8859-9, latin5, L5 | Turkish |
iso8859_10 | iso-8859-10, latin6, L6 | Nordic languages |
iso8859_13 | iso-8859-13, latin7, L7 | Baltic languages |
iso8859_14 | iso-8859-14, latin8, L8 | Celtic languages |
iso8859_15 | iso-8859-15, latin9, L9 | Western Europe |
iso8859_16 | iso-8859-16, latin10, L10 | South-Eastern Europe |
johab | cp1361, ms1361 | Korean |
koi8_r |
| Russian |
koi8_u |
| Ukrainian |
mac_cyrillic | maccyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
mac_greek | macgreek | Greek |
mac_iceland | maciceland | Icelandic |
mac_latin2 | maclatin2, maccentraleurope | Central and Eastern Europe |
mac_roman | macroman, macintosh | Western Europe |
mac_turkish | macturkish | Turkish |
ptcp154 | csptcp154, pt154, cp154, cyrillic-asian | Kazakh |
shift_jis | csshiftjis, shiftjis, sjis, s_jis | Japanese |
shift_jis_2004 | shiftjis2004, sjis_2004, sjis2004 | Japanese |
shift_jisx0213 | shiftjisx0213, sjisx0213, s_jisx0213 | Japanese |
utf_32 | U32, utf32 | all languages |
utf_32_be | UTF-32BE | all languages |
utf_32_le | UTF-32LE | all languages |
utf_16 | U16, utf16 | all languages |
utf_16_be | UTF-16BE | all languages |
utf_16_le | UTF-16LE | all languages |
utf_7 | U7, unicode-1-1-utf-7 | all languages |
utf_8 | U8, UTF, utf8 | all languages |
utf_8_sig |
| all languages |
蔡军生 QQ:9073204 深圳
版权声明:本文为博主原创文章,未经博主允许不得转载。
原文地址:http://blog.csdn.net/caimouse/article/details/49716495