Unicode ranges and how to generate all Unicode characters in Python

Date: 2017-11-02 16:11:56

Unicode ranges and the languages they represent

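Unicode is a universal character set. Its code space runs from U+0000 to U+10FFFF (1,114,112 code points); the first 65,536 code points, U+0000 to U+FFFF, form the Basic Multilingual Plane (BMP), which covers most characters in everyday use. Whenever a computer stores text, including any character outside the ASCII table, it saves the Unicode code points as bytes using an encoding such as UTF-8 or UTF-16. Unifying the world's character sets took a great deal of effort, and incompatibilities between encodings still cause problems today, but for everyday code a grasp of the basics is enough.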
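To make the difference between a code point and its stored bytes concrete, here is a minimal sketch in Python 3 using only the standard library. The helper name describe and the three sample characters are illustrative choices, not part of the original article.

```python
import sys

# Sample characters chosen for illustration: ASCII, CJK, and one outside the BMP.
samples = ['A', '\u4e2d', '\U0001F600']

def describe(ch):
    """Return (code point, UTF-8 bytes, UTF-16 big-endian bytes) for one character."""
    return ord(ch), ch.encode('utf-8'), ch.encode('utf-16-be')

for ch in samples:
    cp, utf8, utf16 = describe(ch)
    # Only hex digits are printed, so the output is safe on any terminal.
    print('U+%04X  utf-8=%s  utf-16=%s' % (cp, utf8.hex(), utf16.hex()))

# sys.maxunicode is the largest code point a Python 3 str can hold.
print(hex(sys.maxunicode))
```

The same code point takes a different number of bytes depending on the encoding, which is why text must always be read back with the encoding it was written in.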

For the code point ranges assigned to each language, see the following article:

http://www.cnblogs.com/chenwenbiao/archive/2011/08/17/2142718.html

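Range for Chinese characters (CJK ideographs, shared with Japanese and Korean): the original figure is missing. The commented-out call in the script below uses 0x4e00 to 0x9fbf; the CJK Unified Ideographs block in the Unicode standard spans U+4E00 to U+9FFF.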
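A range like this is typically used to test whether a character is a Chinese ideograph. The following is a minimal sketch in Python 3; the names is_cjk_ideograph and count_cjk are hypothetical helpers, and the upper bound 0x9fff is the end of the CJK Unified Ideographs block rather than the 0x9fbf used in this article's script.

```python
# Assumption: bounds of the CJK Unified Ideographs block. The article's own
# script stops at 0x9fbf; 0x9fff is the end of the block in the standard.
CJK_START, CJK_END = 0x4e00, 0x9fff

def is_cjk_ideograph(ch):
    """True if the single character ch lies in the CJK Unified Ideographs block."""
    return CJK_START <= ord(ch) <= CJK_END

def count_cjk(text):
    """Count the characters of text that fall in the block."""
    return sum(1 for ch in text if is_cjk_ideograph(ch))

print(is_cjk_ideograph('\u4e2d'))          # a Chinese ideograph
print(is_cjk_ideograph('A'))               # ASCII letter
print(count_cjk('Unicode \u7f16\u7801 abc'))  # two ideographs among ASCII text
```

This check covers only the main block; kana, hangul, and the CJK extension blocks live in other ranges and would need their own bounds.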

Generating all Unicode characters with Python

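# -*- coding: utf-8 -*-
# Python 3 version. The original Python 2 script built a '\uXXXX' escape
# string and decoded it with 'unicode-escape'; chr() does the same directly.

def print_unicode(start, end):
    """Write every code point in [start, end] to unicode_set.txt,
    one per line, as: index <TAB> hex code <TAB> character."""
    # errors='ignore' drops anything UTF-8 cannot encode (the surrogates
    # 0xd800-0xdfff), like the original encode('utf-8', 'ignore').
    # newline='' keeps the explicit '\r\n' from being translated again.
    with open('unicode_set.txt', 'w', encoding='utf-8',
              errors='ignore', newline='') as f:
        for code in range(start, end + 1):
            index = code - start + 1
            od = format(code, '04x')  # zero-pad to at least 4 hex digits
            try:
                f.write(str(index) + '\t' + '0x' + od + '\t' + chr(code))
                f.write('\r\n')
            except Exception as e:
                # Report the failing code point and continue with the next one.
                print(e)
                print(code)

# print_unicode(0x4e00, 0x9fbf)  # CJK ideographs only
print_unicode(0x0, 0x9fbf)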
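The call above stops at 0x9fbf, so it does not reach the end of the Unicode code space. As a possible extension, the sketch below (Python 3) walks the whole range up to 0x10FFFF, skips the surrogates, and keeps only code points that have a name in the unicodedata database shipped with the interpreter. The function names iter_assigned and write_table and the output layout are assumptions made for illustration, not part of the original script.

```python
import unicodedata

def iter_assigned(start=0x0, end=0x10FFFF):
    """Yield (code point, character, name) for code points that have a name."""
    for code in range(start, end + 1):
        if 0xD800 <= code <= 0xDFFF:
            continue  # surrogates are not characters and cannot be encoded as UTF-8
        ch = chr(code)
        # The second argument is returned instead of raising ValueError
        # when the code point has no name (controls, unassigned, private use).
        name = unicodedata.name(ch, '')
        if name:
            yield code, ch, name

def write_table(path, start=0x0, end=0x10FFFF):
    """Write one 'U+XXXX <TAB> char <TAB> name' line per named code point.
    Returns the number of lines written."""
    count = 0
    with open(path, 'w', encoding='utf-8') as f:
        for code, ch, name in iter_assigned(start, end):
            f.write('U+%04X\t%s\t%s\n' % (code, ch, name))
            count += 1
    return count

# Small demo: print code points and names only, so any terminal can show them.
for code, ch, name in iter_assigned(0x4e00, 0x4e02):
    print('U+%04X %s' % (code, name))
```

Calling write_table with only a path would write the full table. Because control characters have no name, they are skipped, which also keeps raw line breaks out of the output file. How many lines are produced depends on the Unicode version bundled with the Python build.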