Unicode explorer

Posted: 2016-06-06 09:05:23


It can be cumbersome to work out some of the details of this by hand, so you can use the little JavaScript-based tool below to display useful information about any string you enter into the text field. Currently I don't have any support for going the other way (e.g. from UTF-16 code units to text), but hopefully this is still useful.

Enter text here:

Character    Unicode    UTF-16    UTF-8

This table breaks down the text in the text box into Unicode characters. It does not perform any kind of normalization, so an accented character may appear as one character or more, depending on whether it is entered as a single precomposed character (e.g. é, U+00E9) or as an unaccented character followed by a combining character (e.g. é, U+0065 then U+0301; yes, that really is different to the previous example: copy and paste them both to see!). However, it does break the input into Unicode characters instead of just UTF-16 code units, so a surrogate pair is treated as a single character. For example, 𠬠 (a CJK ideograph from a supplementary plane, outside the Basic Multilingual Plane) is shown as U+20B20.
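The difference between code points and UTF-16 code units, and the effect of skipping normalization, can be sketched in a few lines of JavaScript. This is only an illustration and not the tool's actual source: the helper name codePointLabels and the variable names are made up for this example.

```javascript
// Precomposed é (U+00E9) versus e followed by a combining acute accent (U+0301).
const precomposed = "\u00E9";
const decomposed = "e\u0301";

// The two strings render alike but are different sequences of code points.
console.log(precomposed === decomposed);                  // false
console.log(precomposed === decomposed.normalize("NFC")); // true

// Spreading a string iterates by code point, so a surrogate pair stays together.
const ideograph = "\u{20B20}";
console.log(ideograph.length);      // 2 (UTF-16 code units)
console.log([...ideograph].length); // 1 (code point)

// Hypothetical helper: list the code points of a string in U+XXXX form.
function codePointLabels(text) {
  return [...text].map((ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(codePointLabels(decomposed)); // [ 'U+0065', 'U+0301' ]
console.log(codePointLabels(ideograph));  // [ 'U+20B20' ]
```

Because no normalization is applied, the decomposed string yields two entries while the precomposed one would yield a single U+00E9.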

The first column simply displays the character. The second column displays the Unicode code point (U+0000 to U+10FFFF), suitable for looking up in the Unicode code charts. The third column displays the UTF-16 code units which make up the character: these are the char values which would appear in a C# (or Java, or JavaScript) string. For characters in the Basic Multilingual Plane this will just be a single code unit; for other characters it will be the surrogate pair (high then low). The fourth column displays the UTF-8 representation of the character in bytes.
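As a rough sketch of how one row per character could be computed, the following JavaScript builds the four columns described above. It is not the tool's real implementation: the function name describe and the row field names are assumptions for this example, and it relies on the standard TextEncoder API (available in browsers and recent Node.js versions), which always encodes to UTF-8.

```javascript
// Hypothetical helper: describe each code point of `text` as one table row.
function describe(text) {
  const hex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");
  const encoder = new TextEncoder(); // always produces UTF-8 bytes

  // Spreading the string splits it by code point, not by UTF-16 code unit.
  return [...text].map((ch) => {
    const utf16 = [];
    for (let i = 0; i < ch.length; i++) {
      // One unit for BMP characters, two (high then low surrogate) otherwise.
      utf16.push(hex(ch.charCodeAt(i), 4));
    }
    return {
      character: ch,
      codePoint: "U+" + hex(ch.codePointAt(0), 4),
      utf16: utf16.join(" "),
      utf8: Array.from(encoder.encode(ch), (b) => hex(b, 2)).join(" "),
    };
  });
}

// A (BMP, 1 UTF-8 byte), é (BMP, 2 bytes), U+20B20 (supplementary, 4 bytes).
const rows = describe("A\u00E9\u{20B20}");
console.log(rows[1].utf8);  // C3 A9
console.log(rows[2].utf16); // D842 DF20
console.log(rows[2].utf8);  // F0 A0 AC A0
```

Note that an unpaired surrogate in the input would still appear as its own row here, but TextEncoder would replace it with the bytes of U+FFFD, so this sketch is only meaningful for well-formed strings.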

 

Reference: http://csharpindepth.com/Articles/General/Unicode.aspx

Original article: http://www.cnblogs.com/wangqiideal/p/5562677.html
