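A character encoding is the format in which a computer stores text. English letters, for example, are stored in ASCII, one byte per character. Other encodings include UTF-8 (a universal encoding format) and regional encodings such as ISO-8859 (ISO-8859-1 covers Western Europe), windows-1251 (Cyrillic, used for Russian), and the GB encodings for Chinese.
Because regions use different encodings, exchanging information requires converting the same characters from one encoding to another.
The universal encoding is UTF-8, which can represent every character in Unicode and therefore covers practically all of the world's writing systems. For portability, files are generally encoded in UTF-8 (for example, web pages that must display multiple languages), and text in other encodings is usually converted to UTF-8.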
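To make the point about "the same character, different bytes" concrete, here is a small sketch in plain Lua. It prints the bytes of the Cyrillic capital letter U+0410 in windows-1251 and in UTF-8. The helper name hex_bytes is made up for this illustration.

```lua
-- The Cyrillic capital letter U+0410 in two encodings, written as decimal
-- byte escapes so that this script itself stays plain ASCII.
local in_cp1251 = "\192"       -- windows-1251: one byte, 0xC0
local in_utf8   = "\208\144"   -- UTF-8: two bytes, 0xD0 0x90

-- Hypothetical helper: render every byte of a string as two hex digits.
local function hex_bytes(s)
  local parts = {}
  for i = 1, #s do
    parts[#parts + 1] = string.format("%02X", s:byte(i))
  end
  return table.concat(parts, " ")
end

print("windows-1251:", hex_bytes(in_cp1251))  -- C0
print("UTF-8:", hex_bytes(in_utf8))           -- D0 90
print("ASCII 'A':", hex_bytes("A"))           -- 41
```

Converting between the two encodings means replacing the first byte sequence with the second, which is the job iconv does.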
http://www.gnu.org/software/libiconv/#introduction
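GNU provides an open-source conversion library, libiconv, that supports conversion between many encodings and Unicode. It can be used on systems that do not provide encoding conversion themselves.
Project page: http://savannah.gnu.org/projects/libiconv/
Recent versions of the Linux C library (glibc) already provide iconv, so libiconv does not need to be installed: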
http://davidgao.github.io/LFSCN/chapter06/glibc.html
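Some packages outside LFS recommend installing GNU libiconv for converting text encodings. The project's home page (http://www.gnu.org/software/libiconv/) says: "This library provides an iconv() implementation, for use on systems which don't have one, or whose implementation cannot convert from/to Unicode." Glibc provides an iconv() implementation and can convert from/to Unicode, so libiconv is not needed on an LFS system.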
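For Lua there is a mature binding, lua-iconv, which wraps the iconv functionality in a dedicated library for use from Lua scripts.
Official documentation: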
http://ittner.github.io/lua-iconv/#download-and-installation
local iconv = require("iconv")
cd = iconv.new(to, from)
cd = iconv.open(to, from)
nstr, err = cd:iconv(str)

Converts the 'str' string to the desired charset. This method always returns two arguments: the converted string and an error code, which may have any of the following values:

nil
    No error. Conversion was successful.
iconv.ERROR_NO_MEMORY
    Failed to allocate enough memory in the conversion process.
iconv.ERROR_INVALID
    An invalid character was found in the input sequence.
iconv.ERROR_INCOMPLETE
    An incomplete character was found in the input sequence.
iconv.ERROR_FINALIZED
    Trying to use an already-finalized converter. This usually means that the user was tweaking the garbage collector private methods.
iconv.ERROR_UNKNOWN
    There was an unknown error.
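The calls documented above can be combined into a small wrapper. This is only a sketch: the function name convert and its message strings are invented for illustration, and the module is passed in as a parameter so the wrapper can be exercised without lua-iconv installed. It uses only iconv.new, cd:iconv and the error constants listed above.

```lua
-- Convert `text` from charset `from` to charset `to` using a lua-iconv
-- style module passed in as `iconv`.
-- Returns the converted string, or nil plus a message on failure.
local function convert(iconv, to, from, text)
  local cd = iconv.new(to, from)
  if not cd then
    return nil, "no converter for " .. from .. " -> " .. to
  end
  local out, err = cd:iconv(text)
  if err == nil then
    return out
  elseif err == iconv.ERROR_NO_MEMORY then
    return nil, "out of memory"
  elseif err == iconv.ERROR_INVALID then
    return nil, "invalid character in input"
  elseif err == iconv.ERROR_INCOMPLETE then
    return nil, "incomplete character in input"
  elseif err == iconv.ERROR_FINALIZED then
    return nil, "converter already finalized"
  end
  return nil, "unknown error"
end

-- Live use only if the real module can be loaded.
local ok, iconv = pcall(require, "iconv")
if ok then
  -- "\192" is the windows-1251 byte for the Cyrillic letter U+0410.
  print(convert(iconv, "utf-8", "windows-1251", "\192"))
else
  print("lua-iconv is not available; skipping the live conversion")
end
```

Returning nil plus a message follows the usual Lua convention, so callers can write `local s, msg = convert(...)` and branch on s.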
For Lua 5.1, download the lua-iconv-5 release; the latest release, -7, is compatible with Lua 5.2.
https://github.com/ittner/lua-iconv/releases/tag/lua-iconv-5
Running it after installation produced an error:
:~/share_windows/openSource/lua/lua-iconv-lua-iconv-5$ lua test_iconv.lua
lua: error loading module 'iconv' from file './iconv.so':
./iconv.so: undefined symbol: libiconv_open
stack traceback:
[C]: ?
[C]: in function 'require'
test_iconv.lua:1: in main chunk
[C]: ?
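Investigation (inspired by this article: http://tonybai.com/2013/04/25/a-libiconv-linkage-problem/) pointed to the following cause.
libiconv had been installed earlier, which copied its iconv.h to /usr/local/include/iconv.h.
When the lua-iconv project was then built, gcc found /usr/local/include/iconv.h first while compiling the source file (luaiconv.c) and built iconv.so against the function declarations in that header.
It should have used the system iconv.h instead, which is /usr/include/iconv.h.
Search order (the command below prints the linker's library search directories, where /usr/local comes before /usr; gcc's header search directories can be listed with: echo | gcc -E -Wp,-v -):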
:~/share_windows/openSource/lua/lua-iconv-lua-iconv-5$ ld -verbose | grep SEARCH
SEARCH_DIR("=/usr/i686-linux-gnu/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/usr/i686-linux-gnu/lib"); SEARCH_DIR("=/usr/local/lib/i386-linux-gnu"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib/i386-linux-gnu"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib/i386-linux-gnu"); SEARCH_DIR("=/usr/lib");
The libiconv header, /usr/local/include/iconv.h, contains:
#ifndef LIBICONV_PLUG
#define iconv_open libiconv_open
#endif
extern LIBICONV_DLL_EXPORTED iconv_t iconv_open (const char* tocode, const char* fromcode);
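In libiconv's iconv.c, exporting libiconv_open under the plain iconv_open name is controlled by a macro, which is presumably not enabled here. The other possibility is that lua-iconv was built without linking against the libiconv library, so the libiconv_open reference produced by the header above cannot be resolved: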
#if defined __FreeBSD__ && !defined __gnu_freebsd__
/* GNU libiconv is the native FreeBSD iconv implementation since 2002.
It wants to define the symbols 'iconv_open', 'iconv', 'iconv_close'. */
#define strong_alias(name, aliasname) _strong_alias(name, aliasname)
#define _strong_alias(name, aliasname) \
extern __typeof (name) aliasname __attribute__ ((alias (#name)));
#undef iconv_open
#undef iconv
#undef iconv_close
strong_alias (libiconv_open, iconv_open)
strong_alias (libiconv, iconv)
strong_alias (libiconv_close, iconv_close)
#endif
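Fix: in the implementation file, change how iconv.h is included, from the standard angle-bracket form to a quoted include with the full path /usr/include/iconv.h.
Then run make && make install again; after that it runs OK.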
vim luaiconv.c
#include <lua.h>
#include <lauxlib.h>
#include <stdlib.h>
#include "/usr/include/iconv.h"
#include <errno.h>
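After rebuilding, a quick smoke test helps confirm that the module loads and converts correctly. The sketch below is an assumption-laden illustration: the function name check_iconv and its optional loader parameter are invented here (the loader defaults to require and exists only so the check can be tried without the module). It relies only on iconv.new and cd:iconv from the lua-iconv documentation.

```lua
-- Smoke test for a freshly built lua-iconv.
-- Returns true on success, or false plus a message describing the failure.
local function check_iconv(loader)
  loader = loader or require
  local ok, mod = pcall(loader, "iconv")
  if not ok then
    -- A linkage problem such as "undefined symbol: libiconv_open"
    -- surfaces here, in the error message from the loader.
    return false, "load failed: " .. tostring(mod)
  end
  local cd = mod.new("utf-8", "windows-1251")
  if not cd then
    return false, "converter could not be created"
  end
  -- windows-1251 byte 0xC0 is U+0410, which is 0xD0 0x90 in UTF-8.
  local out, err = cd:iconv("\192")
  if err ~= nil or out ~= "\208\144" then
    return false, "unexpected conversion result"
  end
  return true
end

print(check_iconv())
```

Wrapping require in pcall keeps the script from aborting, so the same check can run unattended as part of a build.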
For other installation and runtime errors, see:
https://github.com/ittner/lua-iconv/issues/3
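Some embedded systems have neither libiconv installed nor an iconv implementation in libc, yet still need character conversion.
In that case, install lua-iconv on the build server, use the system's iconv there to generate a mapping table from one encoding to another, and then perform the conversion on the target with that table.
For example, converting windows-1251 to UTF-8.
windows-1251 character code reference: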
http://www.science.co.il/language/Character-code.asp?s=1251
Lua code that generates the table:

function serializeTable(val, name, skipnewlines, depth)
    skipnewlines = skipnewlines or false
    depth = depth or 0
    local tmp = string.rep(" ", depth)
    if name then
        -- Numeric keys need brackets to form valid Lua, e.g. [1] = "..."
        local key = type(name) == "number" and ("[" .. name .. "]") or name
        tmp = tmp .. key .. " = "
    end
    if type(val) == "table" then
        tmp = tmp .. "{" .. (not skipnewlines and "\n" or "")
        for k, v in pairs(val) do
            tmp = tmp .. serializeTable(v, k, skipnewlines, depth + 1) .. "," .. (not skipnewlines and "\n" or "")
        end
        tmp = tmp .. string.rep(" ", depth) .. "}"
    elseif type(val) == "number" then
        tmp = tmp .. tostring(val)
    elseif type(val) == "string" then
        tmp = tmp .. string.format("%q", val)
    elseif type(val) == "boolean" then
        tmp = tmp .. (val and "true" or "false")
    else
        tmp = tmp .. "\"[inserializeable datatype:" .. type(val) .. "]\""
    end
    return tmp
end

local iconv = require("iconv")

-- Set your terminal encoding here
-- local termcs = "iso-8859-1"
local termcs = "utf-8"

function check_one(to, from, text)
    print("\n-- Testing conversion from " .. from .. " to " .. to)
    local cd = iconv.new(to .. "//TRANSLIT", from)
    assert(cd, "Failed to create a converter object.")
    local ostr, err = cd:iconv(text)
    if err == iconv.ERROR_INCOMPLETE then
        print("ERROR: Incomplete input.")
    elseif err == iconv.ERROR_INVALID then
        print("ERROR: Invalid input.")
    elseif err == iconv.ERROR_NO_MEMORY then
        print("ERROR: Failed to allocate memory.")
    elseif err == iconv.ERROR_UNKNOWN then
        print("ERROR: There was an unknown error.")
    end
    print(ostr)
    return ostr
end

-- Convert every byte value 0..255 and record the resulting UTF-8 bytes
-- as a string of decimal escapes such as \208\144.
local result = {}
local num = 255
for i = 0, num do
    print("----------------------------------- i=" .. i)
    local char = string.char(i)
    local ostr = check_one(termcs, "windows-1251", char)
    print(string.len(ostr))
    local byteStr = ""
    for j = 1, string.len(ostr) do
        local byteVal = string.byte(ostr, j)
        print("byte j=" .. j .. " byteVal=" .. byteVal)
        byteStr = byteStr .. "\\" .. byteVal
    end
    print("char i=" .. i .. " byteStr=" .. byteStr)
    table.insert(result, byteStr)
end
print("-----------------------------------!!")
local s = serializeTable(result)
print(s)
The cleaned-up windows-1251 to UTF-8 table. Entry i holds the UTF-8 bytes for windows-1251 byte value i - 1; entry 153 is empty because byte 0x98 is unassigned in windows-1251.
local transTbl_1251toutf8 = {
    "\0", "\1", "\2", "\3", "\4", "\5", "\6", "\7",
    "\8", "\9", "\10", "\11", "\12", "\13", "\14", "\15",
    "\16", "\17", "\18", "\19", "\20", "\21", "\22", "\23",
    "\24", "\25", "\26", "\27", "\28", "\29", "\30", "\31",
    "\32", "\33", "\34", "\35", "\36", "\37", "\38", "\39",
    "\40", "\41", "\42", "\43", "\44", "\45", "\46", "\47",
    "\48", "\49", "\50", "\51", "\52", "\53", "\54", "\55",
    "\56", "\57", "\58", "\59", "\60", "\61", "\62", "\63",
    "\64", "\65", "\66", "\67", "\68", "\69", "\70", "\71",
    "\72", "\73", "\74", "\75", "\76", "\77", "\78", "\79",
    "\80", "\81", "\82", "\83", "\84", "\85", "\86", "\87",
    "\88", "\89", "\90", "\91", "\92", "\93", "\94", "\95",
    "\96", "\97", "\98", "\99", "\100", "\101", "\102", "\103",
    "\104", "\105", "\106", "\107", "\108", "\109", "\110", "\111",
    "\112", "\113", "\114", "\115", "\116", "\117", "\118", "\119",
    "\120", "\121", "\122", "\123", "\124", "\125", "\126", "\127",
    "\208\130", "\208\131", "\226\128\154", "\209\147", "\226\128\158", "\226\128\166", "\226\128\160", "\226\128\161",
    "\226\130\172", "\226\128\176", "\208\137", "\226\128\185", "\208\138", "\208\140", "\208\139", "\208\143",
    "\209\146", "\226\128\152", "\226\128\153", "\226\128\156", "\226\128\157", "\226\128\162", "\226\128\147", "\226\128\148",
    "", "\226\132\162", "\209\153", "\226\128\186", "\209\154", "\209\156", "\209\155", "\209\159",
    "\194\160", "\208\142", "\209\158", "\208\136", "\194\164", "\210\144", "\194\166", "\194\167",
    "\208\129", "\194\169", "\208\132", "\194\171", "\194\172", "\194\173", "\194\174", "\208\135",
    "\194\176", "\194\177", "\208\134", "\209\150", "\210\145", "\194\181", "\194\182", "\194\183",
    "\209\145", "\226\132\150", "\209\148", "\194\187", "\209\152", "\208\133", "\209\149", "\209\151",
    "\208\144", "\208\145", "\208\146", "\208\147", "\208\148", "\208\149", "\208\150", "\208\151",
    "\208\152", "\208\153", "\208\154", "\208\155", "\208\156", "\208\157", "\208\158", "\208\159",
    "\208\160", "\208\161", "\208\162", "\208\163", "\208\164", "\208\165", "\208\166", "\208\167",
    "\208\168", "\208\169", "\208\170", "\208\171", "\208\172", "\208\173", "\208\174", "\208\175",
    "\208\176", "\208\177", "\208\178", "\208\179", "\208\180", "\208\181", "\208\182", "\208\183",
    "\208\184", "\208\185", "\208\186", "\208\187", "\208\188", "\208\189", "\208\190", "\208\191",
    "\209\128", "\209\129", "\209\130", "\209\131", "\209\132", "\209\133", "\209\134", "\209\135",
    "\209\136", "\209\137", "\209\138", "\209\139", "\209\140", "\209\141", "\209\142", "\209\143",
}
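The notes stop at the table itself, so here is a minimal sketch of how such a table might be applied on a target that has no iconv. It assumes the layout used above, where entry i holds the UTF-8 bytes for byte value i - 1. The names convert_with_table and demo_table are hypothetical. To keep the example short, demo_table is a partial stand-in that covers only ASCII and the bytes 0xC0 to 0xFF, which windows-1251 maps to U+0410 through U+044F; in real use, pass the full transTbl_1251toutf8 instead.

```lua
-- Replace every input byte with its table entry; bytes without an entry
-- become "?" (an arbitrary choice made for this sketch).
local function convert_with_table(tbl, input)
  local out = input:gsub(".", function(ch)
    return tbl[ch:byte() + 1] or "?"
  end)
  return out
end

-- Partial stand-in table, indexed by byte value + 1.
local demo_table = {}
for b = 0, 127 do
  demo_table[b + 1] = string.char(b)            -- ASCII maps to itself
end
for b = 0xC0, 0xFF do
  local cp = 0x0410 + (b - 0xC0)                -- Unicode code point
  -- Two-byte UTF-8 form: 110xxxxx 10xxxxxx
  demo_table[b + 1] = string.char(0xC0 + math.floor(cp / 64), 0x80 + cp % 64)
end

-- Six windows-1251 bytes spelling a Russian word, converted to UTF-8.
local utf8_text = convert_with_table(demo_table, "\207\240\232\226\229\242")
print(#utf8_text)  -- 12, since each Cyrillic letter takes two bytes in UTF-8
```

A byte-wise lookup like this works because windows-1251 is a single-byte encoding; it would not carry over to multi-byte source encodings such as the GB family.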
Original article: http://www.cnblogs.com/lightsong/p/4634642.html