09. Converting Chinese Characters to Pinyin and Wubi in Python

Date: 2015-12-06 11:27:37

A Python library for converting Chinese characters to Hanyu Pinyin, 脚本之家 (jb51.net) - http://www.jb51.net/article/65496.htm
Converting Chinese to pinyin in Python, keyxl's blog on ChinaUnix - http://blog.chinaunix.net/uid-26638338-id-3830276.html
Chinese to pinyin and Wubi conversion with tone marks, online tool - http://tool.lu/py5bconvert/
pinyin4py 1.0.dev : Python Package Index - https://pypi.python.org/pypi/pinyin4py
Chinese character encoding table, CSDN.NET download channel - http://download.csdn.net/download/slowwind9999/291213

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# 2015-11-04 21:23:17.230000
"""
Adapted from: "A Python library for converting Chinese characters to Hanyu Pinyin", jb51.net - http://www.jb51.net/article/65496.htm
The dictionary table comes from here (the file must be converted to UTF-8): "Chinese character encoding table", CSDN.NET download channel - http://download.csdn.net/detail/slowwind9999/291213
Results can be checked with the online tool: http://tool.lu/py5bconvert/
To convert Chinese characters to another code, add one more dictionary, fill it in load_word, and add a method modeled on hanzi2pinyin or hanzi2wubi.
"""
import os.path

__version__ = '0.9'
__all__ = ["Hanzi2code"]


class Hanzi2code(object):
    def __init__(self, dict_file='code.txt'):  # code.txt must be UTF-8 encoded
        self.word_dict = {}  # character -> pinyin
        self.wubi_dict = {}  # character -> Wubi code
        self.dict_file = dict_file
        self.load_word()  # load the table as soon as the object is created

    def load_word(self):
        if not os.path.exists(self.dict_file):
            raise IOError("NotFoundFile")
        with open(self.dict_file, encoding='utf-8') as f_obj:
            code_list = f_obj.readlines()[6:]  # skip the first six lines of the file
        for f_line in code_list:
            # Expected fields per line: character, pinyin, Wubi code
            line = f_line.strip().split()
            try:
                self.word_dict[line[0]] = line[1]
                self.wubi_dict[line[0]] = line[2]
            except IndexError:
                print('err....')  # blank line or missing field

    def hanzi2pinyin_split(self, string="", split=""):
        # Return a list when no separator is given, otherwise a joined string
        result = self.hanzi2pinyin(string=string)
        if split == "":
            return result
        else:
            return split.join(result)

    def hanzi2code(self, string='', dic=None):
        if dic is None:
            dic = {}
        if isinstance(string, bytes):
            string = string.decode("utf-8")
        # Characters missing from the table pass through (lowercased)
        return [dic.get(char, char).lower() for char in string]

    def hanzi2wubi(self, string=''):
        return self.hanzi2code(string, self.wubi_dict)

    def hanzi2pinyin(self, string=''):
        return self.hanzi2code(string, self.word_dict)


if __name__ == "__main__":
    test = Hanzi2code()
    string = "钓鱼岛是中国的"
    print("in: %s" % string)
    print("out: %s" % str(test.hanzi2pinyin(string=string)))
    print("out: %s" % test.hanzi2pinyin_split(string=string, split="-"))
    print("out: %s" % str(test.hanzi2wubi(string=string)))
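
The docstring notes that supporting another code only needs one more dictionary plus a loading step in load_word. Below is a minimal sketch of that idea, not the script's actual layout: it assumes each data line holds whitespace-separated fields (character, pinyin, Wubi code), as the parsing in load_word implies, and it adds a hypothetical fourth column. The names SAMPLE_ROWS, COLUMNS, load_tables and convert, and the placeholder values x01 and x02, are made up for illustration and do not come from code.txt.

```python
# Sketch: a table-driven converter with one extra, hypothetical code column.
# SAMPLE_ROWS stands in for the data lines of code.txt; the values in the
# fourth column are placeholders, not a real encoding.
SAMPLE_ROWS = [
    "中 zhong khk x01",
    "国 guo lgyi x02",
]

# Column index for each code; "extra" is the hypothetical addition.
COLUMNS = {"pinyin": 1, "wubi": 2, "extra": 3}


def load_tables(rows, columns=COLUMNS):
    """Build one {character: code} dictionary per configured column."""
    tables = {name: {} for name in columns}
    for row in rows:
        fields = row.split()
        for name, index in columns.items():
            if index < len(fields):  # tolerate rows with missing columns
                tables[name][fields[0]] = fields[index]
    return tables


def convert(text, table):
    """Look up each character; unmapped characters pass through lowercased."""
    return [table.get(char, char).lower() for char in text]


tables = load_tables(SAMPLE_ROWS)
print(convert("中国", tables["pinyin"]))
print(convert("中国", tables["extra"]))
```

Keeping the column indices in one mapping means a new code needs only a new entry in COLUMNS, at the cost of one dictionary per code held in memory.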





Original post: http://www.cnblogs.com/QIAOXINGXING001/p/5023117.html
