Chinese word segmentation
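import jieba

# Read the text to analyze
with open('D:\\xiaoshuo.txt', 'r', encoding='utf-8') as book:
    text = book.read()

# Strip punctuation and whitespace
for ch in ',。!、 \n “ ” ;':
    text = text.replace(ch, '')

# Segment the text and keep only the distinct words
words = set(jieba.cut(text))

# Counting dictionary: occurrences of each word longer than one character
# (text.count counts substring matches in the cleaned text)
counts = {}
for w in words:
    if len(w) > 1:
        counts[w] = text.count(w)

# Sort by count, descending, and print the top 20
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for item in items[:20]:
    print(item)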
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>>
============================= RESTART: D:/daa.py =============================
Building prefix dict from the default dictionary ...
Dumping model to file cache C:\Users\asus\AppData\Local\Temp\jieba.cache
Loading model cost 1.306 seconds.
Prefix dict has been built succesfully.
('父亲', 10)
('背影', 4)
('丧事', 3)
('北京', 3)
('散文', 3)
('茶房', 3)
('那年', 2)
('父母', 2)
('踌躇', 2)
('朱自清', 2)
('要紧', 2)
('终于', 2)
('日子', 2)
('一会', 2)
('一半', 2)
('子女', 2)
('描写', 2)
('回家', 2)
('不必', 2)
('为了', 2)
>>>
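The script above counts each word with a substring search over the whole text, so a word that also appears inside a longer word is counted there too. A token-based count is one alternative. The sketch below is only an illustration: `top_words`, its parameters, and the `sample` list are hypothetical names that are not in the original script, and the numbers it produces on the real text may differ from the output listed above.

```python
from collections import Counter


def top_words(tokens, n=20, min_len=2):
    """Return the n most common tokens that have at least min_len characters."""
    counts = Counter(tok for tok in tokens if len(tok) >= min_len)
    return counts.most_common(n)


# Hypothetical pre-segmented sample; in the script above the tokens
# would come from jieba.cut(text) instead of a hand-written list.
sample = ['父亲', '的', '背影', '父亲', '北京', '背影', '父亲']
print(top_words(sample, n=2))
```

Because `jieba.cut` returns an iterable of tokens, its result can be passed to `top_words` directly, which also avoids rescanning the text once per distinct word.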
Original source: http://www.cnblogs.com/xiepingjian/p/7612830.html