
Notes on basic natural language processing tasks, using spaCy function calls as examples

Posted: 2017-05-22 22:12:14


# coding=utf-8
import spacy

# Load the medium English model (install it with: python -m spacy download en_core_web_md).
# The original post used version 1.2.1 of this model under spaCy 1.x.
nlp = spacy.load("en_core_web_md")
docx = nlp(
    "The ways to process documents are so varied and application- and "
    "language-dependent that I decided to not constrain them by any interface. "
    "Instead, a document is represented by the features extracted from it, not "
    'by its "surface" string form: how you get to the features is up to you. '
    "Below I describe one common, general-purpose approach (called bag-of-words), "
    "but keep in mind that different application domains call for different "
    "features, and, as always, it’s garbage in, garbage out..."
)

# ---------------- Feature tests ----------------

# 1. Tokenization
print("################# Tokenization")
for token in docx:
    print(token.text)

# 2. Part-of-speech tagging: coarse tag as a string (pos_) and as an integer ID (pos)
print("################# Part-of-speech tagging")
for token in docx:
    print(token.text, token.pos_, token.pos)

# 3. Named entity recognition: entity text, label string (label_) and label ID (label)
print("################# Named entity recognition")
for ent in docx.ents:
    print(ent.text, ent.label_, ent.label)

# 4. Lemmatization: base form as a string (lemma_) and as an integer ID (lemma)
print("################# Lemmatization")
for token in docx:
    print(token.text, token.lemma_, token.lemma)

# 5. Noun phrase extraction: base noun phrases ("noun chunks"), derived from the dependency parse
print("################# Noun phrase extraction")
for chunk in docx.noun_chunks:
    print(chunk.text)

# 6. Sentence segmentation: sentence boundaries, also set by the dependency parser
print("################# Sentence segmentation")
for sent in docx.sents:
    print(sent.text)
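Steps 1 and 6 rely on spaCy's trained English pipeline. Purely as a point of comparison, here is a minimal regular-expression baseline for both tasks in plain Python, with no model; the helper names naive_tokenize and naive_sentences are hypothetical. It differs from spaCy in known ways: spaCy's English tokenizer applies prefix, suffix, infix and exception rules (for example it splits "It's" into "It" and "'s", which this sketch keeps whole), and with en_core_web_md the sentence boundaries come from the dependency parse rather than from punctuation alone.

```python
import re

# Hypothetical baseline, not spaCy's algorithm: a word is a run of word characters,
# optionally joined by inner hyphens or apostrophes; any other non-space char is punctuation.
TOKEN_RE = re.compile(r"\w+(?:[-'’]\w+)*|[^\w\s]")


def naive_tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return TOKEN_RE.findall(text)


def naive_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace and an uppercase letter."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())
    return [p for p in parts if p]


if __name__ == "__main__":
    sample = "Instead, a document is represented by features. It's garbage in, garbage out."
    print(naive_tokenize(sample))
    print(naive_sentences(sample))
```

The capital-letter lookahead is what keeps "Two! three?" together below; abbreviations such as "Dr. Smith" would still be split wrongly, which is one reason spaCy does not segment on punctuation alone.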

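Step 2 prints each token's coarse part-of-speech tag (token.pos_, a Universal POS tag such as NOUN or VERB) together with its integer ID (token.pos). spaCy predicts these tags with a trained statistical model that looks at the surrounding context. The simplest baseline for the same task ignores context and gives every word the tag it carried most often in training. The sketch below implements that baseline in plain Python; the training data is a tiny hypothetical hand-tagged corpus, and train_unigram_tagger and unigram_tag are illustrative names, not part of spaCy.

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus of (word, Universal POS tag) pairs.
TOY_TRAINING_DATA = [
    [("the", "DET"), ("document", "NOUN"), ("is", "AUX"), ("long", "ADJ")],
    [("we", "PRON"), ("process", "VERB"), ("the", "DET"), ("text", "NOUN")],
    [("the", "DET"), ("process", "NOUN"), ("is", "AUX"), ("slow", "ADJ")],
    [("they", "PRON"), ("process", "VERB"), ("documents", "NOUN")],
]


def train_unigram_tagger(tagged_sentences):
    """Map each lower-cased word to the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def unigram_tag(words, model, default="NOUN"):
    """Tag each word with its most frequent training tag; unseen words get `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]


if __name__ == "__main__":
    model = train_unigram_tagger(TOY_TRAINING_DATA)
    print(unigram_tag(["The", "process", "is", "new"], model))
```

Because it ignores context, this baseline tags "process" in "The process is new" as VERB (its majority tag in the toy data, 2 to 1) even though a determiner precedes it; a context-aware tagger like spaCy's is expected to choose NOUN there.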
 
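In step 3, docx.ents is a sequence of labelled spans. spaCy also exposes the same information per token as IOB tags: token.ent_iob_ is "B" (begins an entity), "I" (inside an entity) or "O" (outside), and token.ent_type_ holds the label. The hypothetical helper below shows, in plain Python with no model, how such per-token tags map back to (text, label, start, end) spans; joining tokens with single spaces is a simplification, since spaCy keeps the original whitespace.

```python
def iob_to_spans(tokens, iob_tags, ent_types):
    """Group tokens into (text, label, start, end) entity spans from IOB tags.

    Hypothetical helper: iob_tags holds "B", "I" or "O" per token and ent_types the
    label per token ("" outside entities), mirroring spaCy's ent_iob_ / ent_type_.
    An "I" with no open entity is ignored.
    """
    spans = []
    start = None
    # Append a sentinel "O" so an entity that ends at the last token is closed too.
    for i, tag in enumerate(list(iob_tags) + ["O"]):
        if start is not None and tag != "I":
            spans.append((" ".join(tokens[start:i]), ent_types[start], start, i))
            start = None
        if tag == "B":
            start = i
    return spans


if __name__ == "__main__":
    words = ["Alice", "moved", "to", "New", "York", "in", "2017"]
    tags = ["B", "O", "O", "B", "I", "O", "B"]
    labels = ["PERSON", "", "", "GPE", "GPE", "", "DATE"]
    print(iob_to_spans(words, tags, labels))
```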

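Step 4 is lemmatization: mapping each token to its dictionary form, for example "features" to "feature" or "decided" to "decide". This is related to, but not the same as, stemming, which only strips suffixes and can leave non-words behind. Roughly speaking, spaCy's rule-based English lemmatizer combines exception lists with POS-specific suffix rules and keeps candidate forms found in a word list. The toy sketch below imitates that idea with tiny hypothetical tables (VOCAB, EXCEPTIONS, SUFFIX_RULES); it is an illustration, not spaCy's implementation or data.

```python
# Hypothetical toy tables; a real lemmatizer ships far larger ones.
VOCAB = {"feature", "document", "decide", "extract", "vary", "box", "go", "child"}
EXCEPTIONS = {"children": "child", "was": "be", "is": "be", "went": "go"}
SUFFIX_RULES = {
    "NOUN": [("ies", "y"), ("xes", "x"), ("s", "")],
    "VERB": [("ied", "y"), ("ies", "y"), ("es", "e"), ("es", ""),
             ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", ""), ("s", "")],
}


def toy_lemmatize(word, pos):
    """Return a lemma for `word` given its coarse POS tag.

    Irregular forms come from EXCEPTIONS; otherwise every matching suffix rule
    proposes a candidate and the first candidate found in VOCAB wins.
    """
    lower = word.lower()
    if lower in EXCEPTIONS:
        return EXCEPTIONS[lower]
    candidates = [
        lower[: -len(old)] + new
        for old, new in SUFFIX_RULES.get(pos, [])
        if lower.endswith(old) and len(lower) > len(old)
    ]
    for form in candidates:
        if form in VOCAB:
            return form
    return lower  # no rule produced a known word: fall back to the word itself


if __name__ == "__main__":
    for w, p in [("features", "NOUN"), ("decided", "VERB"), ("varied", "VERB")]:
        print(w, toy_lemmatize(w, p))
```

The vocabulary check is what separates this from plain suffix stripping: without it, the ("ed", "") rule alone would turn "decided" into the non-word "decid".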

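Step 5 iterates over docx.noun_chunks, the base noun phrases that spaCy derives from the dependency parse: flat phrases headed by a noun, together with the words that describe it, such as determiners and adjectives. When no parser is available, a common approximation is a pattern over POS tags. The hypothetical sketch below matches an optional determiner, any number of adjectives and one or more nouns or proper nouns, DET? ADJ* (NOUN|PROPN)+, over hand-tagged (word, tag) pairs. It is cruder than the parse-based version and is not how spaCy computes noun_chunks.

```python
def pos_noun_chunks(tagged):
    """Return phrases matching DET? ADJ* (NOUN|PROPN)+ from (word, pos) pairs.

    Hypothetical helper; tags are Universal POS names as used by spaCy's pos_.
    """
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DET":  # optional determiner
            j += 1
        while j < n and tagged[j][1] == "ADJ":  # any adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] in ("NOUN", "PROPN"):  # one or more nouns
            k += 1
        if k > j:  # at least one noun: emit the phrase and continue after it
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks


if __name__ == "__main__":
    sample = [("The", "DET"), ("ways", "NOUN"), ("to", "PART"), ("process", "VERB"),
              ("documents", "NOUN"), ("are", "AUX"), ("so", "ADV"), ("varied", "ADJ")]
    print(pos_noun_chunks(sample))
```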
Original post: http://www.cnblogs.com/wxiaoli/p/6891493.html
