# coding=utf-8
import spacy

# Load the English medium model (on newer spaCy releases the package is simply named 'en_core_web_md')
nlp = spacy.load('en_core_web_md-1.2.1')

docx = nlp(u'The ways to process documents are so varied and application- and language-dependent that I decided to not constrain them by any interface. Instead, a document is represented by the features extracted from it, not by its "surface" string form: how you get to the features is up to you. Below I describe one common, general-purpose approach (called bag-of-words), but keep in mind that different application domains call for different features, and, as always, it’s garbage in, garbage out...')

''' Feature tests '''

# 1. Tokenization
print('################# Tokenization')
for token in docx:
    print(token)

# 2. Part-of-speech tagging
print('################# Part-of-speech tagging')
for token in docx:
    print(token, token.pos_, token.pos)

# 3. Named Entity Recognition
print('################# Named Entity Recognition')
for ent in docx.ents:
    print(ent, ent.label_, ent.label)

# 4. Lemmatization
print('################# Lemmatization')
for token in docx:
    print(token, token.lemma_, token.lemma)

# 5. Noun phrase extraction
print('################# Noun phrase extraction')
for np in docx.noun_chunks:
    print(np)

# 6. Sentence segmentation
print('################# Sentence segmentation')
for sent in docx.sents:
    print(sent)
Original post: http://www.cnblogs.com/wxiaoli/p/6891493.html