NLTK (Part 5)

Date: 2020-03-16 09:22:31

nltk.parse: syntactic parsing

1) Context-free grammars

2) Recursive descent parsers

3) Chart parsing, based on dynamic programming
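To make the three items above concrete, here is a minimal sketch that parses one sentence with both a recursive descent parser and a chart parser. The toy grammar and the sentence are assumptions made up for illustration; they are not from the original post.

```python
from nltk import CFG
from nltk.parse import ChartParser, RecursiveDescentParser

# Illustrative toy grammar (an assumption, not from the original post).
# It has no left-recursive rules, which a recursive descent parser cannot handle.
grammar = CFG.fromstring("""
S -> NP VP
NP -> Det N | 'I'
VP -> V NP
Det -> 'the'
N -> 'dog'
V -> 'saw'
""")

tokens = "I saw the dog".split()

# Top-down search with backtracking.
rd_trees = list(RecursiveDescentParser(grammar).parse(tokens))

# Chart parsing stores partial results in a chart so they are not recomputed.
chart_trees = list(ChartParser(grammar).parse(tokens))

for tree in chart_trees:
    print(tree)
```

On a grammar this small both parsers find the same single tree. The chart parser's reuse of partial results starts to matter once the grammar is ambiguous or the sentences get longer.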

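from nltk.parse import CoreNLPParser

# Requires a running Stanford CoreNLP server; port 9966 is the one used here.
# tagtype="pos" is needed for tag() to return part-of-speech tags.
parser = CoreNLPParser(url="http://localhost:9966", tagtype="pos")

tokens = "Rami Eid is studying at Stony Brook University in NY".split()
parser.tag(tokens)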

nltk.tag: part-of-speech tagging

All taggers share a unified interface, TaggerI:

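# Excerpt from the NLTK source (nltk/tag/api.py)
from abc import ABCMeta, abstractmethod
from itertools import chain

from nltk.internals import overridden
from nltk.metrics import accuracy
from nltk.tag.util import untag


class TaggerI(metaclass=ABCMeta):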
    """
    A processing interface for assigning a tag to each token in a list.
    Tags are case sensitive strings that identify some property of each
    token, such as its part of speech or its sense.

    Some taggers require specific types for their tokens.  This is
    generally indicated by the use of a sub-interface to ``TaggerI``.
    For example, featureset taggers, which are subclassed from
    ``FeaturesetTagger``, require that each token be a ``featureset``.

    Subclasses must define:
      - either ``tag()`` or ``tag_sents()`` (or both)
    """

    @abstractmethod
    def tag(self, tokens):
        """
        Determine the most appropriate tag sequence for the given
        token sequence, and return a corresponding list of tagged
        tokens.  A tagged token is encoded as a tuple ``(token, tag)``.

        :rtype: list(tuple(str, str))
        """
        if overridden(self.tag_sents):
            return self.tag_sents([tokens])[0]


    def tag_sents(self, sentences):
        """
        Apply ``self.tag()`` to each element of *sentences*.  I.e.:

            return [self.tag(sent) for sent in sentences]
        """
        return [self.tag(sent) for sent in sentences]


    def evaluate(self, gold):
        """
        Score the accuracy of the tagger against the gold standard.
        Strip the tags from the gold standard text, retag it using
        the tagger, then compute the accuracy score.

        :type gold: list(list(tuple(str, str)))
        :param gold: The list of tagged sentences to score the tagger on.
        :rtype: float
        """

        tagged_sents = self.tag_sents(untag(sent) for sent in gold)
        gold_tokens = list(chain(*gold))
        test_tokens = list(chain(*tagged_sents))
        return accuracy(gold_tokens, test_tokens)


    def _check_params(self, train, model):
        if (train and model) or (not train and not model):
            raise ValueError("Must specify either training data or trained model.")
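
Since the interface only requires a subclass to define tag() (or tag_sents()), a custom tagger can be very small. The sketch below uses a hypothetical CapitalizationTagger with a made-up rule and made-up tag names ("PROPN", "X"); it only illustrates how the interface is used and is not a tagger that ships with NLTK.

```python
from nltk.tag.api import TaggerI


class CapitalizationTagger(TaggerI):
    """Hypothetical toy tagger: capitalized tokens get 'PROPN', all others 'X'."""

    def tag(self, tokens):
        # Return one (token, tag) tuple per input token, as TaggerI specifies.
        return [(tok, "PROPN" if tok[:1].isupper() else "X") for tok in tokens]


tagger = CapitalizationTagger()

# tag() handles a single tokenized sentence.
single = tagger.tag(["Rami", "studies", "in", "NY"])

# tag_sents() is inherited from TaggerI and calls tag() on each sentence.
batch = tagger.tag_sents([["Stony", "Brook"], ["a", "dog"]])

print(single)
print(batch)
```

Only tag() is written here; tag_sents() comes from the base class, which is the point of the shared interface.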
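from nltk.tag import CRFTagger

# CRFTagger depends on the python-crfsuite package.
ct = CRFTagger()
train_data = [[("University", "Noun"), ("is", "Verb"), ("a", "Det"), ("good", "Adj"), ("place", "Noun")]]
# The second argument is the file the trained model is written to.
ct.train(train_data, "model.crf.tagger")
ct.tag_sents([["dog", "is", "good"], ["Cat", "eat", "meat"]])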


Original article: https://www.cnblogs.com/yangyang12138/p/12501578.html
