nltk (Part 2)

Posted: 2020-03-12 09:28:25

1. collocations module

Counts how often n words occur together within each window of window_size consecutive words in a word sequence.

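from nltk.collocations import BigramCollocationFinder

# A toy sentence split into a list of words
sent = 'this this is is a a test test'.split()

# Count word pairs that co-occur within a sliding window of 2 words (adjacent pairs)
b = BigramCollocationFinder.from_words(sent, window_size=2)

# ngram_fd is a FreqDist mapping each (word1, word2) pair to its count
print(b.ngram_fd.items())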

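The sketch below shows what window_size changes. It uses a toy three-word input chosen only for illustration and counts pairs with a window of 2 and then a window of 3. With the larger window, each word is paired with every later word inside its window, not just its immediate neighbor.

```python
from nltk.collocations import BigramCollocationFinder

# Hypothetical toy input
words = "a b c".split()

# window_size=2: only adjacent pairs are counted
adjacent = BigramCollocationFinder.from_words(words, window_size=2)
# window_size=3: each word is also paired with the word two positions later
wider = BigramCollocationFinder.from_words(words, window_size=3)

print(sorted(adjacent.ngram_fd.items()))
print(sorted(wider.ngram_fd.items()))
```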

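BigramCollocationFinder counts co-occurrences of two words (bigrams).

TrigramCollocationFinder counts co-occurrences of three words (trigrams).

QuadgramCollocationFinder counts co-occurrences of four words (quadgrams).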
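The three finders share the same interface, as this short sketch shows. The sentence is a made-up example. The ranking step uses BigramAssocMeasures.raw_freq, which the original text does not mention. It is one of several association measures NLTK provides and is shown here only as an assumption about typical usage.

```python
from nltk.collocations import (
    BigramCollocationFinder,
    QuadgramCollocationFinder,
    TrigramCollocationFinder,
)
from nltk.metrics import BigramAssocMeasures

# Hypothetical example sentence
words = "the quick brown fox jumps over the lazy dog and the quick brown cat".split()

bigrams = BigramCollocationFinder.from_words(words)
trigrams = TrigramCollocationFinder.from_words(words)
quadgrams = QuadgramCollocationFinder.from_words(words)

# Raw co-occurrence counts for specific n-grams
print(bigrams.ngram_fd[("the", "quick")])
print(trigrams.ngram_fd[("the", "quick", "brown")])
print(quadgrams.ngram_fd[("the", "quick", "brown", "fox")])

# Rank bigrams by an association measure (here: relative frequency)
top = bigrams.nbest(BigramAssocMeasures.raw_freq, 2)
print(top)
```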

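2. data module

Manages the locations of NLTK's data packages (corpora, models, and other resources).

nltk.data.path is a list of the directories NLTK searches for data packages.

nltk.data.PathPointer is the abstract base class for path pointers, which NLTK uses to identify where a resource lives.

Its subclasses FileSystemPathPointer and ZipFilePathPointer point to ordinary files on disk and to files inside zip archives, respectively. GzipFileSystemPathPointer, a subclass of FileSystemPathPointer, handles gzip-compressed files.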
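A minimal sketch of how the search path and path pointers fit together follows. The temporary directory and the file hello.txt are hypothetical stand-ins for a real data package. nltk.data.find() walks nltk.data.path in order and returns a PathPointer for the first match.

```python
import os
import tempfile

import nltk
from nltk.data import FileSystemPathPointer, PathPointer

# nltk.data.path is an ordinary list of directories that NLTK searches, in order
print(nltk.data.path)

# Hypothetical custom data directory, placed first in the search path
custom_dir = tempfile.mkdtemp()
nltk.data.path.insert(0, custom_dir)

# Create a small hypothetical resource file inside it
with open(os.path.join(custom_dir, "hello.txt"), "w", encoding="utf-8") as f:
    f.write("hello nltk")

# find() searches nltk.data.path and returns a PathPointer subclass
ptr = nltk.data.find("hello.txt")
print(type(ptr).__name__)

# open() without an encoding returns a binary stream
with ptr.open() as stream:
    text = stream.read().decode("utf-8")
print(text)
```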

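3. featstruct module

Represents feature structures, which behave much like dicts and lists.

Feature represents a single feature and has a name attribute. The value associated with it is held by the feature structure that contains it.

It has two subclasses, SlashFeature and RangeFeature.

FeatStruct is a feature structure holding any number of features.

It has two subclasses: FeatDict (dict-like, with features indexed by name) and FeatList (list-like, with features indexed by position).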

from nltk.featstruct import FeatStruct

# ?x is a variable; unify() merges two compatible feature structures into one
print(FeatStruct('[a=?x]').unify(FeatStruct('[b=?x]')))
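Unification either merges compatible structures or fails. The sketch below uses made-up features (number, person). It shows which subclass the FeatStruct constructor returns for different inputs and that conflicting values make unify() return None.

```python
from nltk.featstruct import FeatDict, FeatList, FeatStruct

# Keyword arguments produce a FeatDict (features indexed by name)
fs1 = FeatStruct(number="singular")
fs2 = FeatStruct(person=3)
# A Python list produces a FeatList (features indexed by position)
fl = FeatStruct(["a", "b"])

# Compatible structures unify into one structure holding both features
merged = fs1.unify(fs2)
print(merged)

# Conflicting values cannot be unified, so unify() returns None
conflict = fs1.unify(FeatStruct(number="plural"))
print(conflict)
```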

4. grammar module

Handles user-defined grammars, such as context-free grammars (CFG).

import nltk
from nltk import CFG

# A toy context-free grammar; quoted symbols are terminals (words)
grammar = CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
  """)
sent = 'Mary saw Bob'.split()
# Parse top-down with a recursive-descent parser and print every parse tree
rd_parser = nltk.RecursiveDescentParser(grammar)
for tree in rd_parser.parse(sent):
    print(tree)

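A grammar object can also be inspected directly, and any NLTK parser can use it. The sketch below trims the grammar above to a smaller hypothetical version. It uses ChartParser instead of the recursive-descent parser and parses a sentence with the classic prepositional-phrase attachment ambiguity, so the parser finds more than one tree.

```python
from nltk.grammar import CFG, Nonterminal
from nltk.parse import ChartParser

# Reduced version of the toy grammar above
grammar = CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw"
  NP -> "John" | "Mary" | Det N | Det N PP
  Det -> "a" | "the"
  N -> "man" | "telescope"
  P -> "with"
""")

# Inspect the grammar: start symbol and the productions for one nonterminal
print(grammar.start())
vp_rules = grammar.productions(lhs=Nonterminal("VP"))
for prod in vp_rules:
    print(prod)

# "with a telescope" can attach to the verb phrase or to "a man"
parser = ChartParser(grammar)
trees = list(parser.parse("John saw a man with a telescope".split()))
for tree in trees:
    print(tree)
```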

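5. probability module

Mainly provides frequency distributions (FreqDist), conditional frequency distributions (ConditionalFreqDist), and probability distributions such as ELEProbDist.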

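from nltk.probability import ConditionalFreqDist
from nltk.tokenize import word_tokenize

sent = "the the the dog dog some other words that we do not care about"

# word_tokenize may require the Punkt tokenizer data (install it with nltk.download())
# Build a conditional frequency distribution whose condition is the word length
cfdist = ConditionalFreqDist()
for word in word_tokenize(sent):
    print(word)
    condition = len(word)
    cfdist[condition][word] += 1

# Equivalent one-liner: pass (condition, sample) pairs to the constructor
cfdist2 = ConditionalFreqDist((len(word), word) for word in word_tokenize(sent))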

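The sketch below turns counts into probabilities. It splits on whitespace so that no tokenizer data is needed, and the word list is illustrative. ELEProbDist (expected likelihood estimate) adds 0.5 to every count. It therefore computes P(w) = (count(w) + 0.5) / (N + 0.5 * B), where N is the number of tokens and B the number of distinct words, so unseen words still get a small nonzero probability.

```python
from nltk.probability import ConditionalFreqDist, ELEProbDist, FreqDist

# Illustrative word list: N = 8 tokens, B = 5 distinct words
words = "the the the dog dog some other words".split()

# FreqDist: sample -> count
fd = FreqDist(words)
print(fd.most_common(2))

# ELE estimate: (count + 0.5) / (N + 0.5 * B)
pd = ELEProbDist(fd)
print(pd.prob("the"))   # (3 + 0.5) / (8 + 0.5 * 5)
print(pd.prob("cat"))   # unseen word: (0 + 0.5) / (8 + 0.5 * 5)

# ConditionalFreqDist: condition (word length) -> FreqDist of words
cfd = ConditionalFreqDist((len(w), w) for w in words)
print(sorted(cfd.conditions()))
print(cfd[3].most_common())
```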

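6. text module

Provides tools for working with text, chiefly word lookup (for example, concordance search), word splitting, and text wrapper classes such as Text and TextCollection.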

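import nltk.corpus
from nltk.text import TextCollection
# nltk.book requires the NLTK book data, e.g. nltk.download('book')
from nltk.book import text1, text2, text3

# A TextCollection can be built from a corpus reader or from a list of texts
gutenberg = TextCollection(nltk.corpus.gutenberg)
mytexts = TextCollection([text1, text2, text3])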

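To avoid downloading data, the sketch below builds a TextCollection from three tiny hypothetical token lists. It uses the collection's term-weighting helpers: tf is the term's count divided by the text length, and idf is the natural log of (number of texts / number of texts containing the term). It then wraps one list in a Text to use its exploration helpers.

```python
import math

from nltk.text import Text, TextCollection

# Three tiny hypothetical "documents", each a list of tokens
doc1 = "the cat sat on the mat".split()
doc2 = "the dog sat on the log".split()
doc3 = "a cat and a dog".split()

collection = TextCollection([doc1, doc2, doc3])

# tf = occurrences of the term in the text / length of the text
tf_the = collection.tf("the", doc1)
# idf = log(number of texts / number of texts containing the term)
idf_cat = collection.idf("cat")
tfidf_cat = collection.tf_idf("cat", doc1)
print(tf_the, idf_cat, tfidf_cat)
print(math.isclose(idf_cat, math.log(3 / 2)))

# Text wraps a token list and adds helpers such as vocab() and concordance()
text = Text(doc1)
print(text.vocab().most_common(1))
text.concordance("cat")
```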

7. tree module

Builds and prints syntax trees (the Tree class).
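A small sketch of building and printing a tree follows. It reuses the parse of "Mary saw Bob" from the grammar example as an assumed input and shows both construction routes: parsing a bracketed string and nesting Tree(label, children) by hand. It then uses a few accessors.

```python
from nltk.tree import Tree

# Parse a bracketed string into a Tree
t = Tree.fromstring("(S (NP Mary) (VP (V saw) (NP Bob)))")

# The same tree built by hand: Tree(label, children)
manual = Tree("S", [Tree("NP", ["Mary"]),
                    Tree("VP", [Tree("V", ["saw"]), Tree("NP", ["Bob"])])])

print(t.label())    # root label
print(t.leaves())   # the words at the leaves
print(t.height())   # a node whose children are all leaves has height 2
print(t[1][0])      # children are indexed like a list
t.pretty_print()    # ASCII-art drawing of the tree
```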


Original post: https://www.cnblogs.com/yangyang12138/p/12466808.html
