自然语言处理(3)之条件频率分布

时间：2014-09-02 22:47:45 阅读：323 评论：0 收藏：0 [点我收藏+]

标签：style blog color io for div sp log amp

自然语言处理(3)之条件频率分布

条件频率分布式频率分布的集合，每个频率分布有一个不同的条件。

从下面的例子就可以看出，cfd就是两个条件(news，romance)的频率分布集合

 1 >>> cfd=nltk.ConditionalFreqDist(
 2 ...                     (genre,word)
 3 ...                     for genre in [‘news‘,‘romance‘]
 4 ...                     for word in brown.words(categories=genre))
 5 >>> cfd
 6 <ConditionalFreqDist with 2 conditions>
 7 >>> list(cfd[‘news‘])[:4]
 8 [‘the‘, ‘,‘, ‘.‘, ‘of‘]
 9 >>> list(cfd[‘romance‘])[:4]
10 [‘,‘, ‘.‘, ‘the‘, ‘and‘]

可以用plot()和tabulte()来绘制分布图和分布表

示例：处理布朗此料库的新闻和言情文体，找出一周中最有新闻将至并且是最浪漫的日子.

 1 >>> from nltk.corpus import brown
 2 >>> days=[‘Monday‘,‘Tuesday‘,‘Wednesday‘,‘Thursday‘,‘Friday‘,‘Saturday‘,‘Saturday‘]
 3 >>> cfd=nltk.ConditionalFreqDist(
 4 ...                     (genre,word)
 5 ...                     for genre in [‘news‘,‘romance‘]
 6 ...                     for word in brown.words(categories=genre))
 7 >>> cfd.tabulate(samples=days,cumulative=True)
 8         Monday Tuesday Wednesday Thursday Friday Saturday Saturday
 9    news   54   97  119  139  180  213  246
10 romance    2    5    8    9   12   16   20
11 >>> cfd.tabulate(samples=days)
12         Monday Tuesday Wednesday Thursday Friday Saturday Saturday
13    news   54   43   22   20   41   33   33
14 romance    2    3    3    1    3    4    4

　　　　　　　　　　　　　　　　　　　　　　　　　　NLTK条件频率分布的常用方法

示例	描述
cfdist=ConditionalFreqDist()	从配对链表中创建条件频率分布
cfdist.conditions()	将条件按字母排序来分类
cfdist[condition]	此条件下的频率分布
cfdist[condition][sample]	此条件下给定样本的频率
cfdist.tabulate()	为条件频率分布制表
cfdist.tabulate(samples,conditions)	在指定样本和条件限制下制表
cfdist.plot()	为条件频率分布绘图
cfdist.plot(samples,conditions)	在指定样本和条件限制下绘图

自然语言处理(3)之条件频率分布

标签：style blog color io for div sp log amp

原文地址：http://www.cnblogs.com/rcfeng/p/3950589.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行