
Python3 NLTK: Natural Language Processing

Posted: 2018-04-30 18:07:01


NLTK

Load all items from NLTK's book module

  • The book module contains all of the example texts and sentences used below
from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
text1
<Text: Moby Dick by Herman Melville 1851>
text2
<Text: Sense and Sensibility by Jane Austen 1811>

Searching text

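  1. concordance finds every occurrence of a word in the text and prints each one together with its surrounding context.
  2. similar lists words that appear in contexts similar to those of the search word; this kind of similarity can be used, for example, to judge relevance in a search engine.
  3. common_contexts lists the contexts shared by two (or more) words.
  4. dispersion_plot draws a lexical dispersion plot showing where words occur across the text.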
text1.concordance('monstrous') # look up the word 'monstrous' in text1
# concordance (n.): agreement; an index of the words in a text,
# especially of the important words in an author's works
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us , 
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But 
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
text2.concordance('affection')
Displaying 25 of 79 matches:
, however , and , as a mark of his affection for the three girls , he left them
t . It was very well known that no affection was ever supposed to exist between
deration of politeness or maternal affection on the side of the former , the tw
d the suspicion -- the hope of his affection for me may warrant , without impru
hich forbade the indulgence of his affection . She knew that his mother neither
rd she gave one with still greater affection . Though her late conversation wit
 can never hope to feel or inspire affection again , and if her home be uncomfo
m of the sense , elegance , mutual affection , and domestic comfort of the fami
, and which recommended him to her affection beyond every thing else . His soci
ween the parties might forward the affection of Mr . Willoughby , an equally st
 the most pointed assurance of her affection . Elinor could not be surprised at
he natural consequence of a strong affection in a young and ardent mind . This 
 opinion . But by an appeal to her affection for her mother , by representing t
 every alteration of a place which affection had established as perfect with hi
e will always have one claim of my affection , which no other can possibly shar
f the evening declared at once his affection and happiness . " Shall we see you
ause he took leave of us with less affection than his usual behaviour has shewn
ness ." " I want no proof of their affection ," said Elinor ; " but of their en
onths , without telling her of his affection ;-- that they should part without 
ould be the natural result of your affection for her . She used to be all unres
distinguished Elinor by no mark of affection . Marianne saw and listened with i
th no inclination for expense , no affection for strangers , no profession , an
till distinguished her by the same affection which once she had felt no doubt o
al of her confidence in Edward ' s affection , to the remembrance of every mark
 was made ? Had he never owned his affection to yourself ?" " Oh , no ; but if 
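To make concrete what concordance() displays, here is a minimal keyword-in-context sketch in plain Python. It is not NLTK's implementation. The function name kwic, the case-insensitive match and the fixed context width are assumptions chosen to resemble the output above, and the token list is a short hand-made sample, so the snippet runs without the NLTK data.

```python
def kwic(tokens, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword.

    Hypothetical helper: matching ignores case, and the left/right context
    is cut to `width` characters so every keyword lines up in one column.
    """
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            # `width` tokens always contain at least `width` characters.
            left = " ".join(tokens[max(0, i - width):i])[-width:]
            right = " ".join(tokens[i + 1:i + 1 + width])[:width]
            lines.append(f"{left:>{width}} {token} {right}")
    return lines


sample = ("Of the Monstrous Pictures of Whales . In connexion with "
          "the monstrous pictures of whales , I am strongly tempted").split()
for line in kwic(sample, "monstrous", width=20):
    print(line)
```

Because the left context is right-aligned to a fixed width, the keyword always starts in the same column. This is what makes a concordance easy to scan.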
text1.similar('monstrous')
true contemptible christian abundant few part mean careful puzzled
mystifying passing curious loving wise doleful gamesome singular
delightfully perilous fearless
text2.similar('monstrous')
very so exceedingly heartily a as good great extremely remarkably
sweet vast amazingly
text2.common_contexts(['monstrous', 'very'])
a_pretty am_glad a_lucky is_pretty be_glad
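similar() and common_contexts() both rely on the notion of a context: the word immediately to the left and the word immediately to the right of a token. The sketch below illustrates that idea in a simplified form. It is not NLTK's code; NLTK builds a ContextIndex internally and ranks candidates with its own scoring. The helper names and the tiny sample sentence are made up for this example.

```python
from collections import defaultdict


def contexts(tokens):
    """Map each lower-cased word to the set of (left word, right word) pairs around it."""
    index = defaultdict(set)
    words = [t.lower() for t in tokens]
    for i in range(1, len(words) - 1):
        index[words[i]].add((words[i - 1], words[i + 1]))
    return index


def common_contexts(tokens, w1, w2):
    """Contexts shared by w1 and w2, formatted like NLTK's 'left_right' output."""
    index = contexts(tokens)
    shared = index[w1.lower()] & index[w2.lower()]
    return sorted(f"{left}_{right}" for left, right in shared)


def similar(tokens, word, n=5):
    """Other words ranked by how many contexts they share with `word`."""
    index = contexts(tokens)
    target = index[word.lower()]
    scores = {w: len(ctx & target) for w, ctx in index.items() if w != word.lower()}
    ranked = sorted((w for w, s in scores.items() if s > 0), key=lambda w: -scores[w])
    return ranked[:n]


sample = ("she is a very pretty girl and he is a monstrous pretty liar ; "
          "I am very glad and am monstrous glad").split()
print(common_contexts(sample, "monstrous", "very"))
print(similar(sample, "monstrous"))
```

In this sample, 'monstrous' and 'very' both occur in the contexts a _ pretty and am _ glad. That mirrors why Austen's intensifier use of 'monstrous' makes text2.similar() return words like very and exceedingly.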
# Show where each word occurs across the text (requires matplotlib).
# Each stripe represents an instance of a word,
# and each row represents the entire text.
text4.dispersion_plot(['citizens', 'democracy', 'freedom', 'duties', 'America', 'liberty'])
# dispersion (n.): scattering, spreading out; in statistics, the spread of values

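[Figure: lexical dispersion plot of 'citizens', 'democracy', 'freedom', 'duties', 'America' and 'liberty' in text4 (Inaugural Address Corpus)]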
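A dispersion plot is drawn from the list of token offsets at which each word occurs. The sketch below computes those offsets and renders a crude text-only version of the plot. The function names, the 40-column default width and the case-sensitive matching are assumptions of this example, and the sample is a hand-made token list rather than text4.

```python
def word_offsets(tokens, words):
    """For each query word, the token positions where it occurs (case-sensitive)."""
    return {w: [i for i, t in enumerate(tokens) if t == w] for w in words}


def text_dispersion(tokens, words, width=40):
    """One row per word; '|' marks an occurrence, scaled to `width` columns."""
    offsets = word_offsets(tokens, words)
    rows = []
    for w in words:
        row = [" "] * width
        for i in offsets[w]:
            # i < len(tokens), so the column index is always < width
            row[i * width // len(tokens)] = "|"
        rows.append(f"{w:>10} {''.join(row)}")
    return rows


sample = "liberty and freedom and liberty for all citizens".split()
for row in text_dispersion(sample, ["citizens", "freedom", "liberty"]):
    print(row)
```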

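# In the NLTK release used here, generate() did not produce any text, so this
# prints None. Newer NLTK releases reimplement generate() on top of an n-gram
# language model; there it is called without a word, e.g. text3.generate().
print(text3.generate('monstrous'))
None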

Counting vocabulary

len(text3)
44764
sorted(set(text3))
['!', "'", '(', ')', ',', ',)', '.', '.)', ':', ';', ';)', '?', '?)', 'A', 'Abel',
 'Abelmizraim', 'Abidah', 'Abide', 'Abimael', 'Abimelech', 'Abr', 'Abrah', 'Abraham',
 'Abram', 'Accad', 'Achbor', 'Adah', 'Adam', 'Adbeel', 'Admah', 'Adullamite',
 'After', 'Aholibamah', 'Ahuzzath', 'Ajah', 'Akan', 'All', 'Allonbachuth',
 'Almighty', 'Almodad', 'Also', 'Alvah', 'Alvan', 'Am', 'Amal', 'Amalek',
 'Amalekites', 'Ammon', 'Amorite', 'Amorites', 'Amraphel', 'An', 'Anah', 'Anamim',
 'And', 'Aner', 'Angel', 'Appoint', 'Aram', 'Aran', 'Ararat', 'Arbah', 'Ard', 'Are',
 'Areli', 'Arioch', 'Arise', 'Arkite', 'Arodi', 'Arphaxad', 'Art', 'Arvadite', 'As',
 'Asenath', 'Ashbel', 'Asher', 'Ashkenaz', 'Ashteroth', 'Ask', 'Asshur', 'Asshurim',
 'Assyr', 'Assyria', 'At', 'Atad', 'Avith', 'Baalhanan', 'Babel', 'Bashemath', 'Be',
 'Because', 'Becher', 'Bedad', 'Beeri', 'Beerlahairoi', 'Beersheba', 'Behold',
 'Bela', 'Belah', 'Benam', 'Benjamin', 'Beno', 'Beor', 'Bera', 'Bered', 'Beriah',
 'Bethel', 'Bethlehem', 'Bethuel', 'Beware', 'Bilhah', 'Bilhan', 'Binding', 'Birsha',
 'Bless', 'Blessed', 'Both', 'Bow', 'Bozrah', 'Bring', 'But', 'Buz', 'By', 'Cain',
 'Cainan', 'Calah', 'Calneh', 'Can', 'Cana', 'Canaan', 'Canaanite', 'Canaanites',
 'Canaanitish', 'Caphtorim', 'Carmi', 'Casluhim', 'Cast', 'Cause', 'Chaldees',
 'Chedorlaomer', 'Cheran', 'Cherubims', 'Chesed', 'Chezib', 'Come', 'Cursed', 'Cush',
 'Damascus', 'Dan', 'Day', 'Deborah', 'Dedan', 'Deliver', 'Diklah', 'Din', 'Dinah',
 'Dinhabah', 'Discern', 'Dishan', 'Dishon', 'Do', 'Dodanim', 'Dothan', 'Drink',
 'Duke', 'Dumah', 'Earth', 'Ebal', 'Eber', 'Edar', 'Eden', 'Edom', 'Edomites', 'Egy',
 'Egypt', 'Egyptia', 'Egyptian', 'Egyptians', 'Ehi', 'Elah', 'Elam', 'Elbethel',
 'Eldaah', 'EleloheIsrael', 'Eliezer', 'Eliphaz', 'Elishah', 'Ellasar', 'Elon',
 'Elparan', 'Emins', 'En', 'Enmishpat', 'Eno', 'Enoch', 'Enos', 'Ephah', 'Epher',
 'Ephra', 'Ephraim', 'Ephrath', 'Ephron', 'Er', 'Erech', 'Eri', 'Es', 'Esau',
 'Escape', 'Esek', 'Eshban', 'Eshcol', 'Ethiopia', 'Euphrat', 'Euphrates', 'Eve',
 'Even', 'Every', 'Except', 'Ezbon', 'Ezer', 'Fear', 'Feed', 'Fifteen', 'Fill', 'For',
 'Forasmuch', 'Forgive', 'From', 'Fulfil', 'G', 'Gad', 'Gaham', 'Galeed', 'Gatam',
 'Gather', 'Gaza', 'Gentiles', 'Gera', 'Gerar', 'Gershon', 'Get', 'Gether', 'Gihon',
 'Gilead', 'Girgashites', 'Girgasite', 'Give', 'Go', 'God', 'Gomer', 'Gomorrah',
 'Goshen', 'Guni', 'Hadad', 'Hadar', 'Hadoram', 'Hagar', 'Haggi', 'Hai', 'Ham',
 'Hamathite', 'Hamor', 'Hamul', 'Hanoch', 'Happy', 'Haran', 'Hast', 'Haste', 'Have',
 'Havilah', 'Hazarmaveth', 'Hazezontamar', 'Hazo', 'He', 'Hear', 'Heaven', 'Heber',
 'Hebrew', 'Hebrews', 'Hebron', 'Hemam', 'Hemdan', 'Here', 'Hereby', 'Heth',
 'Hezron', 'Hiddekel', 'Hinder', 'Hirah', 'His', 'Hitti', 'Hittite', 'Hittites',
 'Hivite', 'Hobah', 'Hori', 'Horite', 'Horites', 'How', 'Hul', 'Huppim', 'Husham',
 'Hushim', 'Huz', 'I', 'If', 'In', 'Irad', 'Iram', 'Is', 'Isa', 'Isaac', 'Iscah',
 'Ishbak', 'Ishmael', 'Ishmeelites', 'Ishuah', 'Isra', 'Israel', 'Issachar', 'Isui',
 'It', 'Ithran', 'Jaalam', 'Jabal', 'Jabbok', 'Jac', 'Jachin', 'Jacob', 'Jahleel',
 'Jahzeel', 'Jamin', 'Japhe', 'Japheth', 'Jared', 'Javan', 'Jebusite', 'Jebusites',
 'Jegarsahadutha', 'Jehovahjireh', 'Jemuel', 'Jerah', 'Jetheth', 'Jetur', 'Jeush',
 'Jezer', 'Jidlaph', 'Jimnah', 'Job', 'Jobab', 'Jokshan', 'Joktan', 'Jordan',
 'Joseph', 'Jubal', 'Judah', 'Judge', 'Judith', 'Kadesh', 'Kadmonites', 'Karnaim',
 'Kedar', 'Kedemah', 'Kemuel', 'Kenaz', 'Kenites', 'Kenizzites', 'Keturah',
 'Kiriathaim', 'Kirjatharba', 'Kittim', 'Know', 'Kohath', 'Kor', 'Korah', 'LO',
 'LORD', 'Laban', 'Lahairoi', 'Lamech', 'Lasha', 'Lay', 'Leah', 'Lehabim', 'Lest',
 'Let', 'Letushim', 'Leummim', 'Levi', 'Lie', 'Lift', 'Lo', 'Look', 'Lot', 'Lotan',
 'Lud', 'Ludim', 'Luz', 'Maachah', 'Machir', 'Machpelah', 'Madai', 'Magdiel',
 'Magog', 'Mahalaleel', 'Mahalath', 'Mahanaim', 'Make', 'Malchiel', 'Male', 'Mam',
 'Mamre', 'Man', 'Manahath', 'Manass', 'Manasseh', 'Mash', 'Masrekah', 'Massa',
 'Matred', 'Me', 'Medan', 'Mehetabel', 'Mehujael', 'Melchizedek', 'Merari', 'Mesha',
 'Meshech', 'Mesopotamia', 'Methusa', 'Methusael', 'Methuselah', 'Mezahab',
 'Mibsam', 'Mibzar', 'Midian', 'Midianites', 'Milcah', 'Mishma', 'Mizpah',
 'Mizraim', 'Mizz', 'Moab', 'Moabites', 'Moreh', 'Moreover', 'Moriah', 'Muppim',
 'My', 'Naamah', 'Naaman', 'Nahath', 'Nahor', 'Naphish', 'Naphtali', 'Naphtuhim',
 'Nay', 'Nebajoth', 'Neither', 'Night', 'Nimrod', 'Nineveh', 'Noah', 'Nod', 'Not',
 'Now', 'O', 'Obal', 'Of', 'Oh', 'Ohad', 'Omar', 'On', 'Onam', 'Onan', 'Only',
 'Ophir', 'Our', 'Out', 'Padan', 'Padanaram', 'Paran', 'Pass', 'Pathrusim', 'Pau',
 'Peace', 'Peleg', 'Peniel', 'Penuel', 'Peradventure', 'Perizzit', 'Perizzite',
 'Perizzites', 'Phallu', 'Phara', 'Pharaoh', 'Pharez', 'Phichol', 'Philistim',
 'Philistines', 'Phut', 'Phuvah', 'Pildash', 'Pinon', 'Pison', 'Potiphar',
 'Potipherah', 'Put', 'Raamah', 'Rachel', 'Rameses', 'Rebek', 'Rebekah', 'Rehoboth',
 'Remain', 'Rephaims', 'Resen', 'Return', 'Reu', 'Reub', 'Reuben', 'Reuel', 'Reumah',
 'Riphath', 'Rosh', 'Sabtah', 'Sabtech', 'Said', 'Salah', 'Salem', 'Samlah', 'Sarah',
 'Sarai', 'Saul', 'Save', 'Say', 'Se', 'Seba', 'See', 'Seeing', 'Seir', 'Sell', 'Send',
 'Sephar', 'Serah', 'Sered', 'Serug', 'Set', 'Seth', 'Shalem', 'Shall', 'Shalt',
 'Shammah', 'Shaul', 'Shaveh', 'She', 'Sheba', 'Shebah', 'Shechem', 'Shed', 'Shel',
 'Shelah', 'Sheleph', 'Shem', 'Shemeber', 'Shepho', 'Shillem', 'Shiloh', 'Shimron',
 'Shinab', 'Shinar', 'Shobal', 'Should', 'Shuah', 'Shuni', 'Shur', 'Sichem',
 'Siddim', 'Sidon', 'Simeon', 'Sinite', 'Sitnah', 'Slay', 'So', 'Sod', 'Sodom',
 'Sojourn', 'Some', 'Spake', 'Speak', 'Spirit', 'Stand', 'Succoth', 'Surely',
 'Swear', 'Syrian', 'Take', 'Tamar', 'Tarshish', 'Tebah', 'Tell', 'Tema', 'Teman',
 'Temani', 'Terah', 'Thahash', 'That', 'The', 'Then', 'There', 'Therefore', 'These',
 'They', 'Thirty', 'This', 'Thorns', 'Thou', 'Thus', 'Thy', 'Tidal', 'Timna',
 'Timnah', 'Timnath', 'Tiras', 'To', 'Togarmah', 'Tola', 'Tubal', 'Tubalcain',
 'Twelve', 'Two', 'Unstable', 'Until', 'Unto', 'Up', 'Upon', 'Ur', 'Uz', 'Uzal', 'We',
 'What', 'When', 'Whence', 'Where', 'Whereas', 'Wherefore', 'Which', 'While', 'Who',
 'Whose', 'Whoso', 'Why', 'Wilt', 'With', 'Woman', 'Ye', 'Yea', 'Yet', 'Zaavan',
 'Zaphnathpaaneah', 'Zar', 'Zarah', 'Zeboiim', 'Zeboim', 'Zebul', 'Zebulun',
 'Zemarite', 'Zepho', 'Zerah', 'Zibeon', 'Zidon', 'Zillah', 'Zilpah', 'Zimran',
 'Ziphion', 'Zo', 'Zoar', 'Zohar', 'Zuzims', 'a', 'abated', 'abide', 'able', 'abode',
 'abomination', 'about', 'above', 'abroad', 'absent', 'abundantly', 'accept',
 'accepted', 'according', 'acknowledged', 'activity', 'add', 'adder', 'afar',
 'afflict', 'affliction', 'afraid', 'after', 'afterward', 'afterwards', 'aga',
 'again', 'against', 'age', 'aileth', 'air', 'al', 'alive', 'all', 'almon', 'alo',
 'alone', 'aloud', 'also', 'altar', 'altogether', 'always', 'am', 'among', 'amongst',
 'an', 'and', 'angel', 'angels', 'anger', 'angry', 'anguish', 'anointedst', 'anoth',
 'another', 'answer', 'answered', 'any', 'anything', 'appe', 'appear', 'appeared',
 'appease', 'appoint', 'appointed', 'aprons', 'archer', 'archers', 'are', 'arise',
 'ark', 'armed', 'arms', 'army', 'arose', 'arrayed', 'art', 'artificer', 'as',
 'ascending', 'ash', 'ashamed', 'ask', 'asked', 'asketh', 'ass', 'assembly', 'asses',
 'assigned', 'asswaged', 'at', 'attained', 'audience', 'avenged', 'aw', 'awaked',
 'away', 'awoke', 'back', 'backward', 'bad', 'bade', 'badest', 'badne', 'bak', 'bake',
 'bakemeats', 'baker', 'bakers', 'balm', 'bands', 'bank', 'bare', 'barr', 'barren',
 'basket', 'baskets', 'battle', 'bdellium', 'be', 'bear', 'beari', 'bearing', 'beast',
 'beasts', 'beautiful', 'became', 'because', 'become', 'bed', 'been', 'befall',
 'befell', 'before', 'began', 'begat', 'beget', 'begettest', 'begin', 'beginning',
 'begotten', 'beguiled', 'beheld', 'behind', 'behold', 'being', 'believed', 'belly',
 'belong', 'beneath', 'bereaved', 'beside', 'besides', 'besought', 'best', 'betimes',
 'better', 'between', 'betwixt', 'beyond', 'binding', 'bird', 'birds', 'birthday',
 'birthright', 'biteth', 'bitter', 'blame', 'blameless', 'blasted', 'bless',
 'blessed', 'blesseth', 'blessi', 'blessing', 'blessings', 'blindness', 'blood',
 'blossoms', 'bodies', 'boldly', 'bondman', 'bondmen', 'bondwoman', 'bone', 'bones',
 'book', 'booths', 'border', 'borders', 'born', 'bosom', 'both', 'bottle', 'bou',
 'boug', 'bough', 'bought', 'bound', 'bow', 'bowed', 'bowels', 'bowing', 'boys',
 'bracelets', 'branches', 'brass', 'bre', 'breach', 'bread', 'breadth', 'break',
 'breaketh', 'breaking', 'breasts', 'breath', 'breathed', 'breed', 'brethren',
 'brick', 'brimstone', 'bring', 'brink', 'broken', 'brook', 'broth', 'brother',
 'brought', 'brown', 'bruise', 'budded', 'build', 'builded', 'built', 'bulls',
 'bundle', 'bundles', 'burdens', 'buried', 'burn', 'burning', 'burnt', 'bury',
 'buryingplace', 'business', 'but', 'butler', 'butlers', 'butlership', 'butter',
 'buy', 'by', 'cakes', 'calf', 'call', 'called', 'came', 'camel', 'camels', 'camest',
 'can', 'cannot', 'canst', 'captain', 'captive', 'captives', 'carcases', 'carried',
 'carry', 'cast', 'castles', 'catt', 'cattle', 'caught', 'cause', 'caused', 'cave',
 'cease', 'ceased', 'certain', 'certainly', 'chain', 'chamber', 'change', 'changed',
 'changes', 'charge', 'charged', 'chariot', 'chariots', 'chesnut', 'chi', 'chief',
 'child', 'childless', 'childr', 'children', 'chode', 'choice', 'chose', 'circumcis',
 'circumcise', 'circumcised', 'citi', 'cities', 'city', 'clave', 'clean', 'clear',
 'cleave', 'clo', 'closed', 'clothed', 'clothes', 'cloud', 'clusters', 'co', 'coat',
 'coats', 'coffin', 'cold', ...]
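len(set(text3))  # number of distinct tokens (word types)
2789
len(text3)/len(set(text3))  # on average each word type is used about 16 times
16.050197203298673
text3.count('smote')  # occurrences of a single word
5
100*text4.count('a')/len(text4)  # percentage of the tokens in text4 that are 'a'
1.4643016433938312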
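The same three measures can be checked by hand on a short token list. The list below is the opening sentence of the Book of Genesis (the sentence nltk.book calls sent3). It is typed in directly so the snippet runs without downloading the NLTK data.

```python
tokens = ["In", "the", "beginning", "God", "created", "the",
          "heaven", "and", "the", "earth", "."]

n_tokens = len(tokens)                    # every word and punctuation mark: 11
n_types = len(set(tokens))                # distinct tokens (word types): 9
tokens_per_type = n_tokens / n_types      # same ratio as len(text3)/len(set(text3))
pct_the = 100 * tokens.count("the") / n_tokens   # share of 'the', in per cent

print(n_tokens, n_types, round(tokens_per_type, 2), round(pct_the, 2))
```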
def lexical_diversity(text):
    # lexical (adj.): relating to words or vocabulary
    # diversity (n.): variety; difference
    # Returns the average number of tokens per distinct word type.
    return len(text)/len(set(text))
def percentage(count, total):
    # Returns count as a percentage of total.
    return 100*count/total

print('Lexical diversity of text3: {}'.format(lexical_diversity(text3)))
print("Percentage of the word 'a' in text4: {}".format(percentage(text4.count('a'), len(text4))))
Lexical diversity of text3: 16.050197203298673
Percentage of the word 'a' in text4: 1.4643016433938312
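lexical_diversity() as defined above returns tokens per type, so a larger number means a less varied vocabulary (each word is reused more often). The current online edition of the NLTK book defines the measure the other way round, as len(set(text)) / len(text), which is the type-token ratio. Below is a small sketch of both. The function names are hypothetical, and the guard for empty input is an addition not present in either definition.

```python
def tokens_per_type(text):
    """Average number of times each distinct token is used (>= 1.0)."""
    return len(text) / len(set(text)) if text else 0.0


def type_token_ratio(text):
    """Fraction of tokens that are distinct (0 to 1; 1.0 means no repeats)."""
    return len(set(text)) / len(text) if text else 0.0


sent1 = ["Call", "me", "Ishmael", "."]
repetitive = "the cat saw the other cat".split()
for name, toks in [("sent1", sent1), ("repetitive", repetitive)]:
    print(name, tokens_per_type(toks), round(type_token_ratio(toks), 3))
```

One measure is the reciprocal of the other. When comparing texts, check which convention a number uses before reading "higher" as "more diverse".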

Lists

sent1 = ['Call', 'me', 'Ishmael', '.']
print('Contents of sent1: {}'.format(sent1))
print('Length of sent1: {}'.format(len(sent1)))
print('Lexical diversity of sent1: {}'.format(lexical_diversity(sent1)))
Contents of sent1: ['Call', 'me', 'Ishmael', '.']
Length of sent1: 4
Lexical diversity of sent1: 1.0
sent1, sent2, sent3, sent4  # lists predefined by nltk.book
(['Call', 'me', 'Ishmael', '.'],
 ['The', 'family', 'of', 'Dashwood', 'had', 'long', 'been', 'settled', 'in', 'Sussex', '.'],
 ['In', 'the', 'beginning', 'God', 'created', 'the', 'heaven', 'and', 'the', 'earth', '.'],
 ['Fellow', '-', 'Citizens', 'of', 'the', 'Senate', 'and', 'of', 'the', 'House', 'of',
  'Representatives', ':'])
sent4 + sent1
['Fellow', '-', 'Citizens', 'of', 'the', 'Senate', 'and', 'of', 'the', 'House', 'of',
 'Representatives', ':', 'Call', 'me', 'Ishmael', '.']
sent1.append('Some')  # modifies sent1 in place and returns None
sent1
['Call', 'me', 'Ishmael', '.', 'Some']
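Two behaviours are worth separating here. + builds a new list, while append() changes the list it is called on and returns None. Each re-run of an append cell therefore adds another 'Some'. Below is a standalone sketch; the copy of sent4 is shortened to keep it small.

```python
sent1 = ["Call", "me", "Ishmael", "."]
sent4 = ["Fellow", "-", "Citizens"]      # shortened copy of nltk.book's sent4

combined = sent4 + sent1        # + returns a new list; sent1 and sent4 are unchanged
result = sent1.append("Some")   # append() modifies sent1 in place ...
print(result)                   # ... and returns None
print(sent1)                    # ['Call', 'me', 'Ishmael', '.', 'Some']
print(combined)
```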

Indexing lists

type(text4)
nltk.text.Text
text4[173]
'awaken'
text4.index('awaken')  # index of the first occurrence
173
text5[16715:16735]
['U86', 'thats', 'why', 'something', 'like', 'gamefly', 'is', 'so', 'good',
 'because', 'you', 'can', 'actually', 'play', 'a', 'full', 'game', 'without',
 'buying', 'it']
text6[1600:1625]
['We', "'", 're', 'an', 'anarcho', '-', 'syndicalist', 'commune', '.', 'We',
 'take', 'it', 'in', 'turns', 'to', 'act', 'as', 'a', 'sort', 'of', 'executive',
 'officer', 'for', 'the', 'week']
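Indexing and slicing follow the usual Python rules. Indexes start at 0, and a slice [m:n] includes m but stops before n. That is why text5[16715:16735] above returns 16735 - 16715 = 20 tokens. Below is a standalone sketch on a few tokens taken from the text6 output.

```python
sent = ["We", "take", "it", "in", "turns", "to", "act", "as", "a", "sort", "of"]

first = sent[0]              # indexes start at 0
pos = sent.index("turns")    # index of the first occurrence
window = sent[2:5]           # from index 2 up to, but not including, index 5
last_two = sent[-2:]         # negative indexes count back from the end

print(first, pos, window, last_two)
```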

Variables

sent1 = ['Call', 'me', 'Ishmael', '.']
my_sent = ['Bravely', 'bold', 'Sir', 'Robin', ',', 'rode', 'forth', 'from', 'Camelot', '.']
noun_phrase = my_sent[1:4]
print('Sliced list noun_phrase -> {}'.format(noun_phrase))
wOrDs = sorted(noun_phrase)
print('Sorted list wOrDs -> {}'.format(wOrDs))
Sliced list noun_phrase -> ['bold', 'Sir', 'Robin']
Sorted list wOrDs -> ['Robin', 'Sir', 'bold']
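sorted() compares strings by character code, and the ASCII capital letters come before the lower-case ones. That is why 'bold' ends up last above. If a case-insensitive order is wanted, a key function can be passed. A small sketch:

```python
noun_phrase = ["bold", "Sir", "Robin"]

print(sorted(noun_phrase))                 # ['Robin', 'Sir', 'bold']
print(sorted(noun_phrase, key=str.lower))  # ['bold', 'Robin', 'Sir']
```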

Strings

name = 'bright'
print('First letter of name: {}'.format(name[0]))
print(name[:4])
print(name*2)
print(name + '!')
First letter of name: b
brig
brightbright
bright!
' '.join(['Monty', 'Python'])
'Monty Python'
'Monty Python'.split()
['Monty', 'Python']
saying = ['After', 'all', 'is', 'said', 'and', 'done', 'more', 'is', 'said', 'than', 'done']
tokens = set(saying)
tokens = sorted(tokens)
tokens[-2:]
['said', 'than']
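A set keeps only one copy of each item and has no defined order, so the saying is passed through sorted() before slicing. Below is a self-contained version of the steps above. It also shows that join() and split() undo each other for space-separated tokens.

```python
saying = "After all is said and done more is said than done".split()

tokens = sorted(set(saying))     # duplicates removed, then ordered
print(len(saying), len(tokens))  # 11 8
print(tokens[-2:])               # ['said', 'than']

text = " ".join(saying)          # list of words -> one string
print(text.split() == saying)    # string -> the same list again: True
```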
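Frequency distributions

from nltk import FreqDist
fdist1 = FreqDist(text1)
vocabulary1 = fdist1.keys()  # in NLTK 3 these keys are not sorted by frequency; use fdist1.most_common(50)
type(vocabulary1)
dict_keys
fdist1.plot(50, cumulative=True)
# Cumulative frequency plot for the 50 most frequently used words in Moby Dick,
# which account for nearly half of the tokens.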

[Figure: cumulative frequency plot of the 50 most frequent words in Moby Dick]
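nltk.FreqDist is a subclass of collections.Counter, so its core behaviour can be tried with the standard library alone. Because keys() is a plain dict view (the dict_keys above), most_common(n) is the way to get the top n words. The sketch below uses a made-up token list. The helper cumulative_share is hypothetical: it computes the running totals that a cumulative plot draws, expressed here as a fraction of all tokens.

```python
from collections import Counter

tokens = "the whale , the sea , and the ship".split()
fdist = Counter(tokens)


def cumulative_share(counts, n):
    """Running fraction of all tokens covered by the n most common items."""
    total = sum(counts.values())
    running = 0
    shares = []
    for _, count in counts.most_common(n):
        running += count
        shares.append(running / total)
    return shares


print(fdist.most_common(2))         # [('the', 3), (',', 2)]
print(cumulative_share(fdist, 2))   # the top two items cover 3/9, then 5/9 of the tokens
```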

fdist1.hapaxes() #the words that occur once only
['Herman', 'Melville', ']', 'ETYMOLOGY', 'Late', 'Consumptive', 'School',
 'threadbare', 'lexicons', 'mockingly', 'flags', 'mortality', 'signification',
 'HACKLUYT', 'Sw', 'HVAL', 'roundness', 'Dut', 'Ger', 'WALLEN', 'WALW', 'IAN',
 'RICHARDSON', 'KETOS', 'GREEK', 'CETUS', 'LATIN', 'WHOEL', 'ANGLO', 'SAXON', 'WAL',
 'HWAL', 'SWEDISH', 'ICELANDIC', 'BALEINE', 'BALLENA', 'FEGEE', 'ERROMANGOAN',
 'Librarian', 'painstaking', 'burrower', 'grub', 'Vaticans', 'stalls', 'higgledy',
 'piggledy', 'gospel', 'promiscuously', 'commentator', 'belongest', 'sallow', 'Pale',
 'Sherry', 'loves', 'bluntly', 'Subs', 'thankless', 'Hampton', 'Court', 'hie',
 'refugees', 'pampered', 'Michael', 'Raphael', 'unsplinterable', 'GENESIS', 'JOB',
 'JONAH', 'punish', 'ISAIAH', 'soever', 'cometh', 'incontinently', 'perisheth',
 'PLUTARCH', 'MORALS', 'breedeth', 'Whirlpooles', 'Balaene', 'arpens', 'PLINY',
 'Scarcely', 'TOOKE', 'LUCIAN', 'TRUE', 'catched', 'OCTHER', 'VERBAL', 'TAKEN',
 'MOUTH', 'ALFRED', '890', 'gudgeon', 'retires', 'MONTAIGNE', 'APOLOGY', 'RAIMOND',
 'SEBOND', 'Nick', 'RABELAIS', 'cartloads', 'STOWE', 'ANNALS', 'LORD', 'BACON',
 'Touching', 'ork', 'DEATH', 'sovereignest', 'bruise', 'HAMLET', 'leach', 'Mote',
 'availle', 'returne', 'againe', 'worker', 'Dinting', 'paine', 'thro', 'maine',
 'FAERIE', 'Immense', 'til', 'DAVENANT', 'PREFACE', 'GONDIBERT', 'spermacetti',
 'Hosmannus', 'Nescio', 'VIDE', 'Spencer', 'Talus', 'flail', 'threatens', 'jav',
 'lins', 'WALLER', 'SUMMER', 'ISLANDS', 'Commonwealth', 'Civitas', 'OPENING',
 'SENTENCE', 'HOBBES', 'LEVIATHAN', 'Silly', 'Mansoul', 'chewing', 'sprat',
 'PILGRIM', 'PROGRESS', 'Created', 'PARADISE', 'LOST', '---"', 'Hugest',
 'Stretched', 'Draws', 'FULLLER', 'PROFANE', 'HOLY', 'STATE', 'DRYDEN', 'ANNUS',
 'MIRABILIS', 'aground', 'EDGE', 'TEN', 'SPITZBERGEN', 'PURCHAS', 'wantonness',
 'fuzzing', 'vents', 'HERBERT', 'INTO', 'ASIA', 'AFRICA', 'SCHOUTEN', 'SIXTH',
 'CIRCUMNAVIGATION', 'Elbe', 'ducat', 'herrings', 'GREENLAND', 'Several', 'Fife',
 'Anno', '1652', 'Pitferren', 'SIBBALD', 'FIFE', 'KINROSS', 'Myself', 'Sperma',
 'ceti', 'fierceness', 'RICHARD', 'STRAFFORD', 'LETTER', 'BERMUDAS', 'PHIL', 'TRANS',
 '1668', 'PRIMER', 'COWLEY', '1729', '"...', 'frequendy', 'insupportable',
 'disorder', 'ULLOA', 'SOUTH', 'AMERICA', 'sylphs', 'petticoat', 'Oft', 'Tho', 'RAPE',
 'LOCK', 'NAT', 'wales', 'JOHNSON', 'COOK', 'dung', 'lime', 'juniper', 'UNO', 'VON',
 'TROIL', 'LETTERS', 'BANKS', 'SOLANDER', '1772', 'Nantuckois', 'JEFFERSON',
 'MEMORIAL', 'MINISTER', 'REFERENCE', 'PARLIAMENT', 'SOMEWHERE', 'guarding',
 'protecting', 'robbers', 'BLACKSTONE', 'Rodmond', 'suspends', 'attends',
 'FALCONER', 'Bright', 'roofs', 'domes', 'rockets', 'Around', 'unwieldy', 'COWPER',
 'VISIT', 'LONDON', 'HUNTER', 'DISSECTION', 'SMALL', 'SIZED', 'aorta', 'gushing',
 'PALEY', 'THEOLOGY', 'mammiferous', 'hind', 'BARON', 'CUVIER', 'COLNETT',
 'PURPOSE', 'EXTENDING', 'SPERMACETI', 'Floundered', 'chace', 'peopling', 'Gather',
 'Led', 'instincts', 'trackless', 'Assaulted', 'voracious', 'spiral', 'MONTGOMERY',
 'WORLD', 'FLOOD', 'Paean', 'fatter', 'Flounders', 'CHARLES', 'LAMB', 'TRIUMPH',
 '1690', 'OBED', 'Susan', 'HAWTHORNE', 'TWICE', 'bespeak', 'raal', 'COOPER', 'PILOT',
 'Berlin', 'Gazette', 'ECKERMANN', 'CONVERSATIONS', 'GOETHE', 'ESSEX', 'WAS',
 'ATTACKED', 'FINALLY', 'DESTROYED', 'OWEN', 'CHACE', 'FIRST', 'SAID', 'VESSEL',
 'YORK', '1821', 'piping', 'dimmed', 'phospher', 'ELIZABETH', 'OAKES', 'SMITH',
 'amounted', '440', 'SCORESBY', 'Mad', 'agonies', 'endures', 'infuriated', 'rears',
 'snaps', 'propelled', 'observers', 'opportunities', 'habitudes', 'BEALE',
 'offensively', 'artful', 'mischievous', 'FREDERICK', 'DEBELL', '1840', 'October',
 'Raise', 'ay', 'THAR', 'bowes', 'os', 'ROSS', 'ETCHINGS', 'CRUIZE', '1846', 'Globe',
 'transactions', 'relate', 'HUSSEY', 'SURVIVORS', 'parried', 'MISSIONARY',
 'JOURNAL', 'TYERMAN', 'boldest', 'persevering', 'REPORT', 'DANIEL', 'SPEECH',
 'SENATE', 'APPLICATION', 'ERECTION', 'BREAKWATER', 'CAPTORS', 'WHALEMAN',
 'ADVENTURES', 'BIOGRAPHY', 'GATHERED', 'HOMEWARD', 'COMMODORE', 'PREBLE', 'REV',
 'CHEEVER', 'MUTINEER', 'BROTHER', 'ANOTHER', 'MCCULLOCH', 'COMMERCIAL',
 'reciprocal', 'clews', 'SOMETHING', 'UNPUBLISHED', 'CURRENTS', 'Pedestrians',
 'recollect', 'gateways', 'VOYAGER', 'ARCTIC', 'NEWSPAPER', 'TAKING', 'RETAKING',
 'HOBOMACK', 'MIRIAM', 'FISHERMAN', 'appliance', 'RIBS', 'TRUCKS', 'Terra', 'Del',
 'Fuego', 'DARWIN', 'NATURALIST', ";--'", '!\'"', 'WHARTON', 'Loomings', 'spleen',
 'regulating', 'circulation', 'Whenever', 'drizzly', 'hypos', 'philosophical',
 'Cato', 'Manhattoes', 'reefs', 'downtown', 'gazers', 'Circumambulate', 'Corlears',
 'Coenties', 'Slip', 'Whitehall', 'Posted', 'sentinels', 'spiles', 'pier', 'lath',
 'counters', 'desks', 'loitering', 'shady', 'Inlanders', 'lanes', 'alleys',
 'attract', 'dale', 'dreamiest', 'shadiest', 'quietest', 'enchanting', 'Saco',
 'crucifix', 'Deep', 'mazy', 'Tiger', 'Tennessee', 'Rockaway', 'Persians', 'deity',
 'Narcissus', 'ungraspable', 'hazy', 'quarrelsome', 'offices', 'abominate', 'toils',
 'trials', 'barques', 'schooners', 'broiling', 'buttered', 'judgmatically',
 'peppered', 'reverentially', 'idolatrous', 'dotings', 'ibis', 'roasted', 'bake',
 'plumb', 'Van', 'Rensselaers', 'Randolphs', 'Hardicanutes', 'lording', 'tallest',
 'decoction', 'Seneca', 'Stoics', 'Testament', 'promptly', 'rub', 'infliction',
 'BEING', 'PAID', 'urbane', 'ills', 'monied', 'consign', 'prevalent', 'violate',
 'Pythagorean', 'commonalty', 'police', 'surveillance', 'programme', 'solo',
 'CONTESTED', 'ELECTION', 'PRESIDENCY', 'UNITED', 'STATES', 'ISHMAEL', 'BLOODY',
 'AFFGHANISTAN', 'managers', 'genteel', 'comedies', 'farces', 'cunningly',
 'disguises', 'cajoling', 'unbiased', 'freewill', 'discriminating', 'overwhelming',
 'undeliverable', 'itch', 'forbidden', 'ignoring', 'lodges', 'Carpet', 'Bag',
 'Manhatto', 'candidates', 'penalties', 'Tyre', 'Carthage', 'imported',
 'cobblestones', 'bitingly', 'shouldering', 'price', 'fervent', 'asphaltic',
 'pavement', 'flinty', 'projections', 'soles', 'Too', 'cheapest', 'cheeriest',
 'invitingly', 'particles', 'peer', 'Angel', 'Doom', 'wailing', 'gnashing',
 'Wretched', 'entertainment', 'Moving', 'emigrant', 'poverty', 'creak', 'lodgings',
 'zephyr', 'hob', 'toasting', 'observest', 'sashless', 'glazier', 'reasonest',
 'chinks', 'crannies', 'lint', 'chattering', 'shiverings', 'cob', 'redder', 'Orion',
 'glitters', 'conservatories', 'president', 'temperance', 'blubbering',
 'straggling', 'wainscots', 'reminding', 'oilpainting', 'besmoked', 'defaced',
 'unequal', 'crosslights', 'hags', 'delineate', 'bewitched', 'ponderings', 'boggy',
 'soggy', 'squitchy', 'froze', 'heath', 'icebound', 'represents', 'Horner',
 'foundered', 'clubs', 'harvesting', 'hacking', 'horrifying', 'Mixed', 'Nathan',
 'Swain', 'corkscrew', 'Blanco', 'sojourning', 'fireplaces', 'duskier', 'cockpits',
 'rarities', 'Projecting', 'Within', 'shelves', 'flasks', 'bustles', 'deliriums',
 'Abominable', 'tumblers', 'cylinders', 'goggling', 'deceitfully', 'tapered',
 'Parallel', 'pecked', 'footpads', 'Fill', 'shilling', 'examining', 'SKRIMSHANDER',
 'accommodated', 'unoccupied', 'haint', 'pose', 'whalin', 'decidedly',
 'objectionable', 'wander', 'Battery', 'ruminating', 'adorning', 'potatoes',
 'sartainty', 'diabolically', 'steaks', 'undress', 'looker', 'rioting', 'Grampus',
 'seed', 'Feegees', 'tramping', 'Enveloped', 'bedarned', 'eruption', 'officiating',
 'brimmers', 'complained', 'potion', 'colds', 'catarrhs', 'liquor', 'arrantest',
 'topers', 'obstreperously', 'aloof', 'desirous', 'hilarity', 'coffer',
 'Southerner', 'mountaineers', 'Alleghanian', 'missed', 'supernaturally',
 'congratulate', 'multiply', 'bachelor', 'abominated', 'tidiest', 'bedwards',
 'shan', 'tablecloth', 'Skrimshander', 'bump', 'spraining', 'eider', 'yoking',
 'rickety', 'whirlwinds', 'knockings', 'dismissed', 'popped', 'cherishing',
 'chuckled', 'chuckle', 'mightily', 'catches', 'bamboozingly', 'overstocked',
 'toothpick', 'rayther', 'BROWN', 'slanderin', 'farrago', 'BROKE', 'Sartain', 'Mt',
 'Hecla', 'persist', 'mystifying', 'unsay', 'criminal', 'Wall', 'purty', 'sarmon',
 'rips', 'tellin', 'bought', 'balmed', 'curios', 'sellin', 'inions', 'fooling',
 'idolators', 'Depend', 'reg', 'lar', 'spliced', 'Johnny', 'sprawling', 'Arter',
 'glim', 'jiffy', 'irresolute', 'vum', 'WON', 'Folding', 'scrutiny', 'porcupine',
 'moccasin', 'ponchos', 'parade', 'rainy', 'remembering', 'commended', 'cobs', 'Nod',
 'footfall', 'unlacing', 'blackish', 'plasters', 'inkling', 'Placing', 'crammed',
 'scalp', 'mildewed', 'Ignorance', 'parent', 'nonplussed', 'undressing',
 'checkered', 'Thirty', 'frogs', 'quaked', 'wrapall', 'dreadnaught', 'fumbled',
 'Remembering', 'manikin', 'tenpin', 'andirons', 'jambs', 'bricks', 'appropriate',
 'applying', 'hastier', 'withdrawals', 'antics', 'devotee', 'extinguishing',
 'unceremoniously', 'bagged', 'sportsman', 'woodcock', 'uncomfortableness',
 'deliberating', 'puffed', 'sang', 'Stammering', 'conjured', 'responses', 'debel',
 'flourishing', 'Angels', 'flourishings', 'peddlin', 'sleepe', 'grunted', 'gettee',
 'motioning', 'comely', 'insured', 'Counterpane', 'parti', 'triangles',
 'interminable', 'caper', 'supperless', '21st', 'hemisphere', 'sigh', 'Sixteen',
 'ached', 'coaches', 'stockinged', 'slippering', 'misbehaviour', 'unendurable',
 'stepmothers', 'misfortunes', 'steeped', 'shudderingly', 'confounding', 'soberly',
 'recurred', 'predicament', 'unlock', 'bridegroom', 'clasp', 'hugged', 'rouse',
 'snore', 'scratch', 'Throwing', 'expostulations', 'unbecomingness', 'matrimonial',
 'dawning', 'overture', 'innate', 'compliment', 'civility', 'rudeness', 'toilette',
 'dressing', 'donning', 'gaspings', 'booting', 'caterpillar', 'outlandishness',
 'manners', 'education', 'undergraduate', 'dreamt', 'cowhide', 'pinched',
 'curtains', 'indecorous', 'contented', 'restricting', 'donned', 'lathering',
 'unsheathes', 'whets', 'Rogers', 'cutlery', 'Afterwards', 'baton', 'Breakfast',
 'pleasantly', 'bountifully', 'laughable', 'bosky', 'unshorn', 'gowns', 'toasted',
 'lingers', 'tarried', 'barred', 'Grub', 'Park', 'assurance', 'polish', 'occasioned',
 'embarrassed', 'bashfulness', 'duelled', 'winking', 'tastes', 'sheepishly',
 'bashful', 'icicle', 'admirer', 'cordially', 'grappling', 'genteelly', 'eschewed',
 'undivided', '6', 'circulating', 'nondescripts', 'Chestnut', 'jostle', 'Regent',
 'Lascars', 'Bombay', 'Apollo', 'Feegeeans', 'Tongatobooarrs', 'Erromanggoans',
 'Pannangians', 'Brighggians', 'weekly', 'Vermonters', 'stalwart', 'frames',
 'felled', 'strutting', 'wester', 'bombazine', 'cloak', 'mow', 'gloves', 'joins',
 'outfit', 'waistcoats', 'Hay', 'Seed', 'tract', 'dearest', 'pave', 'eggs',
 'patrician', 'parks', 'scraggy', 'scoria', 'Herr', 'dowers', 'nieces',
 'reservoirs', 'maples', 'bountiful', 'proffer', 'passer', 'cones', 'blossoms',
 'superinduced', 'carnation', 'Salem', 'sweethearts', 'Puritanic', 'Whaleman',
 'Wrapping', 'Each', 'quote', 'TALBOT', 'Near', 'Desolation', '1st', 'SISTER',
 'ROBERT', 'WILLIS', 'ELLERY', 'NATHAN', 'COLEMAN', 'WALTER', 'CANNY', 'SETH',
 'GLEIG', 'Forming', 'ELIZA', '31st', 'MARBLE', 'SHIPMATES', 'EZEKIEL', 'HARDY',
 'AUGUST', '3d', '1833', 'WIDOW', 'Shaking', 'glazed', 'Affected', 'relatives',
 'unhealing', 'sympathetically', 'wounds', 'bleed', 'blanks', ...]
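Hapaxes are the entries whose count is exactly 1. Below is a sketch with Counter that produces the same kind of list as fdist1.hapaxes(). The function name and the sample sentence are made up for this example.

```python
from collections import Counter


def hapaxes(tokens):
    """Tokens that occur exactly once, in order of first appearance."""
    counts = Counter(tokens)
    return [w for w, c in counts.items() if c == 1]


sample = "the whale saw the ship and the ship saw land".split()
print(hapaxes(sample))   # ['whale', 'and', 'land']
```

In a long text such as Moby Dick, this list is dominated by rare words, names and tokenisation fragments. That is why the output above says little about what the book is about.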

Fine-grained selection of words

  1. The set of all w such that w is an element of V (the vocabulary) and w has property P:
    \(\{w \mid w \in V \text{ and } P(w)\}\)
  2. The corresponding Python expression is:
    [w for w in V if P(w)]
V = set(text1)
long_words = [w for w in V if len(w)>15]
sorted(long_words)
['CIRCUMNAVIGATION', 'Physiognomically', 'apprehensiveness', 'cannibalistically',
 'characteristically', 'circumnavigating', 'circumnavigation', 'circumnavigations',
 'comprehensiveness', 'hermaphroditical', 'indiscriminately', 'indispensableness',
 'irresistibleness', 'physiognomically', 'preternaturalness', 'responsibilities',
 'simultaneousness', 'subterraneousness', 'supernaturalness', 'superstitiousness',
 'uncomfortableness', 'uncompromisedness', 'undiscriminating', 'uninterpenetratingly']
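The set-builder pattern works with any predicate P, and predicates can be combined with and. Below is a standalone sketch on a made-up sentence that uses a Counter as the frequency table. The length threshold of 6 and the "occurs more than once" test are arbitrary choices for illustration.

```python
from collections import Counter

tokens = ("the harpooneer and the harpooneer saw the "
          "monstrous whale surfacing").split()
V = set(tokens)
fdist = Counter(tokens)

# {w | w in V and P(w)} with P(w) = "w has more than 6 characters"
long_words = sorted(w for w in V if len(w) > 6)

# Two properties at once: long AND occurring more than once
long_and_repeated = sorted(w for w in V if len(w) > 6 and fdist[w] > 1)

print(long_words)          # ['harpooneer', 'monstrous', 'surfacing']
print(long_and_repeated)   # ['harpooneer']
```

Adding a frequency condition filters out one-off long words, which tend to be rare or unusual, and keeps the long words that are characteristic of the text.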

This post is adapted from the book Natural Language Processing with Python.


Original post: https://www.cnblogs.com/brightyuxl/p/8973951.html
