
How to check whether a word is a valid English word in Python: three approaches

Date: 2017-11-02 13:16:01


For (much) more power and flexibility, use a dedicated spellchecking library like PyEnchant. There's a tutorial, or you could just dive straight in:

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>> d.suggest("Helo")
['He lo', 'He-lo', 'Hello', 'Helot', 'Help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"]

PyEnchant comes with a few dictionaries (en_GB, en_US, de_DE, fr_FR), but can use any of the OpenOffice ones if you want more languages.

 

Store the word list in a set, because membership lookups in a set are faster than in a list:

# Load one word per line, normalized to lowercase, into a set
with open("english_words.txt") as word_file:
    english_words = set(word.strip().lower() for word in word_file)

def is_english_word(word):
    # Case-insensitive membership test against the loaded word list
    return word.lower() in english_words

print(is_english_word("ham"))  # should be True if you have a good english_words.txt
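
To try the same idea without a word file on disk, here is a minimal sketch that builds the set from any iterable of lines. The names build_word_set, contains_word and sample_lines are made up for this illustration and are not part of the answer above; skipping blank lines is an extra assumption.

```python
def build_word_set(lines):
    """Return a frozenset of lowercased, non-empty words from an iterable of lines."""
    return frozenset(line.strip().lower() for line in lines if line.strip())

def contains_word(word, words):
    """Case-insensitive membership test against a prepared word set."""
    return word.strip().lower() in words

# Hypothetical stand-in for the contents of a word list file
sample_lines = ["Ham\n", "eggs\n", "\n", "  Spam  \n"]
words = build_word_set(sample_lines)

print(contains_word("HAM", words))
print(contains_word("hamm", words))
```

Because the function accepts any iterable, an open file object can be passed in place of sample_lines.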

To answer the second part of the question, the plurals would already be in a good word list, but if you wanted to specifically exclude those from the list for some reason, you could indeed write a function to handle it. But English pluralization rules are tricky enough that I'd just include the plurals in the word list to begin with.
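
To give a feel for why hand-written plural rules get tricky, here is a rough sketch of a naive suffix-stripping check. The function names and the tiny singular_words set are hypothetical, and the three rules are a deliberate oversimplification rather than a real pluralization algorithm.

```python
def singular_candidates(word):
    """Yield possible singular forms using three naive suffix rules."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 3:
        yield w[:-3] + "y"   # "berries" -> "berry"
    if w.endswith("es") and len(w) > 2:
        yield w[:-2]         # "boxes" -> "box"
    if w.endswith("s") and len(w) > 1:
        yield w[:-1]         # "hams" -> "ham"

def is_word_or_plural(word, singular_words):
    """True if the word, or a naive singular form of it, is in the set."""
    w = word.lower()
    if w in singular_words:
        return True
    return any(c in singular_words for c in singular_candidates(w))

# Hypothetical word list that contains singular forms only
singular_words = {"ham", "box", "berry", "mouse", "bus"}

for w in ["hams", "boxes", "berries", "mice", "buss"]:
    print(w, is_word_or_plural(w, singular_words))
```

The regular plurals pass, but the irregular plural "mice" is rejected and the non-plural "buss" is accepted because stripping the final "s" leaves "bus". Cases like these are why including plurals in the word list is the simpler route.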

As to where to find English word lists, I found several just by Googling "English word list". Here is one: http://www.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt You could Google for British or American English if you want specifically one of those dialects.

 

Using NLTK:

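from nltk.corpus import wordnet
# Requires the WordNet corpus; if it is missing, run nltk.download("wordnet") once

word_to_test = "hello"  # example input, replace with the word you want to check

if not wordnet.synsets(word_to_test):
    print("Not an English word")  # no WordNet entry found
else:
    print("English word")  # at least one synset exists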

You should refer to this article if you have trouble installing wordnet or want to try other approaches.

 

Source: https://stackoverflow.com/questions/3788870/how-to-check-if-a-word-is-an-english-word-with-python


Original post: http://www.cnblogs.com/bonelee/p/7771609.html
