
Naive Bayes

Posted: 2018-01-21 00:05:38


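# "Naive": assume that the probability of each feature (word) appearing does not
# depend on which other words it appears next to
# Every feature is given equal weight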
from numpy import *

def loadDataSet():
    postingList = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                   ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classVec = [0, 1, 0, 1, 0, 1]    # 1 = abusive, 0 = not abusive
    return postingList, classVec
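# Build the vocabulary: a list of every unique word across all documents
def createVocabList(dataSet):
    vocabSet = set()  # start with an empty set
    for document in dataSet:
        vocabSet |= set(document)  # union with this document's words
    return list(vocabSet)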

# Convert a document into a set-of-words vector: element i is 1 if vocabulary
# word i appears in the document, otherwise 0
def setOfWordsVec(vocabList, inputSet):
    returnVec = [0] * len(vocabList)
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] = 1
        else:
            print("the word: %s is not in the Vocabulary!" % word)
    return returnVec

def main():
    listOPosts, listClasses = loadDataSet()
    myVocabList = createVocabList(listOPosts)
    print(myVocabList)
    print(setOfWordsVec(myVocabList, listOPosts[0]))

if __name__ == "__main__":
    main()
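The comment at the top of the script states the "naive" independence assumption, but the code stops at turning posts into set-of-words vectors. Below is a minimal sketch, not the original author's code, of how such binary vectors can feed a classifier under that assumption: pick the class c maximizing log P(c) + sum_i log P(x_i | c). It assumes a Bernoulli naive Bayes model (each word is either present or absent) with add-one (Laplace) smoothing; the names set_of_words_vec, train_bernoulli_nb and classify are hypothetical, and the toy posts are copied from loadDataSet().

```python
import math

# Toy corpus copied from loadDataSet(); 1 = abusive, 0 = not abusive.
POSTS = [
    ["my", "dog", "has", "flea", "problems", "help", "please"],
    ["maybe", "not", "take", "him", "to", "dog", "park", "stupid"],
    ["my", "dalmation", "is", "so", "cute", "I", "love", "him"],
    ["stop", "posting", "stupid", "worthless", "garbage"],
    ["mr", "licks", "ate", "my", "steak", "how", "to", "stop", "him"],
    ["quit", "buying", "worthless", "dog", "food", "stupid"],
]
LABELS = [0, 1, 0, 1, 0, 1]


def set_of_words_vec(vocab, doc):
    """Hypothetical helper: 1 if vocab[i] occurs in doc, else 0."""
    present = set(doc)
    return [1 if w in present else 0 for w in vocab]


def train_bernoulli_nb(vectors, labels):
    """Return {class: (log prior, [P(word i present | class), ...])}."""
    model = {}
    n_words = len(vectors[0])
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        log_prior = math.log(len(rows) / len(vectors))
        # Add-one smoothing (an assumption): (docs containing word + 1) / (docs + 2)
        probs = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                 for i in range(n_words)]
        model[c] = (log_prior, probs)
    return model


def classify(model, vec):
    """Pick the class with the largest log P(c) + sum_i log P(x_i | c)."""
    def score(c):
        log_prior, probs = model[c]
        # Naive assumption: words are independent given the class, so the
        # joint likelihood is a product of per-word terms (a sum of logs).
        return log_prior + sum(
            math.log(p) if x else math.log(1 - p) for x, p in zip(vec, probs)
        )
    return max(model, key=score)


vocab = sorted({w for post in POSTS for w in post})  # sorted for a stable order
vectors = [set_of_words_vec(vocab, post) for post in POSTS]
model = train_bernoulli_nb(vectors, LABELS)
print(classify(model, set_of_words_vec(vocab, ["love", "my", "dalmation"])))  # → 0
print(classify(model, set_of_words_vec(vocab, ["stupid", "garbage"])))        # → 1
```

Working in log space turns the product of many small probabilities into a sum, which avoids floating-point underflow, and the smoothing keeps a word that never appeared in one class from driving that class's probability to zero.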

 

Original article: https://www.cnblogs.com/littlepear/p/8322251.html
