Tags: form, intel, machine, naive Bayes, vector, rap, ace, article, large amount
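When estimating the probability of an outcome, many attributes have to be taken into account. The Bayesian approach uses all of the available evidence to revise its prediction. Naive Bayes classification is a good fit when a large number of features each have only a small effect on their own but a large effect when combined.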
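To make the "many weak features, strong combined effect" point concrete, here is a minimal sketch in plain Python. It assumes two classes A and B, features that are conditionally independent given the class (the "naive" assumption), and made-up likelihood values of 0.6 and 0.4; the function name `posterior_class_a` is only illustrative and does not come from any library.

```python
import math


def posterior_class_a(prior_a, likelihoods_a, likelihoods_b):
    """Posterior P(A | features) for two classes A and B.

    Assumes the features are conditionally independent given the class,
    so their likelihoods multiply. Work in log space to avoid underflow.
    """
    log_odds = math.log(prior_a) - math.log(1.0 - prior_a)
    for p_a, p_b in zip(likelihoods_a, likelihoods_b):
        log_odds += math.log(p_a) - math.log(p_b)
    # Convert log odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical numbers: each feature only slightly favours class A.
one_weak_feature = posterior_class_a(0.5, [0.6], [0.4])
twenty_weak_features = posterior_class_a(0.5, [0.6] * 20, [0.4] * 20)
print(one_weak_feature, twenty_weak_features)
```

With a single such feature the posterior barely moves away from the prior, while twenty of them together push it very close to 1. This is the situation in which naive Bayes tends to work well.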
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB


def article_category():
    categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
    twenty_train = fetch_20newsgroups(subset='train', categories=categories)
    # print(twenty_train)
    print(twenty_train.data)
    print(twenty_train.target)
    # Vectorize the training documents with TF-IDF
    tfidf_transformer = TfidfVectorizer()
    X_train_tfidf = tfidf_transformer.fit_transform(twenty_train.data)

    # Train the naive Bayes classifier
    clf = MultinomialNB(alpha=1.0).fit(X_train_tfidf, twenty_train.target)
    docs_new = ['Chemical reaction', 'Intel CPU is good']

    # Vectorize the documents to classify, reusing the fitted vectorizer
    X_new_tfidf = tfidf_transformer.transform(docs_new)
    # Predict
    predicted = clf.predict(X_new_tfidf)
    print(predicted)
    for doc, category in zip(docs_new, predicted):
        print('%r => %s' % (doc, twenty_train.target_names[category]))


if __name__ == '__main__':
    article_category()
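The `alpha=1.0` argument passed to `MultinomialNB` is additive (Laplace) smoothing. The sketch below is a rough from-scratch illustration of what a multinomial naive Bayes classifier computes, not scikit-learn's actual implementation. It assumes raw word counts instead of TF-IDF weights, whitespace tokenization, and a four-document toy corpus invented for this example; `train_multinomial_nb` and `predict` are hypothetical helper names.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a word-count naive Bayes model with additive smoothing."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_likelihood": {}, "vocab": vocab}
    for label, n_docs in class_doc_counts.items():
        model["log_prior"][label] = math.log(n_docs / len(docs))
        # alpha is added to every word count, so a word never seen in
        # a class still gets a small non-zero probability.
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            w: math.log((word_counts[label][w] + alpha) / denom)
            for w in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log posterior score."""
    # Words outside the training vocabulary are ignored.
    tokens = [t for t in doc.lower().split() if t in model["vocab"]]
    scores = {
        label: model["log_prior"][label]
        + sum(model["log_likelihood"][label][t] for t in tokens)
        for label in model["log_prior"]
    }
    return max(scores, key=scores.get)


# Toy corpus made up for illustration; not the 20 newsgroups data.
toy_docs = [
    "the cpu and gpu render graphics",
    "intel cpu benchmark",
    "doctor prescribes medicine",
    "chemical reaction in medicine",
]
toy_labels = ["comp.graphics", "comp.graphics", "sci.med", "sci.med"]
toy_model = train_multinomial_nb(toy_docs, toy_labels, alpha=1.0)

for text in ["Chemical reaction", "Intel CPU is good"]:
    print("%r => %s" % (text, predict(toy_model, text)))
```

Without smoothing (`alpha=0`), a word that never appears in one class would give that class a probability of zero, and taking its logarithm would fail. That is the main reason a positive `alpha` is normally kept.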
Original article: https://www.cnblogs.com/siplips/p/9757642.html