【Spark MLlib速成宝典】模型篇04朴素贝叶斯【Naive Bayes】（Python版）

时间：2017-12-11 16:45:35 阅读：271 评论：0 收藏：0 [点我收藏+]

标签：pre mllib ret text 贝叶斯 with erro save square

# -*-coding=utf-8 -*-  
from pyspark import SparkConf, SparkContext
sc = SparkContext(‘local‘)

from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD, LinearRegressionModel

# Load and parse the data 加载和解析数据，将每一个数转化为浮点数。每一行第一个数作为标记，后面的作为特征
def parsePoint(line):
    values = [float(x) for x in line.replace(‘,‘, ‘ ‘).split(‘ ‘)]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("data/mllib/ridge-data/lpsa.data")
print data.collect()[0] #-0.4307829,-1.63735562648104 -2.00621178480549 -1.86242597251066 -1.024....-0.864466507337306
parsedData = data.map(parsePoint)
print parsedData.collect()[0] #(-0.4307829,[-1.63735562648,-2.00621178481,-1.86242597251,-1.024....,-0.864466507337])

# Build the model 建立模型
model = LinearRegressionWithSGD.train(parsedData, iterations=1000, step=0.1)

# Evaluate the model on training data 评估模型在训练集上的误差
valuesAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
MSE = valuesAndPreds     .map(lambda vp: (vp[0] - vp[1])**2)     .reduce(lambda x, y: x + y) / valuesAndPreds.count()
print("Mean Squared Error = " + str(MSE)) #Mean Squared Error = 6.32693963099

# Save and load model 保存模型和加载模型
model.save(sc, "pythonLinearRegressionWithSGDModel")
sameModel = LinearRegressionModel.load(sc, "pythonLinearRegressionWithSGDModel")
print sameModel.predict(parsedData.collect()[0].features) #-1.86583391312

返回目录

标签：pre mllib ret text 贝叶斯 with erro save square

原文地址：http://www.cnblogs.com/itmorn/p/8023667.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行

【Spark MLlib速成宝典】模型篇04朴素贝叶斯【Naive Bayes】（Python版）

目录

朴素贝叶斯原理

朴素贝叶斯代码(Spark Python)