
[Spark MLlib Crash Course] Model Series 02: Logistic Regression (Python Version)

Date: 2017-12-11 16:07:50


Contents

  Logistic Regression Theory

  Logistic Regression Code (Spark Python)


 

Logistic Regression Theory

   For details, see the blog post: http://www.cnblogs.com/itmorn/p/7890468.html


 

Logistic Regression Code (Spark Python)

  Data used in the code: https://pan.baidu.com/s/1jHWKG4I (password: acq1)

 

# -*- coding: utf-8 -*-
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel
from pyspark.mllib.regression import LabeledPoint

# Run Spark locally; the master URL must be passed as a string
sc = SparkContext("local")

# Load and parse the data: convert every value to a float. The first value on
# each line is the label; the remaining values are the features.
def parsePoint(line):
    values = [float(x) for x in line.split()]
    return LabeledPoint(values[0], values[1:])

data = sc.textFile("data/mllib/sample_svm_data.txt")
print(data.first())  # 1 0 2.52078447201548 0 0 0 2.004684436494304 2.00034729926846.....
parsedData = data.map(parsePoint)
print(parsedData.first())  # (1.0,[0.0,2.52078447202,0.0,0.0,0.0,2.00468....

# Build the model
model = LogisticRegressionWithLBFGS.train(parsedData)

# Evaluate the model on the training data: the training error is the fraction
# of points whose predicted label differs from the true label
labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features)))
trainErr = labelsAndPreds.filter(lambda lp: lp[0] != lp[1]).count() / float(parsedData.count())
print("Training Error = " + str(trainErr))  # Training Error = 0.366459627329

# Save the model, then load it back
model.save(sc, "pythonLogisticRegressionWithLBFGSModel")
sameModel = LogisticRegressionModel.load(sc, "pythonLogisticRegressionWithLBFGSModel")

# The reloaded model should give the same prediction for the first point
print(sameModel.predict(parsedData.first().features))  # 1
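
To make the steps above easier to follow without a Spark installation, here is a minimal plain-Python sketch of the same pipeline: parsing a line into a label and features, predicting with a binary logistic model, and computing the training error. It is an illustration only, not MLlib's implementation. The sample lines, the weights, and the intercept are made-up values, the helper names parse_point and predict are hypothetical, and the 0.5 threshold is assumed to match the default decision threshold of the trained model.

```python
import math

def parse_point(line):
    """Split a whitespace-separated line into (label, features)."""
    values = [float(x) for x in line.split()]
    return values[0], values[1:]

def predict(weights, intercept, features, threshold=0.5):
    """Binary logistic prediction: sigmoid of the linear margin, then threshold."""
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    score = 1.0 / (1.0 + math.exp(-margin))
    return 1 if score > threshold else 0

# Hypothetical toy data and model parameters, for illustration only
lines = ["1 0 2.5 0", "0 1.0 0 0", "1 0 0 3.0", "1 0 0.5 0"]
weights = [-1.0, 1.0, 1.0]
intercept = -1.0

points = [parse_point(line) for line in lines]
labels_and_preds = [(label, predict(weights, intercept, feats)) for label, feats in points]

# Training error: fraction of points whose prediction differs from the label
train_err = sum(1 for label, pred in labels_and_preds if label != pred) / float(len(points))
print("Training Error = " + str(train_err))  # Training Error = 0.25
```

In this toy example only the last point is misclassified (its margin is 0.5 - 1.0 = -0.5, so the score falls below 0.5 while the label is 1), which gives one error out of four points.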

 

 


Original article: http://www.cnblogs.com/itmorn/p/8023190.html
