A Spark Implementation of Linear Regression [Linear Regression / Machine Learning / Spark]

Posted: 2015-05-13 18:48:31

1- Problem Statement

2- Linear Regression

3- Theoretical Derivation

4- Python/Spark Implementation
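
Read off the implementation in section 4, the update rule can be sketched as follows. The squared-error cost J is an assumption here: the code only computes its gradient, and the symbols m (number of samples) and x_j^{(i)} (feature j of sample i) are introduced for illustration. There is no 1/m factor, because the code sums the per-sample terms without averaging.

```latex
% Hypothesis with two parameters, matching theta = [0, 0] in the code
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1

% Assumed squared-error cost over m samples
J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

% Partial derivative with respect to \theta_j (partial_theta + reduce in the code)
\frac{\partial J}{\partial \theta_j}
  = \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

% Batch gradient descent step with learning rate \alpha
\theta_j := \theta_j - \alpha \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```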

# -*- coding: utf-8 -*-
from pyspark import SparkContext

# Model parameters and learning rate
theta = [0.0, 0.0]
alpha = 0.001

sc = SparkContext('local')


def func_theta_x(x):
    # Hypothesis h(x) = theta . x; the last column of x is the label, so skip it
    return sum(i * j for i, j in zip(theta, x[:-1]))


def cost(x):
    # Residual h(x) - y for one sample (the label y is x[-1])
    return func_theta_x(x) - x[-1]


def partial_theta(x):
    # This sample's contribution to the gradient: (h(x) - y) * x_j per feature
    dif = cost(x)
    return [dif * i for i in x[:-1]]


# Each input line holds tab-separated floats: features first, label last.
# The RDD is cached because every iteration scans it again.
rdd = sc.textFile('/home/freyr/linearRegression.txt') \
        .map(lambda line: [float(v) for v in line.strip().split('\t')]) \
        .cache()

maxiter = 400
iteration = 0
while True:
    # Sum the per-sample gradients across the whole data set
    parTheta = rdd.map(partial_theta) \
                  .reduce(lambda x, y: [i + j for i, j in zip(x, y)])

    for i in range(len(theta)):
        theta[i] = theta[i] - alpha * parTheta[i]

    iteration += 1

    if iteration <= maxiter:
        # Converged once the L1 norm of the gradient is small enough
        if sum(map(abs, parTheta)) <= 0.01:
            print('I get it!!!')
            print('Iter = %s' % iteration)
            print('Theta = %s' % theta)
            break
    else:
        print('Failed...')
        break
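
For checking the logic on a small in-memory data set, the same batch gradient descent can be sketched in plain Python without Spark. The function names `predict`, `gradient` and `fit`, the toy data, and the choice of `alpha=0.05` are all assumptions made for this illustration; they are not part of the original post. Each row is assumed to be `[x0, x1, y]` with `x0 = 1.0` acting as the bias term.

```python
# Plain-Python sketch of the same batch gradient descent, without Spark.
# Assumption: each row is [x0, x1, y] with x0 = 1.0 as the bias term.


def predict(theta, row):
    """Hypothesis h(x) = theta . x over the feature columns (all but the last)."""
    return sum(t * v for t, v in zip(theta, row[:-1]))


def gradient(theta, rows):
    """Sum over rows of (h(x) - y) * x_j, one entry per parameter."""
    grad = [0.0] * len(theta)
    for row in rows:
        residual = predict(theta, row) - row[-1]
        for j, v in enumerate(row[:-1]):
            grad[j] += residual * v
    return grad


def fit(rows, alpha=0.001, max_iter=400, tol=0.01):
    """Return (theta, iterations, converged); stops when the gradient's L1 norm <= tol."""
    theta = [0.0] * (len(rows[0]) - 1)
    for iteration in range(1, max_iter + 1):
        grad = gradient(theta, rows)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        if sum(abs(g) for g in grad) <= tol:
            return theta, iteration, True
    return theta, max_iter, False


# Hypothetical toy data generated from y = 1 + 2 * x1 (not the author's file).
sample_rows = [[1.0, float(x), 1.0 + 2.0 * x] for x in range(5)]
theta, iterations, converged = fit(sample_rows, alpha=0.05, max_iter=5000)
print(theta, iterations, converged)
```

On this toy data the fitted parameters should land close to the generating values 1 and 2. The step size matters: if `alpha` is too large for the data, the iteration diverges instead of converging.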

PS: 1. linearRegression.txt
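
The layout the code expects is one sample per line, with tab-separated floats and the label in the last column. A hypothetical file in that layout could be generated as sketched below. The file name `linearRegression_sample.txt`, the bias column of 1.0, and the generating line y = 1 + 2*x1 plus noise are assumptions for illustration, not the contents of the author's file.

```python
import os
import random
import tempfile


def write_sample_file(path, n_rows=100, seed=0):
    """Write rows of 'x0<TAB>x1<TAB>y' with x0 = 1 and y = 1 + 2*x1 + noise."""
    rng = random.Random(seed)
    with open(path, "w") as f:
        for _ in range(n_rows):
            x1 = rng.uniform(0.0, 1.0)
            y = 1.0 + 2.0 * x1 + rng.gauss(0.0, 0.1)
            f.write("1.0\t%f\t%f\n" % (x1, y))


def parse_line(line):
    """Same parsing as the listing: strip, split on tabs, convert to float."""
    return [float(v) for v in line.strip().split("\t")]


# Hypothetical location; the listing itself reads /home/freyr/linearRegression.txt.
sample_path = os.path.join(tempfile.mkdtemp(), "linearRegression_sample.txt")
write_sample_file(sample_path)
with open(sample_path) as f:
    rows = [parse_line(line) for line in f]
print(len(rows), rows[0])
```

How many iterations gradient descent needs on such a file depends on the data and on `alpha`, so the 400-iteration limit in the listing may or may not be enough for it.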

Original post: http://www.cnblogs.com/freyr/p/4501100.html