ComputeSVD
This note considers three distributed matrix types: CoordinateMatrix, RowMatrix, and IndexedRowMatrix. IndexedRowMatrix and RowMatrix both provide a computeSVD method; CoordinateMatrix does not, but its toIndexedRowMatrix() and toRowMatrix() methods convert it to either of those two types.
The analysis below therefore focuses on the computeSVD algorithm of IndexedRowMatrix and RowMatrix.
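The conversion path described above can be sketched as follows. This is a minimal illustration, assuming a spark-shell session in which `sc` is already defined; the matrix entries are made-up values, not data from this post.

```scala
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Hypothetical sparse entries (row, column, value) of a 3 x 2 matrix.
val entries = sc.parallelize(Seq(
  MatrixEntry(0, 0, 1.0),
  MatrixEntry(1, 1, 2.0),
  MatrixEntry(2, 0, 3.0)))
val coordMat = new CoordinateMatrix(entries)

// CoordinateMatrix has no computeSVD, so convert it first.
val indexedMat = coordMat.toIndexedRowMatrix() // keeps the row indices
val rowMat = coordMat.toRowMatrix()            // drops the row indices

// Both converted types expose computeSVD with the same parameters.
val svdIndexed = indexedMat.computeSVD(2, computeU = true)
val svdRow = rowMat.computeSVD(2, computeU = true)
```

The two results hold the same singular values; they differ only in whether the rows of U carry an index.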
For background on SVD, see Wikipedia and an excellent blog post, 《机器学习中的数学》 (Mathematics in Machine Learning).
I. Algorithm description:
def computeSVD(k: Int, computeU: Boolean = false, rCond: Double = 1e-9):
IndexedRowMatrix return type: SingularValueDecomposition[IndexedRowMatrix, Matrix]
RowMatrix return type: SingularValueDecomposition[RowMatrix, Matrix]
U is a RowMatrix of size m x k that satisfies U' * U = eye(k),
s is a Vector of size k, holding the singular values in descending order,
V is a Matrix of size n x k that satisfies V' * V = eye(k).
k: number of leading singular values to keep (0 < k <= n). It might return fewer than k if there are
numerically zero singular values or there are not enough Ritz values converged before the
maximum number of Arnoldi update iterations is reached.
computeU: whether to compute U.
rCond: the reciprocal condition number. All singular values smaller than rCond * sigma(0) are treated as zero,
where sigma(0) is the largest singular value.
Returns SingularValueDecomposition(U, s, V). U = null if computeU = false.
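The rCond rule can be illustrated in plain Scala. This is only a sketch of the rule as documented above, not Spark's source; `keepSignificant` is a hypothetical helper name, and the input values are the singular values that appear in the output later in this post.

```scala
// Keep the leading singular values that are >= rCond * sigma(0).
// `keepSignificant` is a hypothetical helper for illustration, not a Spark API.
def keepSignificant(sigmas: Array[Double], rCond: Double): Array[Double] = {
  require(sigmas.nonEmpty, "need at least one singular value")
  val threshold = rCond * sigmas(0) // sigmas are sorted in descending order
  sigmas.takeWhile(_ >= threshold)
}

// Singular values reported in the session output of this post.
val sigmas = Array(4.0, 3.0, 2.23606797749979, 1.4092648163485167e-8)

val withDefault = keepSignificant(sigmas, 1e-9) // threshold is 4e-9
val stricter = keepSignificant(sigmas, 1e-8)    // threshold is 4e-8
println(withDefault.length)
println(stricter.length)
```

With the default rCond = 1e-9 the threshold is 4e-9, so a value near 1.4e-8 still counts as non-zero, which would explain why a fourth, numerically tiny singular value shows up in the result. A larger rCond such as 1e-8 would cut it off.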
II. Example:
Construct a 4×5 matrix M:
After the singular value decomposition of M, the singular value matrix s should be:
4 0 0 0 0
0 3 0 0 0
0 0 √5 0 0
0 0 0 0 0
We will verify this with the computeSVD function.
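The expected singular values can be checked by hand before running Spark. The entries of M are not listed in the text, so the matrix below is an assumption: it is the 4×5 matrix consistent with the U, s, and V values printed in the session output of this post. A minimal plain-Scala sketch:

```scala
// Assumed matrix M (not given in the text; inferred from the printed U, s, V).
val mAssumed: Array[Array[Double]] = Array(
  Array(1.0, 0.0, 0.0, 0.0, 2.0),
  Array(0.0, 0.0, 3.0, 0.0, 0.0),
  Array(0.0, 0.0, 0.0, 0.0, 0.0),
  Array(0.0, 4.0, 0.0, 0.0, 0.0))

// Gram matrix G = M * M^T (4 x 4): entry (i, j) is the dot product of rows i and j.
val gram: Array[Array[Double]] =
  mAssumed.map(ri => mAssumed.map(rj => ri.zip(rj).map { case (a, b) => a * b }.sum))

// The rows of this M are mutually orthogonal, so G is diagonal and the
// singular values are the square roots of its diagonal, sorted descending.
val offDiagonalIsZero =
  gram.indices.forall(i => gram.indices.forall(j => i == j || gram(i)(j) == 0.0))
val singularValues = gram.indices.map(i => math.sqrt(gram(i)(i))).sorted.reverse

println(offDiagonalIsZero)
println(singularValues.mkString(", "))
```

For this assumed M the diagonal of G is 5, 9, 0, 16, which gives the singular values 4, 3, √5, and 0 shown above.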
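III. Construct the matrix, run the algorithm, and verify the result:
<1> Construct the RowMatrix M
scala> import org.apache.spark.mllib.linalg.Vectors
scala> import org.apache.spark.mllib.linalg.distributed.RowMatrix
scala> val M = new RowMatrix(sc.textFile("hdfs:///usr/matrix/svdM.txt").map(_.split(' '))
.map(_.map(_.toDouble))
.map(line => Vectors.dense(line)))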
M: org.apache.spark.mllib.linalg.distributed.RowMatrix = org.apache.spark.mllib.linalg.distributed.RowMatrix
<2> Run the algorithm
scala> val svd = M.computeSVD(4, true)
svd: SingularValueDecomposition[RowMatrix,Matrix]
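As shown, svd is an object of type SingularValueDecomposition that holds a RowMatrix and a Matrix; here the RowMatrix contains the left singular vectors U and the Matrix contains the right singular vectors V.
<3> Verify the result
The SingularValueDecomposition class exposes the fields U, s, and V, which are read below.
Left singular vectors U of M:
scala> val U = svd.U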
U: org.apache.spark.mllib.linalg.distributed.RowMatrix = org.apache.spark.mllib.linalg.distributed.RowMatrix
scala> U.rows.foreach(println)
[0.0,0.0,-0.9999999999999999,-1.4901161193847656E-8]
[0.0,1.0,0.0,0.0]
[0.0,0.0,0.0,0.0]
[-1.0,0.0,0.0,0.0]
Singular values s of M:
scala> val s = svd.s
s: org.apache.spark.mllib.linalg.Vector = [4.0,3.0,2.23606797749979,1.4092648163485167E-8]
Right singular vectors V of M:
scala> val V = svd.V
V: org.apache.spark.mllib.linalg.Matrix =
0.0 0.0 -0.44721359549995787 0.8944271909999159
-1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 -0.8944271909999159 -0.447213595499958
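One way to check the result is to multiply the factors back together, M = U * diag(s) * V^T. The plain-Scala sketch below copies the printed values of U, s, and V from the session above; the variable names are illustrative.

```scala
// U (4 x 4), s, and V (5 x 4) copied from the printed output.
val uRows = Array(
  Array(0.0, 0.0, -0.9999999999999999, -1.4901161193847656e-8),
  Array(0.0, 1.0, 0.0, 0.0),
  Array(0.0, 0.0, 0.0, 0.0),
  Array(-1.0, 0.0, 0.0, 0.0))
val sVals = Array(4.0, 3.0, 2.23606797749979, 1.4092648163485167e-8)
val vRows = Array(
  Array(0.0, 0.0, -0.44721359549995787, 0.8944271909999159),
  Array(-1.0, 0.0, 0.0, 0.0),
  Array(0.0, 1.0, 0.0, 0.0),
  Array(0.0, 0.0, 0.0, 0.0),
  Array(0.0, 0.0, -0.8944271909999159, -0.447213595499958))

// M(i)(j) = sum over c of U(i)(c) * s(c) * V(j)(c), i.e. M = U * diag(s) * V^T.
val reconstructed: Array[Array[Double]] =
  uRows.map(u => vRows.map(v => sVals.indices.map(c => u(c) * sVals(c) * v(c)).sum))

// Round to 6 decimals so floating-point noise does not clutter the printout.
reconstructed.foreach(r => println(r.map(x => math.round(x * 1e6) / 1e6).mkString(" ")))
```

Up to rounding, the product is a 4×5 matrix whose non-zero entries are 1 and 2 in the first row, 3 in the second row, and 4 in the fourth row. Inside Spark, the same check could presumably be written as svd.U.multiply(Matrices.diag(svd.s)).multiply(svd.V.transpose), which yields a RowMatrix.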
Original article: http://www.cnblogs.com/txq157/p/6028686.html