The matrix is generated by SVD, and I am using the SVD results for clustering analysis.
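For context, here is a minimal sketch of how such a local Matrix typically comes out of MLlib's SVD. The input data and the choice of k = 2 are hypothetical; `sc` is assumed to be an existing SparkContext.

```scala
import org.apache.spark.mllib.linalg.{Matrix, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Hypothetical input: each Vector is one data point (row).
val data = sc.parallelize(Seq(
  Vectors.dense(1.0, 2.0, 3.0),
  Vectors.dense(4.0, 5.0, 6.0),
  Vectors.dense(7.0, 8.0, 9.0)
))

val mat = new RowMatrix(data)
// Keep the top 2 singular values/vectors (k is an assumption here).
val svd = mat.computeSVD(2, computeU = true)

// svd.V is a local Matrix: exactly the kind of object we need to convert
// back into an RDD[Vector] for clustering.
val v: Matrix = svd.V
```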
If your clustering algorithm only supports RDD[Vector] as input, here's how you can do the transformation:
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import org.apache.spark.mllib.linalg.{DenseVector, Matrix, Vector}

    def toRDD(sc: SparkContext, m: Matrix): RDD[Vector] = {
      // Matrix.toArray is column-major, so grouping by numRows yields the columns.
      val columns: Iterator[Array[Double]] = m.toArray.grouped(m.numRows)
      // Transpose columns into rows. Skip this step if you want a column-major RDD.
      val rows: Seq[Seq[Double]] = columns.toSeq.transpose
      val vectors: Seq[DenseVector] = rows.map(row => new DenseVector(row.toArray))
      sc.parallelize(vectors)
    }
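Once the matrix is converted, the resulting RDD can be fed straight into an MLlib clustering algorithm. A minimal usage sketch, assuming `m` is the local Matrix from SVD, `sc` is an existing SparkContext, and the cluster count and iteration count are placeholder values:

```scala
import org.apache.spark.mllib.clustering.KMeans

// Convert the local Matrix into a distributed RDD[Vector].
val rdd = toRDD(sc, m)

// Cluster the rows; k = 2 and 20 iterations are assumptions for illustration.
val model = KMeans.train(rdd, 2, 20)

// Assign each row of the original matrix to a cluster.
val assignments = model.predict(rdd)
```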
How to convert a Matrix to RDD[Vector] in Spark
Original source: http://www.cnblogs.com/xiaoma0529/p/7216802.html