1.keys
Purpose:
Returns the key of every key-value pair in the RDD.
Example
val list = List("hadoop","spark","hive","spark")
val rdd = sc.parallelize(list)
val pairRdd = rdd.map(x => (x,1))
pairRdd.keys.collect.foreach(println)
Result
hadoop
spark
hive
spark
list: List[String] = List(hadoop, spark, hive, spark)
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[142] at parallelize at command-3434610298353610:2
pairRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[143] at map at command-3434610298353610:3
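Note that keys returns one key per pair, so a repeated key such as spark appears twice. On a pair RDD it behaves like map(_._1). The following is a minimal sketch of that equivalence and of getting unique keys with distinct; it assumes a local SparkContext, and the app name "keys-sketch" and the local[*] master are illustrative choices that are not in the original.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: a local context for illustration; in spark-shell or a notebook,
// getOrCreate simply returns the context that already exists.
val conf = new SparkConf().setAppName("keys-sketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

val pairRdd = sc.parallelize(List("hadoop", "spark", "hive", "spark")).map(x => (x, 1))

// keys yields one element per pair, duplicates included
val viaKeys = pairRdd.keys.collect().toList
// projecting the first tuple element by hand gives the same elements
val viaMap = pairRdd.map(_._1).collect().toList
// distinct removes repeated keys; sort locally because distinct does not guarantee order
val uniqueKeys = pairRdd.keys.distinct().collect().toList.sorted

println(viaKeys)    // List(hadoop, spark, hive, spark)
println(viaMap)     // List(hadoop, spark, hive, spark)
println(uniqueKeys) // List(hadoop, hive, spark)
```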
2.values
Purpose:
Returns the value of every key-value pair in the RDD.
Example
val list = List("hadoop","spark","hive","spark")
val rdd = sc.parallelize(list)
val pairRdd = rdd.map(x => (x,1))
pairRdd.values.collect.foreach(println)
Result
1
1
1
1
list: List[String] = List(hadoop, spark, hive, spark)
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[145] at parallelize at command-3434610298353610:2
pairRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[146] at map at command-3434610298353610:3
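Because values drops the keys and keeps only the second element of each pair, it is a convenient starting point for aggregating over all pairs. A small sketch, again assuming a local SparkContext (the app name "values-sketch" is hypothetical), that totals the values with reduce:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: a local context for illustration; an existing one is reused by getOrCreate.
val conf = new SparkConf().setAppName("values-sketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

val pairRdd = sc.parallelize(List("hadoop", "spark", "hive", "spark")).map(x => (x, 1))

// values yields one element per pair, in the same order as the pairs
val allValues = pairRdd.values.collect().toList
// summing the values counts the pairs here, because every value is 1
val total = pairRdd.values.reduce(_ + _)

println(allValues) // List(1, 1, 1, 1)
println(total)     // 4
```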
3.mapValues(func)
Purpose:
Applies a function to the value of each key-value pair; the keys are left unchanged.
Example
val list = List("hadoop","spark","hive","spark")
val rdd = sc.parallelize(list)
val pairRdd = rdd.map(x => (x,1))
pairRdd.mapValues(_+1).collect.foreach(println) // add 1 to each value
Result
(hadoop,2)
(spark,2)
(hive,2)
(spark,2)
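Since mapValues cannot change keys, Spark can keep the RDD's partitioner across the call, whereas a plain map over the pairs discards it even if the keys happen to stay the same. The sketch below illustrates this difference; the local SparkContext, the app name "mapvalues-sketch", and the choice of a HashPartitioner with 2 partitions are assumptions made for the example.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Assumption: a local context for illustration; an existing one is reused by getOrCreate.
val conf = new SparkConf().setAppName("mapvalues-sketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// Give the pair RDD an explicit partitioner so there is something to preserve
val partitioned = sc.parallelize(List("hadoop", "spark", "hive", "spark"))
  .map(x => (x, 1))
  .partitionBy(new HashPartitioner(2))

// Same transformation written two ways
val viaMapValues = partitioned.mapValues(_ + 1)
val viaMap = partitioned.map { case (k, v) => (k, v + 1) }

println(viaMapValues.partitioner.isDefined) // true: partitioner is kept
println(viaMap.partitioner.isDefined)       // false: map drops the partitioner
```

Keeping the partitioner can let a later key-based operation such as reduceByKey avoid an extra shuffle, which is one reason to prefer mapValues when only the values change.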
Original link: http://www.mamicode.com/info-detail-2285651.html
Common Spark transformations: keys, values, and mapValues
Original address: https://www.cnblogs.com/123456www/p/12308247.html