
Common Spark Operators Summarized (7): join

Date: 2019-09-01 01:12:57


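join performs an inner join of two key-value RDDs on their keys: for every key that appears in both RDDs, it emits one record for each combination of a left value and a right value. Keys that appear on only one side are dropped.

For example, joining ("B", 2) with ("B", "B1") yields (B,(2,B1)).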
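val arr = List(("A", 1), ("B", 2), ("A", 2), ("B", 3))
val arr1 = List(("A", "A1"), ("B", "B1"), ("A", "A2"), ("B", "B2"))
// sc is an existing SparkContext; each list is split across 3 partitions
val rdd = sc.parallelize(arr, 3)
val rdd1 = sc.parallelize(arr1, 3)
// Inner join on key: emits (key, (leftValue, rightValue)) for every matching pair
val joinRDD = rdd.join(rdd1)
joinRDD.foreach(println)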

# Output (the order of records may vary between runs):
# (B,(2,B1))
# (B,(2,B2))
# (B,(3,B1))
# (B,(3,B2))
# (A,(1,A1))
# (A,(1,A2))
# (A,(2,A1))
# (A,(2,A2))
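
To make the matching rule concrete, here is a minimal sketch that models the same semantics on plain Scala collections, so it runs without a SparkContext. The helper `localJoin` and the sample data are hypothetical and are not part of Spark; the sketch only illustrates which pairs an inner join keeps, not how Spark implements it.

```scala
// Hypothetical helper (not a Spark API): models inner-join semantics on local Seqs.
def localJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
  for {
    (k, v)  <- left
    (k2, w) <- right
    if k == k2            // keep only pairs whose keys match
  } yield (k, (v, w))

// "C" exists only on the left and "D" only on the right, so neither survives.
val left  = Seq(("A", 1), ("B", 2), ("C", 3))
val right = Seq(("A", "A1"), ("A", "A2"), ("B", "B1"), ("D", "D1"))

val joined = localJoin(left, right)
joined.foreach(println)
// (A,(1,A1))
// (A,(1,A2))
// (B,(2,B1))
```

The key "A" has one value on the left and two on the right, so it contributes 1 x 2 = 2 records. In the Spark example above, each key has two values on each side, which gives 2 x 2 = 4 records per key.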

 


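leftOuterJoin
Left outer join: every record from the left RDD is kept. If the right RDD has no matching key, the right-hand value is None; if it does, the value is wrapped in Some.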

 



// Setup omitted (sc is an existing SparkContext)
val arr = List(("A", 1), ("B", 2), ("A", 2), ("B", 3), ("C", 1))  // "C" exists only on the left
val arr1 = List(("A", "A1"), ("B", "B1"), ("A", "A2"), ("B", "B2"))
val rdd = sc.parallelize(arr, 3)
val rdd1 = sc.parallelize(arr1, 3)
// Keeps every left record; the right-hand value becomes an Option
val leftOuterJoinRDD = rdd.leftOuterJoin(rdd1)
leftOuterJoinRDD.foreach(println)
sc.stop()

# (B,(2,Some(B1)))
# (B,(2,Some(B2)))
# (B,(3,Some(B1)))
# (B,(3,Some(B2)))
# (C,(1,None))
# (A,(1,Some(A1)))
# (A,(1,Some(A2)))
# (A,(2,Some(A1)))
# (A,(2,Some(A2)))
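
Because the right-hand value arrives as an Option, downstream code usually has to unwrap it. The sketch below shows two common ways to do that on plain Scala collections shaped like the output above. The names `joinedRows`, `withDefault` and `unmatched`, and the placeholder "N/A", are assumptions made for illustration.

```scala
// Sample records shaped like leftOuterJoin output: (key, (leftValue, Option[rightValue])).
val joinedRows: Seq[(String, (Int, Option[String]))] = Seq(
  ("B", (2, Some("B1"))),
  ("C", (1, None)),
  ("A", (1, Some("A1")))
)

// Replace a missing right-hand value with a placeholder ("N/A" is an arbitrary choice).
val withDefault: Seq[(String, (Int, String))] =
  joinedRows.map { case (k, (v, opt)) => (k, (v, opt.getOrElse("N/A"))) }

// Keep only the left records that found no match on the right.
val unmatched: Seq[(String, Int)] =
  joinedRows.collect { case (k, (v, None)) => (k, v) }

withDefault.foreach(println)
// (B,(2,B1))
// (C,(1,N/A))
// (A,(1,A1))

unmatched.foreach(println)
// (C,1)
```

The pattern-matching function used for `withDefault` can also be passed to `map` on the RDD returned by leftOuterJoin, since its records have the same shape.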

 





 


Original article: https://www.cnblogs.com/pocahontas/p/11441030.html
