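join performs an inner join of two key-value RDDs on their keys: for every key that appears in both RDDs, each value on the left is paired with each value on the right.
For example, ("B", 2) joined with ("B", "B1") becomes (B,(2,B1)).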
// sc is an existing SparkContext (for example, the one provided by spark-shell)
val arr = List(("A", 1), ("B", 2), ("A", 2), ("B", 3))
val arr1 = List(("A", "A1"), ("B", "B1"), ("A", "A2"), ("B", "B2"))
val rdd = sc.parallelize(arr, 3)
val rdd1 = sc.parallelize(arr1, 3)
val joinRDD = rdd.join(rdd1)
joinRDD.foreach(println)
// Output (line order may vary between runs, since foreach runs per partition):
// (B,(2,B1))
// (B,(2,B2))
// (B,(3,B1))
// (B,(3,B2))
// (A,(1,A1))
// (A,(1,A2))
// (A,(2,A1))
// (A,(2,A2))
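To make the pairing rule concrete without a cluster, here is a minimal sketch that models the same result with plain Scala collections. The helper `innerJoin` is a hypothetical name used only for illustration; it is not how Spark implements `join`, it merely mirrors the observable behaviour. Keys present on both sides are paired in every combination, and a key present on only one side (here "C") is dropped.

```scala
// Hypothetical helper: models the result of a pair-RDD join using plain lists.
def innerJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
  for {
    (k, v)  <- left
    (k2, w) <- right
    if k == k2          // keep only pairs whose keys match
  } yield (k, (v, w))

val left  = List(("A", 1), ("B", 2), ("C", 3))
val right = List(("A", "A1"), ("A", "A2"), ("B", "B1"))

val joined = innerJoin(left, right)
joined.foreach(println)
// (A,(1,A1))
// (A,(1,A2))
// (B,(2,B1))
```

Unlike the RDD version, this local model returns its rows in a fixed order, which makes the per-key combinations easy to check by eye.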
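leftOuterJoin
Left outer join: every record of the left RDD is kept. If the right RDD has no matching key, the right-hand value is None; if it does, the value is wrapped in Some.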
// setup omitted (sc is an existing SparkContext)
val arr = List(("A", 1), ("B", 2), ("A", 2), ("B", 3), ("C", 1))
val arr1 = List(("A", "A1"), ("B", "B1"), ("A", "A2"), ("B", "B2"))
val rdd = sc.parallelize(arr, 3)
val rdd1 = sc.parallelize(arr1, 3)
val leftOuterJoinRDD = rdd.leftOuterJoin(rdd1)
leftOuterJoinRDD.foreach(println)
sc.stop()
// Output (line order may vary between runs):
// (B,(2,Some(B1)))
// (B,(2,Some(B2)))
// (B,(3,Some(B1)))
// (B,(3,Some(B2)))
// (C,(1,None))
// (A,(1,Some(A1)))
// (A,(1,Some(A2)))
// (A,(2,Some(A1)))
// (A,(2,Some(A2)))
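Because the right-hand value comes back as an Option, downstream code usually has to unwrap it. The sketch below again uses plain Scala collections rather than Spark; `leftOuterJoinModel` and the "n/a" default are hypothetical choices made for illustration only. It reproduces the Some/None shape described above and then replaces a missing value with a default via `getOrElse`.

```scala
// Hypothetical helper: models the result shape of leftOuterJoin with plain lists.
def leftOuterJoinModel[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] =
  left.flatMap { case (k, v) =>
    // All right-hand values with the same key, each wrapped in Some
    val matches: Seq[Option[W]] = right.collect { case (k2, w) if k2 == k => Some(w) }
    // No match: keep the left record once, with None on the right
    val values: Seq[Option[W]] = if (matches.isEmpty) Seq(None) else matches
    values.map(opt => (k, (v, opt)))
  }

val leftRows  = List(("A", 1), ("C", 1))
val rightRows = List(("A", "A1"), ("A", "A2"))

val result = leftOuterJoinModel(leftRows, rightRows)
result.foreach(println)
// (A,(1,Some(A1)))
// (A,(1,Some(A2)))
// (C,(1,None))

// Unwrap the Option, substituting a default when the right side had no match
val flattened = result.map { case (k, (v, opt)) => (k, v, opt.getOrElse("n/a")) }
flattened.foreach(println)
// (A,1,A1)
// (A,1,A2)
// (C,1,n/a)
```

The same `map` with `getOrElse` (or a pattern match on Some/None) can be applied to the RDD returned by `leftOuterJoin`, since its elements have the same tuple shape.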
Original article: https://www.cnblogs.com/pocahontas/p/11441030.html