Pair RDDs are a useful building block in many programs, as they expose operations that allow you to act on each key in parallel or regroup data across the network.
E.g., pair RDDs have a reduceByKey() method that can aggregate data separately for each key, and a join() method that can merge two RDDs together by grouping elements with the same key.
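A minimal sketch of these two operations, assuming a local Spark setup on a reasonably recent Spark version. The object name, helper names, and sample data are hypothetical and only there for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PairRddOps {
  // reduceByKey() combines the values of each key independently.
  def totalPerKey(sales: RDD[(String, Int)]): RDD[(String, Int)] =
    sales.reduceByKey(_ + _)

  // join() pairs up the values from both RDDs that share a key.
  def withCity(totals: RDD[(String, Int)],
               cities: RDD[(String, String)]): RDD[(String, (Int, String))] =
    totals.join(cities)

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick experiment.
    val conf = new SparkConf().setMaster("local[*]").setAppName("pair-rdd-ops")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data: (store, sales) and (store, city).
      val sales = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
      val cities = sc.parallelize(Seq(("a", "Oslo"), ("b", "Lima")))
      // Sort on the driver so the printed order is stable.
      withCity(totalPerKey(sales), cities).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With this sample data, key "a" ends up with a total of 4 and key "b" with 2, and each total is then paired with the city stored under the same key.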
Creating Pair RDDs
Many of the formats we load from will directly return pair RDDs for their key/value data.
In other cases, we can turn a regular RDD into a pair RDD by using the map() function to return key/value pairs:
// Key each line by its first word (split on a space, not on the empty string)
val pairs = lines.map(x => (x.split(" ")(0), x))
Transformations on Pair RDDs
We can pass functions to Spark here as well, but because pair RDDs contain tuples, the functions we pass must operate on tuples. In fact, pair RDDs are simply RDDs of Tuple2 objects.
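To make the point about tuples concrete, here is a small sketch, assuming a local Spark setup. The object name, helper names, and sample lines are hypothetical. It shows a function passed to filter() that receives the whole (key, value) tuple, alongside mapValues(), which transforms only the value:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PairRddTransforms {
  // Key each line by its first word, as in the map() example.
  def keyByFirstWord(lines: RDD[String]): RDD[(String, String)] =
    lines.map(x => (x.split(" ")(0), x))

  // The function passed to filter() receives a whole (key, value) tuple;
  // a pattern-matching closure lets us name the fields we need.
  def shortLines(pairs: RDD[(String, String)], maxLen: Int): RDD[(String, String)] =
    pairs.filter { case (_, value) => value.length < maxLen }

  // mapValues() transforms only the value and leaves the key untouched.
  def lineLengths(pairs: RDD[(String, String)]): RDD[(String, Int)] =
    pairs.mapValues(_.length)

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick experiment.
    val conf = new SparkConf().setMaster("local[*]").setAppName("pair-rdd-transforms")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample input.
      val lines = sc.parallelize(Seq(
        "spark is fast",
        "hadoop map reduce is an older batch engine",
        "spark pairs"))
      val pairs = keyByFirstWord(lines)
      shortLines(pairs, 20).collect().foreach(println)
      lineLengths(pairs).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Since each element is a Tuple2, the same filter could also be written with the accessor fields, e.g. pairs.filter(p => p._2.length < maxLen); the pattern-matching form is usually easier to read.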