Spark Programming--Actions

Date: 2016-01-02 14:24:21

first

def first(): T

first returns the first element of the RDD; no sorting is applied.

Example: [image omitted]
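
A minimal sketch of first, assuming a local Spark setup. The application name and the variable names (sc, rdd, head) are illustrative and not from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context (for example in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("first-demo").setMaster("local[2]"))

// Elements keep their insertion order; first() does not sort them.
val rdd = sc.parallelize(Seq(3, 1, 2), numSlices = 2)
val head = rdd.first() // 3, not the minimum 1
```

Calling first on an empty RDD throws an exception, so it is worth guarding with isEmpty when the input may have no elements.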

count

def count(): Long

count returns the number of elements in the RDD.

Example: [image omitted]
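
A minimal sketch of count, assuming a local Spark setup; the names used are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context (for example in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("count-demo").setMaster("local[2]"))

val rdd = sc.parallelize(1 to 100)
val n: Long = rdd.count() // 100; note the result type is Long, not Int
```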

reduce

def reduce(f: (T, T) ⇒ T): T

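reduce combines the elements of the RDD pairwise with the binary function f and returns the final result (useful for sums, string concatenation, and so on). f should be commutative and associative, because the per-partition results may be merged in any order.

Example: [image omitted]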
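
A minimal sketch of reduce with two commutative and associative functions (sum and maximum), assuming a local Spark setup; the names used are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context (for example in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("reduce-demo").setMaster("local[2]"))

// Sum of 1..10.
val sum = sc.parallelize(1 to 10).reduce(_ + _) // 55

// Maximum of the elements.
val largest = sc.parallelize(Seq(4, 9, 2)).reduce((a, b) => math.max(a, b)) // 9
```

String concatenation also type-checks as f, but it is not commutative, so the order of the pieces coming from different partitions is not guaranteed.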

take

def take(num: Int): Array[T]

take returns the first num elements of the RDD (indices 0 to num-1) in their existing order, without sorting.

Example: [image omitted]
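
A minimal sketch of take, assuming a local Spark setup; the names used are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context (for example in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("take-demo").setMaster("local[2]"))

val nums = sc.parallelize(Seq(5, 3, 8, 1), numSlices = 2)

// The first two elements in the RDD's own order, unsorted.
val firstTwo = nums.take(2) // Array(5, 3)
```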

top

def top(num: Int)(implicit ord: Ordering[T]): Array[T]

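top returns the num largest elements of the RDD, in descending order, according to the default ordering or an explicitly supplied Ordering.

Example (note the difference from take): [image omitted]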
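
A minimal sketch contrasting top with take, assuming a local Spark setup; the names used are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context (for example in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("top-demo").setMaster("local[2]"))

val nums = sc.parallelize(Seq(5, 3, 8, 1), numSlices = 2)

val firstTwo = nums.take(2)  // Array(5, 3): existing order, unsorted
val largestTwo = nums.top(2) // Array(8, 5): largest first

// Supplying a reversed Ordering makes top return the smallest elements.
val smallestTwo = nums.top(2)(Ordering[Int].reverse) // Array(1, 3)
```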

takeOrdered

def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T]

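takeOrdered is similar to top, but returns elements in the opposite order: the num smallest elements, in ascending order.

Example (compare with take and top): [image omitted]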
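
A minimal sketch comparing takeOrdered with take and top on the same data, assuming a local Spark setup; the names used are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context (for example in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("takeOrdered-demo").setMaster("local[2]"))

val nums = sc.parallelize(Seq(5, 3, 8, 1), numSlices = 2)

val unsorted = nums.take(2)         // Array(5, 3): existing order
val descending = nums.top(2)        // Array(8, 5): largest first
val ascending = nums.takeOrdered(2) // Array(1, 3): smallest first
```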

aggregate

fold

fold(zeroValue, op)

Aggregate the elements of each partition, and then the results for all the partitions, using a given associative and commutative function and a neutral “zero value.”

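In other words, fold takes an initial value and a function, and accumulates every element of the RDD into zeroValue. Because zeroValue is applied once per partition and once more when the partition results are merged, it should be the neutral element of op (for example 0 for addition).

Example: [image omitted]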
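
A minimal sketch of fold, assuming a local Spark setup; the names used are illustrative. The signature quoted above is the Python form; in Scala the call is curried as fold(zeroValue)(op). The second call uses a non-neutral zero value on purpose, to show that it is counted once per partition plus once for the final merge.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context (for example in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("fold-demo").setMaster("local[2]"))

// Exactly two partitions, so the effect of zeroValue is predictable.
val nums = sc.parallelize(1 to 5, numSlices = 2)

// Neutral zero value: a plain sum.
val sum = nums.fold(0)(_ + _) // 15

// Non-neutral zero value: added for each of the 2 partitions and once
// more for the final merge, so 15 + 3 * 10.
val skewed = nums.fold(10)(_ + _) // 45
```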

lookup

lookup(key)

Return the list of values in the RDD for key key. This operation is done efficiently if the RDD has a known partitioner by only searching the partition that the key maps to.

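lookup applies to RDDs of (K, V) pairs: given a key K, it returns all values V in the RDD associated with that key.

Example (lookup query): [image omitted]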
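
A minimal sketch of lookup on a pair RDD, assuming a local Spark setup; the names and sample data are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context (for example in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("lookup-demo").setMaster("local[2]"))

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// All values stored under key "a".
val hits = pairs.lookup("a")

// A key that is absent yields an empty sequence rather than an error.
val misses = pairs.lookup("z")
```

If the same RDD is queried repeatedly, partitioning it first (for example with partitionBy and a HashPartitioner) lets lookup scan only the partition that the key maps to.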