Note that, before Spark 2.0, the main programming interface of Spark was the Resilient Distributed Dataset (RDD). After Spark 2.0, RDDs were replaced by Dataset, which is strongly typed like an RDD but with richer optimizations under the hood. The RDD interface is still supported, and you can find a more complete reference in the RDD programming guide. However, we highly recommend switching to Dataset, which has better performance than RDD. See the SQL programming guide for more information about Dataset.
scala> val text=spark.read.textFile("/tmp/20171024/tian.txt")
text: org.apache.spark.sql.Dataset[String] = [value: string]
scala> text.count
res0: Long = 6
scala> val text=sc.textFile("/tmp/20171024/tian.txt")
text: org.apache.spark.rdd.RDD[String] = /tmp/20171024/tian.txt MapPartitionsRDD[7] at textFile at <console>:24
scala> text.count
res1: Long = 6
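As noted above, the RDD interface is still supported, and the two views can be converted into each other. Here is a minimal sketch (the variable names ds, rdd2, and ds2 are illustrative, not from the original session):
scala> val ds = spark.read.textFile("/tmp/20171024/tian.txt")   // Dataset[String]
scala> val rdd2 = ds.rdd                                        // the underlying RDD[String]
scala> val ds2 = rdd2.toDS()                                     // back to a Dataset; spark-shell imports spark.implicits._ automatically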
You can get values from a Dataset directly by calling some actions, or transform the Dataset to get a new one. For more details, please read the API doc.
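For example, a transformation such as map or filter produces a new Dataset, while an action such as count or collect returns values to the driver. A minimal sketch, continuing with the ds Dataset from the sketch above (the exact results depend on the contents of tian.txt, which are not shown here):
scala> ds.map(_.length).collect()                 // transformation, then an action: length of each line
scala> ds.filter(_.contains("spark")).count()     // transformation, then an action: lines containing "spark"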
Spark also supports pulling data sets into a cluster-wide in-memory cache. This is very useful when data is accessed repeatedly, such as when querying a small “hot” dataset or when running an iterative algorithm like PageRank. As a simple example, let’s mark our text dataset to be cached:
scala> text.cache()
res2: text.type = /tmp/20171024/tian.txt MapPartitionsRDD[7] at textFile at <console>:24
scala> text.count
res3: Long = 6
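When the cached data is no longer needed, it can be released again; unpersist is the standard counterpart of cache (a minimal sketch):
scala> text.unpersist()    // remove the data from the in-memory cache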
It may seem silly to use Spark to explore and cache a six-line text file. The interesting part is that these same functions can be used on very large data sets, even when they are striped across tens or hundreds of nodes.
Original article: http://www.cnblogs.com/playforever/p/7810196.html