How to Partition Data in Spark Streaming
Cluster resources can be under-utilized if the number of parallel tasks used in any stage of the computation is not high enough. For example, for distributed reduce operations like reduceByKey and reduceByKeyAndWindow, the default number of parallel tasks is controlled by the spark.default.parallelism configuration property. You can pass the level of parallelism as an argument (see the PairDStreamFunctions documentation), or set the spark.default.parallelism configuration property to change the default.
Level of Parallelism in Data Processing
As the documentation quoted above explains, there are two options: pass the desired parallelism as an argument to the operation itself (see the PairDStreamFunctions documentation), or change the cluster-wide default via the spark.default.parallelism configuration property.
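For the first option, here is a minimal sketch, not from the original post, of passing the number of partitions directly to reduceByKey and reduceByKeyAndWindow in the Java API. The socket source, host, port, window lengths, and the value 8 are all assumed purely for illustration:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class ParallelismExample {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("ParallelismExample");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Hypothetical input source; host and port are placeholders.
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        // Pair each record with a count of 1.
        JavaPairDStream<String, Integer> pairs =
            lines.mapToPair(line -> new Tuple2<>(line, 1));

        // Pass the desired number of parallel tasks (8 is an arbitrary
        // example value) directly to the shuffle operation:
        JavaPairDStream<String, Integer> counts = pairs.reduceByKey((a, b) -> a + b, 8);

        // reduceByKeyAndWindow accepts the same numPartitions argument,
        // after the window and slide durations:
        JavaPairDStream<String, Integer> windowedCounts = pairs.reduceByKeyAndWindow(
            (a, b) -> a + b, Durations.seconds(30), Durations.seconds(10), 8);

        counts.print();
        windowedCounts.print();

        jssc.start();
        jssc.awaitTermination();
    }
}

The per-operation argument overrides spark.default.parallelism only for that shuffle, which is useful when one hot stage needs more tasks than the rest of the job.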
For the second option, set the property when constructing the SparkConf. For example:
SparkConf sparkConf = new SparkConf().setAppName("NAME").set("spark.default.parallelism", "5");
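The same default can also be supplied at submit time instead of in code, which avoids recompiling just to tune it. A sketch of the invocation, where the class and JAR names are placeholders:

spark-submit --class com.example.MyStreamingApp --conf spark.default.parallelism=5 my-app.jar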
Original post: http://www.cnblogs.com/gnivor/p/4575743.html