
Spark Study Notes - Spark Streaming

Date: 2015-06-14 22:43:24


http://spark.apache.org/docs/1.2.1/streaming-programming-guide.html

How data is partitioned in Spark Streaming

Level of Parallelism in Data Processing

Cluster resources can be under-utilized if the number of parallel tasks used in any stage of the computation is not high enough. For example, for distributed reduce operations like reduceByKey and reduceByKeyAndWindow, the default number of parallel tasks is controlled by the spark.default.parallelism configuration property. You can pass the level of parallelism as an argument (see the PairDStreamFunctions documentation), or set the spark.default.parallelism configuration property to change the default.

 


For example: SparkConf sparkConf = new SparkConf().setAppName("NAME").set("spark.default.parallelism", "5");
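To make the partitioning concrete, here is a minimal, self-contained sketch of how Spark's default HashPartitioner decides which of the parallel reduce tasks a key lands on: it takes key.hashCode() modulo the number of partitions, corrected to be non-negative (Java's % can return negative values). The class and method names below are hypothetical, chosen for illustration; only the hashing scheme mirrors Spark's behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Spark's HashPartitioner key-to-partition mapping.
public class HashPartitionSketch {

    // Mirrors Spark's nonNegativeMod: Java's % operator can yield a
    // negative result for a negative hashCode, so shift it back into range.
    static int partitionFor(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        int numPartitions = 5; // e.g. spark.default.parallelism = 5, as above
        List<String> keys = Arrays.asList("alpha", "beta", "gamma", "delta", "epsilon");

        // Count how many keys each partition (i.e. each reduce task) receives.
        Map<Integer, Integer> perPartition = new HashMap<>();
        for (String k : keys) {
            int p = partitionFor(k, numPartitions);
            perPartition.merge(p, 1, Integer::sum);
            System.out.println(k + " -> partition " + p);
        }
        System.out.println("partitions used: " + perPartition.keySet());
    }
}
```

With too few partitions, all keys can collapse onto a handful of tasks and leave the rest of the cluster idle; raising numPartitions (via the argument to reduceByKey or via spark.default.parallelism) spreads keys over more reduce tasks.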

 


Original post: http://www.cnblogs.com/gnivor/p/4575743.html
