In the future, Spark, MapReduce, and MPI may well converge on a single platform. Each has a different emphasis, so together they are roughly the union of cloud computing and high-performance computing, complementing one another. Having gone over the basics of Spark, I now want to set up a development environment, mainly so I can read the source code; the recent "Apache Spark source code walkthrough" series is quite good and I have read part of it. The environment setup itself is not complicated; see https://github.com/apache/spark for details.
1. Download the code
git clone https://github.com/apache/spark.git
2. Build Spark directly
I am building against Hadoop 2.2.0, so I run:
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
3. For usage details, see https://github.com/apache/spark
The easiest way to start using Spark is through the Scala shell:
./bin/spark-shell
Try the following command, which should return 1000:
scala> sc.parallelize(1 to 1000).count()
Alternatively, if you prefer Python, you can use the Python shell:
./bin/pyspark
And run the following command, which should also return 1000:
>>> sc.parallelize(range(1000)).count()
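As a further sanity check in spark-shell, a small word count over the repository's own README.md also works. This is just a minimal sketch of my own, not something from the official docs; sc is the SparkContext that the shell creates for you, and it assumes you started the shell from the repository root so that README.md is on the local path:
val lines = sc.textFile("README.md")        // read the README shipped with the repo
val counts = lines.flatMap(_.split("\\s+")) // split each line into words
  .map(word => (word, 1))                   // pair each word with a count of 1
  .reduceByKey(_ + _)                       // sum the counts per word
counts.take(5)                              // print a few (word, count) pairs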
Spark also comes with several sample programs in the examples directory. To run one of them, use ./bin/run-example <class> [params]. For example:
./bin/run-example SparkPi
will run the Pi example locally.
You can set the MASTER environment variable when running examples to submit examples to a cluster. This can be a mesos:// or spark:// URL, "yarn-cluster" or "yarn-client" to run on YARN, and "local" to run locally with one thread, or "local[N]" to run locally with N threads. You can also use an abbreviated class name if the class is in the examples package. For instance:
MASTER=spark://host:7077 ./bin/run-example SparkPi
Many of the example programs print usage help if no params are given.
Testing first requires building Spark. Once Spark is built, tests can be run using:
./sbt/sbt test
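Running the full suite takes a long time. If I remember correctly, sbt's test-only task can run a single suite, e.g. ./sbt/sbt "test-only org.apache.spark.rdd.RDDSuite" (the suite name here is only an example; check the source tree for the exact class you want).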
Download the IDEA tar.gz package from the official IntelliJ IDEA site and simply extract it. Run IDEA and install the Scala plugin.
In the root of the source tree, run the following command:
./sbt/sbt gen-idea
This generates the IDEA project files. In IDEA, click File -> Open Project, browse to the incubator-spark folder, and open the project; you can then modify the Spark code.
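To check that the IDEA project actually builds and runs, a tiny standalone program can be run straight from the IDE. This is a minimal sketch of my own, not part of the Spark sources; the object name and the choice of local mode are assumptions:
import org.apache.spark.SparkContext

// Hypothetical sanity-check program: run it from IDEA with a local master.
object LocalCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "LocalCount") // local mode, two threads
    val n = sc.parallelize(1 to 1000).count()           // same check as the shell example
    println("count = " + n)                             // should print 1000
    sc.stop()
  }
}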
For more details, see: https://github.com/apache/spark
http://cn.soulmachine.me/blog/20140130/
Original post: http://www.cnblogs.com/fengbing/p/3807131.html