1. Install the JDK
2. Install Scala 2.10
Spark 1.0.2 depends on Scala 2.10, so we must install Scala 2.10.
Download scala-2.10.*.tgz and save it to the home directory (already present on sg206).
$ tar -zxvf scala-2.10.*.tgz
$ sudo mv scala-2.10.* /usr/lib    # move the extracted directory, not the tarball
$ sudo vim ~/.bash_profile
# add the following lines at the end
export SCALA_HOME=/usr/lib/scala-2.10.*
export PATH=$PATH:$SCALA_HOME/bin
# save and exit vim
# make the bash profile take effect immediately
source ~/.bash_profile
# test
$ scala -version
3. Build Spark
cd /home
tar -zxf spark-0.7.3-sources.gz
cd spark-0.7.3
sbt/sbt package (requires a git environment: yum install git)
Alternatively, download the prebuilt spark-1.0.2-bin-hadoop2.tgz
4. Configuration files
spark-env.sh
############
export SCALA_HOME=/usr/lib/scala-2.10.*
export SPARK_MASTER_IP=172.16.48.202
export SPARK_WORKER_MEMORY=10G
export JAVA_HOME=***
#############
slaves
Add the IP addresses of the worker (slave) nodes to the slaves configuration file.
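As a sketch, conf/slaves simply lists one worker host per line. The IPs below are illustrative placeholders for this cluster, not actual values from the original setup:
172.16.48.203
172.16.48.204
172.16.48.205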
5. Start and stop
bin/start-master.sh - Starts a master instance on the machine the script is executed on.
bin/start-slaves.sh - Starts a slave instance on each machine specified in the conf/slaves file.
bin/start-all.sh - Starts both a master and a number of slaves as described above.
bin/stop-master.sh - Stops the master that was started via the bin/start-master.sh script.
bin/stop-slaves.sh - Stops the slave instances that were started via bin/start-slaves.sh.
bin/stop-all.sh - Stops both the master and the slaves as described above.
6. Browse the master's web UI (default http://localhost:8080). Here you should see all the worker nodes, along with their CPU counts, memory, and other information.
7. Test:
Connect to Spark: spark-shell --master spark://192.168.148.42:7077
Enter the following commands:
val file = sc.textFile(" ")
val info = file.filter(line => line.contains("INFO"))
info.count()
Command-line test:
spark-submit --master spark://192.168.148.42:7077 examples/src/main/python/pi.py 10
Program test:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class AdminOperation {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("atest").setMaster("spark://192.168.148.42:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> file = sc.textFile("hdfs://huangcun-hbase1:8020/test/test.txt");
        JavaRDD<String> errors = file.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("ERROR");
            }
        });
        // Count all the errors
        errors.count();
    }
}
spark-submit --master spark://192.168.148.42:7077 --class AdminOperation spark-test.jar
(Note: options such as --class must come before the application jar; anything after the jar is passed to the application as arguments.)
Additional submit parameters for reference:
spark-submit --master spark://BJJR-FANWEIWE1.360buyAD.local:7077 --class AdminOperation --executor-memory 20G --total-executor-cores 100 --jars <dependency jar files> C:\Users\Administrator.BJXX-20140806JH\spark-test.jar
Original post: http://www.cnblogs.com/fanweiwei/p/4172136.html