First, confirm that Scala is installed on our Linux hosts; if it is not, install it. All five machines need it.
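A quick way to check (assuming scala is already on the PATH):
scala -version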
We used Scala 2.12 when learning Scala, so we choose Spark 2.4.2 or above.
cd spark/conf/
mv slaves.template slaves
mv spark-env.sh.template spark-env.sh
vim slaves
hadoop101
hadoop102
hadoop103
vim spark-env.sh
export JAVA_HOME=/soft/module/jdk1.8.0_161
export SPARK_MASTER_HOST=hadoop100
export SPARK_MASTER_PORT=7077
xsync spark/ (xsync is a custom rsync-based script that distributes the directory to every node)
sbin/start-all.sh
xcall.sh (a custom script; here it runs jps on every node)
------------------- hadoop100 --------------
10021 Jps
9944 Master
------------------- hadoop101 --------------
9159 Jps
9096 Worker
------------------- hadoop102 --------------
8740 Worker
8804 Jps
------------------- hadoop103 --------------
8749 Worker
8813 Jps
Check in a browser: hadoop100:8080
Note: if you hit a "JAVA_HOME not set" error, add the following to the spark-config.sh file in the sbin directory:
export JAVA_HOME=XXXX
bin/spark-submit --class org.apache.spark.examples.SparkPi --executor-memory 1G --total-executor-cores 2 ./examples/jars/spark-examples_2.12-3.0.0-preview2.jar 100
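The trailing 100 is the argument passed to the SparkPi example (the number of slices to sample); more slices means more tasks.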
/soft/module/spark/bin/spark-shell --master spark://hadoop100:7077 --executor-memory 1g --total-executor-cores 2
The --master spark://hadoop100:7077 option specifies the master of the cluster to connect to.
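For reference, the master URL can also be set from application code; a minimal sketch (the names are illustrative, and in practice setMaster is usually omitted so that spark-submit's --master flag supplies the value):

import org.apache.spark.{SparkConf, SparkContext}

// Hard-coding the master here is only for illustration; normally
// spark-submit --master supplies this value at launch time.
val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("spark://hadoop100:7077")
val sc = new SparkContext(conf)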
Run the WordCount program:
scala>sc.textFile("input").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
res0: Array[(String, Int)] = Array((hadoop,6), (oozie,3), (spark,3), (hive,3), (atguigu,3), (hbase,6))
scala>
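Step by step, the same pipeline reads as follows (a commented sketch; the intermediate names are just for illustration):

val lines  = sc.textFile("input")          // RDD[String]: one element per line
val words  = lines.flatMap(_.split(" "))   // split every line into words
val pairs  = words.map((_, 1))             // pair each word with a count of 1
val counts = pairs.reduceByKey(_ + _)      // sum the counts for each word
counts.collect                             // bring the result array to the driver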
mv spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop100:9000/directory
Note: the directory on HDFS must exist in advance:
hadoop fs -mkdir /directory
vim spark-env.sh
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.retainedApplications=30
-Dspark.history.fs.logDirectory=hdfs://hadoop100:9000/directory"
Parameter descriptions:
spark.eventLog.dir: everything an application records while it runs is written under the path given by this property;
spark.history.ui.port=18080: the History Server web UI listens on port 18080;
spark.history.fs.logDirectory=hdfs://hadoop100:9000/directory: with this property set, there is no need to pass the path explicitly when running start-history-server.sh; the History Server page only shows applications logged under this path;
spark.history.retainedApplications=30: the number of applications whose history is retained; once exceeded, the oldest application data is evicted. This is the number of application UIs kept in memory, not the number listed on the page.
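Equivalently, event logging can be switched on per job with --conf flags instead of spark-defaults.conf (same property names; the trailing dots stand for the usual class/jar arguments):
bin/spark-submit --conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://hadoop100:9000/directory ...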
xsync spark-defaults.conf
xsync spark-env.sh
sbin/start-history-server.sh
bin/spark-submit --class org.apache.spark.examples.SparkPi --executor-memory 1G --total-executor-cores 2 ./examples/jars/spark-examples_2.12-3.0.0-preview2.jar 100
View the job history at hadoop100:18080
vim spark-env.sh
Comment out the following:
#SPARK_MASTER_HOST=hadoop100
#SPARK_MASTER_PORT=7077
Add the following:
export SPARK_DAEMON_JAVA_OPTS="
-Dspark.deploy.recoveryMode=ZOOKEEPER
-Dspark.deploy.zookeeper.url=hadoop101,hadoop102,hadoop103
-Dspark.deploy.zookeeper.dir=/spark"
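With recoveryMode=ZOOKEEPER, the masters persist workers, applications, and leader-election state under the znode configured above; you can peek at it with ZooKeeper's CLI (assuming zkCli.sh is available on one of the ZooKeeper nodes):
zkCli.sh -server hadoop101:2181
ls /spark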
xsync spark-env.sh
sbin/start-all.sh
sbin/start-master.sh (run this on the standby master node)
/soft/module/spark/bin/spark-shell --master spark://hadoop100:7077,hadoop101:7077 --executor-memory 2g --total-executor-cores 2
bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://hadoop100:7077,hadoop101:7077 --executor-memory 1G --total-executor-cores 2 ./examples/jars/spark-examples_2.12-3.0.0-preview2.jar 100
./spark-shell --master spark://hadoop100:7077,hadoop101:7077
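To verify failover, stop the active master and watch the standby take over after a short recovery pause (a hedged test, assuming hadoop100 is currently the ALIVE master):
sbin/stop-master.sh (on hadoop100)
Running applications keep their executors and reconnect to the new leader; the standby's 8080 page should switch from STANDBY to ALIVE.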
hadoop fs -mkdir -p /spark/input
hadoop fs -put RELEASE /spark/input
sc.textFile("/spark/input").flatMap(_.split(" ")).map(word=>(word,1)).reduceByKey(_+_).map(entry=>(entry._2,entry._1)).sortByKey(false,1).map(entry=>(entry._2,entry._1)).saveAsTextFile("/spark/output/")
It seems the only difference from the non-HA setup is in the startup commands.
The Spark master web UI defaults to port 8080 (the port may already be taken; Spark then falls back to port+1, and requests to the old port can return HTTP ERROR 404 Not Found). Two ways to change it:
Method 1: add the following to spark-env.sh:
export SPARK_MASTER_WEBUI_PORT=8082
Method 2: edit the default in sbin/start-master.sh:
if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
  SPARK_MASTER_WEBUI_PORT=8082
fi
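Whichever method you use, restart the master (sbin/stop-master.sh then sbin/start-master.sh) for the new port to take effect, then browse hadoop100:8082.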
Source page: https://www.cnblogs.com/eric666666/p/11228825.html