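1. Download Scala from the official website.
2. On both spark01 and spark02, create the /opt/scala directory and extract the archive there: tar -zxvf scala-2.12.8.tgz
3. Configure the environment variables
vi /etc/profile and add the following line: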
export SCALA_HOME=/opt/scala/scala-2.12.8
Add the Hadoop environment variables as well. The complete set is:
export JAVA_HOME=/opt/java/jdk1.8.0_191
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export SCALA_HOME=/opt/scala/scala-2.12.8
export CLASSPATH=$CLASSPATH:${JAVA_HOME}/lib/
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SCALA_HOME}/bin:$PATH
Then run source /etc/profile
4. Verify the installation
scala -version
5. Copy the profile to spark02
scp /etc/profile spark02:/etc
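After sourcing /etc/profile on each node, a quick sanity check can catch a mistyped path before moving on. The following is only a minimal sketch: check_home is a hypothetical helper name, not part of Scala or Hadoop, and it assumes the variable names used in the profile above.

```shell
#!/bin/sh
# check_home NAME DIR: report whether DIR (the value of variable NAME) is set
# and points at an existing directory. Returns 0 if so, 1 otherwise.
# (check_home is a hypothetical helper written for this walkthrough.)
check_home() {
    name=$1
    dir=$2
    if [ -z "$dir" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "$name points to a missing directory: $dir"
        return 1
    fi
    echo "$name ok: $dir"
    return 0
}

# Check the variables exported in /etc/profile.
status=0
check_home JAVA_HOME "${JAVA_HOME:-}" || status=1
check_home HADOOP_HOME "${HADOOP_HOME:-}" || status=1
check_home SCALA_HOME "${SCALA_HOME:-}" || status=1

# Confirm the scala launcher is reachable through PATH.
if command -v scala >/dev/null 2>&1; then
    echo "scala found at: $(command -v scala)"
else
    echo "scala is not on PATH"
    status=1
fi
echo "check finished with status $status"
```

Running the same script on spark02 after the scp step shows whether the copied profile took effect there too.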
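1. Download and extract Spark the same way as Scala, using an /opt/spark directory
2. Configure the environment variables by adding this line to /etc/profile: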
export SPARK_HOME=/opt/spark/spark-2.4.0-bin-hadoop2.7
The updated complete set is:
export JAVA_HOME=/opt/java/jdk1.8.0_191
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export SCALA_HOME=/opt/scala/scala-2.12.8
export SPARK_HOME=/opt/spark/spark-2.4.0-bin-hadoop2.7
export CLASSPATH=$CLASSPATH:${JAVA_HOME}/lib/
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${SPARK_HOME}/bin:${SCALA_HOME}/bin:$PATH
source /etc/profile
scp /etc/profile spark02:/etc
3. Configure the files under the Spark conf directory (/opt/spark/spark-2.4.0-bin-hadoop2.7/conf)
cp spark-env.sh.template spark-env.sh
cp slaves.template slaves
vi spark-env.sh
export SCALA_HOME=/opt/scala/scala-2.12.8
export JAVA_HOME=/opt/java/jdk1.8.0_191
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark/spark-2.4.0-bin-hadoop2.7
export SPARK_MASTER_HOST=spark01
export SPARK_EXECUTOR_MEMORY=2G
vi slaves
spark02
Copy both files to spark02:
scp /opt/spark/spark-2.4.0-bin-hadoop2.7/conf/spark-env.sh spark02:/opt/spark/spark-2.4.0-bin-hadoop2.7/conf/
scp /opt/spark/spark-2.4.0-bin-hadoop2.7/conf/slaves spark02:/opt/spark/spark-2.4.0-bin-hadoop2.7/conf/
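With only one worker the two scp commands are enough, but if more hosts are added to slaves later, a small loop saves retyping. This is a rough sketch, not something shipped with Spark: list_workers, sync_conf and the DRY_RUN variable are hypothetical names, and it assumes passwordless SSH and the same Spark path on every node.

```shell
#!/bin/sh
# list_workers FILE: print the host names in a Spark slaves file,
# skipping blank lines and '#' comments.
# (Hypothetical helper, not part of Spark.)
list_workers() {
    sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' "$1"
}

# sync_conf CONF_DIR SLAVES_FILE: copy spark-env.sh and slaves to every
# worker listed in SLAVES_FILE. With DRY_RUN=1 (the default in this sketch)
# the scp commands are only printed, not executed.
sync_conf() {
    conf_dir=$1
    slaves_file=$2
    for host in $(list_workers "$slaves_file"); do
        for f in spark-env.sh slaves; do
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "scp $conf_dir/$f $host:$conf_dir/"
            else
                scp "$conf_dir/$f" "$host:$conf_dir/"
            fi
        done
    done
}
```

For example, sync_conf /opt/spark/spark-2.4.0-bin-hadoop2.7/conf /opt/spark/spark-2.4.0-bin-hadoop2.7/conf/slaves prints the commands it would run; setting DRY_RUN=0 first makes it actually copy the files.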
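This setup relies on the distributed file system (HDFS) provided by Hadoop, so make sure Hadoop is running normally before starting Spark.
With Hadoop running, execute the following on spark01 (which is both the Hadoop namenode and the Spark master node):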
cd /opt/spark/spark-2.4.0-bin-hadoop2.7/sbin
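Run the startup script: ./start-all.sh
Open the Master's web UI in a browser. In my Spark cluster the Master is spark01, with IP address 192.168.2.245, and the UI listens on port 8080, so the URL is: http://192.168.2.245:8080/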
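Besides the web UI, the JDK's jps tool gives a quick command-line check that the daemons came up: the standalone master shows up as Master and each worker as Worker. The sketch below is an assumption-laden convenience, with has_daemon being a hypothetical helper name; it only inspects jps output on the machine where it runs.

```shell
#!/bin/sh
# has_daemon NAME: read `jps` output on stdin and succeed if a JVM whose
# name is exactly NAME is listed (jps prints "<pid> <name>" per line).
# (Hypothetical helper written for this walkthrough.)
has_daemon() {
    awk -v want="$1" '$2 == want { found = 1 } END { exit found ? 0 : 1 }'
}

# On spark01, look for the Master; skip quietly if jps is not installed.
if command -v jps >/dev/null 2>&1; then
    if jps | has_daemon Master; then
        echo "Master is running"
    else
        echo "Master not found"
    fi
fi
```

On spark02 the same check with has_daemon Worker should succeed once start-all.sh has finished.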
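Next, run a demo that computes pi in local mode.
Change into the Spark root directory (/opt/spark/spark-2.4.0-bin-hadoop2.7) and run the following command from there: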
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.4.0.jar
YARN client mode:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client examples/jars/spark-examples_2.11-2.4.0.jar
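The SparkPi result is easy to miss among the log lines. SparkPi prints its estimate on a line beginning with "Pi is roughly", so a small filter can pull it out. This is just a sketch: pi_result is a hypothetical helper name, and the usage shown in the comments assumes the default logging setup, where Spark's log output goes to stderr.

```shell
#!/bin/sh
# pi_result: read spark-submit output on stdin and print only the line
# that SparkPi writes with its estimate.
# (Hypothetical helper, not part of Spark.)
pi_result() {
    grep 'Pi is roughly'
}

# Usage, from the Spark root directory (not executed here):
#   ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
#       --master local examples/jars/spark-examples_2.11-2.4.0.jar \
#       2>/dev/null | pi_result
```

SparkPi also accepts an optional trailing integer for the number of partitions; a larger value spreads the sampling over more tasks.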
03. Setting up a Spark cluster (CentOS7+Spark2.1.1+Hadoop2.8.0)
Original article: https://www.cnblogs.com/yjm0330/p/10080901.html