Spark Tuning: Spark on YARN Web UI

Date: 2019-12-14 18:50:53


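The execution details of a Spark on YARN application cannot be viewed directly from the YARN ResourceManager UI (http://192.168.10.10:8088). This makes debugging inconvenient, so the Spark History Server has to be configured manually.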

Configuration

1. Configure spark-defaults.conf

cp spark-defaults.conf.template spark-defaults.conf

Add the following settings:

spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://hadoop10:9000/user/root/history
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              5g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.history.fs.logDirectory    hdfs://hadoop10:9000/user/root/history
spark.yarn.historyServer.address master:18080

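spark.eventLog.enabled: setting it to true turns on event logging.

spark.eventLog.dir: the location where event logs are stored. The event logs of every application run are written under this directory. It is usually an HDFS path, but a local path also works.

    HDFS: hdfs://hadoop10:9000/user/root/history (create the directory in advance)

    Local: file:///directory

spark.history.fs.logDirectory: keep this the same as spark.eventLog.dir. The Spark History Server only shows applications whose logs are under this path.

spark.yarn.historyServer.address: the host and port of the History Server. Once an application has finished, the Tracking UI link on http://192.168.10.10:8088 points to this address.

spark.eventLog.compress: whether to compress the logged Spark events. It only takes effect when spark.eventLog.enabled is true and is off by default. The codec used depends on the Spark version (snappy in early releases).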
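Because the History Server only reads from spark.history.fs.logDirectory, a mismatch with spark.eventLog.dir is an easy mistake to make. The following is a minimal sketch of a consistency check, assuming the whitespace-separated key/value layout used in this post (the real file also accepts other properties-file separators, which this sketch does not handle). The function names and the SAMPLE text are hypothetical and only serve as illustration.

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf content into a dict of key -> value.

    Assumes "key<whitespace>value" lines, as in this post; blank lines and
    lines starting with '#' are skipped.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first run of whitespace only; the value may contain spaces.
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def check_history_settings(conf):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if conf.get("spark.eventLog.enabled", "false").lower() != "true":
        problems.append("spark.eventLog.enabled is not true")
    event_dir = conf.get("spark.eventLog.dir")
    history_dir = conf.get("spark.history.fs.logDirectory")
    if not event_dir:
        problems.append("spark.eventLog.dir is not set")
    if not history_dir:
        problems.append("spark.history.fs.logDirectory is not set")
    # Ignore a trailing slash when comparing the two directories.
    if event_dir and history_dir and event_dir.rstrip("/") != history_dir.rstrip("/"):
        problems.append("spark.eventLog.dir and spark.history.fs.logDirectory differ")
    return problems


# Hypothetical sample mirroring the settings shown in this post.
SAMPLE = """
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://hadoop10:9000/user/root/history
spark.history.fs.logDirectory    hdfs://hadoop10:9000/user/root/history
spark.yarn.historyServer.address master:18080
"""

if __name__ == "__main__":
    print(check_history_settings(parse_spark_defaults(SAMPLE)))
```

In practice the text would be read from conf/spark-defaults.conf rather than from an inline string.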

2. Edit spark-env.sh

Add the following to the existing content:

export SPARK_HISTORY_OPTS="-Dspark.history.retainedApplications=15"

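spark.history.retainedApplications sets how many applications the History Server keeps UI data for in its cache. When this limit is exceeded, the oldest applications are dropped from the cache; their event logs stay on disk and are loaded again when they are opened in the UI.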

3. Start the Spark History Server

sbin/start-history-server.sh

The application history can now be viewed at http://192.168.10.10:18080.
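Besides the browser, the History Server also exposes a REST API under /api/v1, which gives a quick way to confirm that it has picked up the event logs. The sketch below is one possible check, not part of the original setup: the helper names are hypothetical, and it relies only on the id, name and attempts[].completed fields of the /api/v1/applications response.

```python
import json
from urllib.request import urlopen


def fetch_applications(base_url, timeout=10):
    """Fetch the application list from <base_url>/api/v1/applications."""
    url = base_url.rstrip("/") + "/api/v1/applications"
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_applications(apps):
    """Return one (id, name, completed) tuple per application.

    An application counts as completed only if it has at least one attempt
    and every attempt reports completed == true.
    """
    rows = []
    for app in apps:
        attempts = app.get("attempts") or []
        completed = bool(attempts) and all(a.get("completed", False) for a in attempts)
        rows.append((app.get("id"), app.get("name"), completed))
    return rows
```

Calling summarize_applications(fetch_applications("http://192.168.10.10:18080")) against a running History Server would list the applications it has loaded. An empty list suggests that no event logs were found under spark.history.fs.logDirectory.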

Reading the web UI

The web UI consists of the following parts:

[Screenshot of the web UI]

Suppose the following command is run:

spark-submit --master yarn --num-executors 8 --executor-cores 5 gpsfreq.py

This starts 8 executors with 5 cores each, 40 cores in total.
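The core count is simply the number of executors multiplied by the cores per executor. As a small illustration, the sketch below reads both flags from a spark-submit command line and computes the total. The function name is hypothetical; the fallback values (2 executors, 1 core) are the spark-submit defaults for YARN when the flags are omitted, and dynamic allocation is not considered.

```python
import argparse
import shlex


def requested_cores(command):
    """Return (num_executors, executor_cores, total_cores) for a spark-submit command string."""
    parser = argparse.ArgumentParser(add_help=False)
    # Defaults mirror spark-submit on YARN when the flags are not given.
    parser.add_argument("--num-executors", type=int, default=2)
    parser.add_argument("--executor-cores", type=int, default=1)
    # Everything else (spark-submit, --master yarn, the script name) is ignored.
    args, _unknown = parser.parse_known_args(shlex.split(command))
    return args.num_executors, args.executor_cores, args.num_executors * args.executor_cores


if __name__ == "__main__":
    cmd = "spark-submit --master yarn --num-executors 8 --executor-cores 5 gpsfreq.py"
    print(requested_cores(cmd))  # (8, 5, 40)
```

The total is what the job asks for; whether YARN can actually grant that many containers depends on the cluster's available resources.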

[Screenshot]

Stages

[Screenshot]

Open stage 0:

[Screenshot]

Based on the analysis above, I made the following change:

spark-submit --master yarn --num-executors 4 --executor-cores 1 gpsfreq.py

As a result, efficiency improved by a factor of 2.

[Screenshot]

References:

https://www.jianshu.com/p/4d28edc599ea  Configuring Web UI logging for Spark on YARN

https://blog.csdn.net/zyj8170/article/details/58158966  Configuring the log Web UI for Spark on YARN

https://www.cnblogs.com/hexu105/p/8182472.html  The Spark on YARN UI explained in detail

Original article: https://www.cnblogs.com/yanshw/p/12038633.html