spark submit

Posted: 2018-08-13 16:59:52

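Launching Applications with spark-submit

Once a user application is bundled, it can be launched with the bin/spark-submit script. This script takes care of setting up the classpath with Spark and its dependencies, and it supports the different cluster managers and deploy modes that Spark supports: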

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

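Some of the commonly used options are:

--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client) †
--conf: An arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap the whole "key=value" pair in quotes.
application-jar: Path to a bundled jar that includes your application and all of its dependencies. The URL must be globally visible inside your cluster, for instance an hdfs:// path or a file:// path that is present on all nodes.
application-arguments: Arguments passed to the main method of your main class, if any

† A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. the master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the spark-submit process, which acts as a client to the cluster. The input and output of the application are attached to the console. This mode is therefore especially suitable for applications that involve the REPL (e.g. the Spark shell).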

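Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the driver and the executors. Currently, standalone mode does not support cluster mode for Python applications.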
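The options described above can also be assembled programmatically, which makes the quoting rule for --conf values easier to see. The following is only a minimal sketch: the helper name build_spark_submit is hypothetical, and the spark.executor.extraJavaOptions value is just an illustrative property whose value contains a space.

```python
import shlex


def build_spark_submit(app, app_args=(), main_class=None, master=None,
                       deploy_mode=None, conf=None,
                       spark_submit="./bin/spark-submit"):
    """Return a spark-submit invocation as a list of arguments.

    Hypothetical helper for illustration; it only covers the options
    discussed in the text (--class, --master, --deploy-mode, --conf).
    """
    cmd = [spark_submit]
    if main_class:
        cmd += ["--class", main_class]
    if master:
        cmd += ["--master", master]
    if deploy_mode:
        cmd += ["--deploy-mode", deploy_mode]
    # Each property becomes its own "--conf key=value" pair.
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    # The application jar (or .py file) comes after all options,
    # followed by the arguments for the application itself.
    cmd.append(app)
    cmd += [str(arg) for arg in app_args]
    return cmd


cmd = build_spark_submit(
    "/path/to/examples.jar",
    app_args=[100],
    main_class="org.apache.spark.examples.SparkPi",
    master="local[8]",
    conf={"spark.executor.extraJavaOptions":
          "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"},
)

# shlex.join quotes every argument that needs it, so the key=value pair
# containing a space is printed as a single quoted shell word.
print(shlex.join(cmd))
```

If the command is run from Python rather than pasted into a shell, passing the list directly to subprocess.run(cmd, check=True) sidesteps shell quoting altogether.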

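For Python applications, simply pass a .py file in the place of <application-jar> instead of a JAR, and add Python .zip, .egg or .py files to the search path with --py-files.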

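Some options are specific to the cluster manager that is being used. For example, on a Spark standalone cluster with the cluster deploy mode, you can also specify --supervise to make sure that the driver is automatically restarted if it fails with a non-zero exit code. To list all options available to spark-submit, run it with --help. Here are a few examples of common options: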
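Two constraints stated above (no cluster deploy mode for Python applications on a standalone cluster, and --supervise being tied to the cluster deploy mode) can be checked before submitting. This is a hedged sketch only: check_submit_options is a hypothetical helper, it is not part of Spark, and it encodes just the rules mentioned in this article.

```python
def check_submit_options(master, deploy_mode="client", app="", supervise=False):
    """Return warnings for option combinations the text above rules out.

    Hypothetical pre-flight check; spark-submit performs its own
    validation, and this sketch covers only two of those rules.
    """
    warnings = []
    is_standalone = master.startswith("spark://")
    is_python_app = app.endswith(".py")

    # Standalone clusters do not support cluster deploy mode for Python apps.
    if is_standalone and deploy_mode == "cluster" and is_python_app:
        warnings.append(
            "standalone mode does not support cluster deploy mode "
            "for Python applications"
        )

    # --supervise restarts a failed driver, which only makes sense when
    # the driver runs inside the cluster.
    if supervise and deploy_mode != "cluster":
        warnings.append("--supervise requires --deploy-mode cluster")

    return warnings


for problem in check_submit_options(
        "spark://207.184.161.138:7077", deploy_mode="cluster", app="pi.py"):
    print("warning:", problem)
```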

# Run application locally on 8 cores
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \
  /path/to/examples.jar \
  100

# Run on a Spark standalone cluster in client deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  /path/to/examples.jar \
  1000

# Run on a YARN cluster (use --deploy-mode client for client mode)
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

# Run a Python application on a Spark standalone cluster
./bin/spark-submit \
  --master spark://207.184.161.138:7077 \
  examples/src/main/python/pi.py \
  1000

# Run on a Mesos cluster in cluster deploy mode with supervise
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 20G \
  --total-executor-cores 100 \
  http://path/to/examples.jar \
  1000

# Run on a Kubernetes cluster in cluster deploy mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://xx.yy.zz.ww:443 \
  --deploy-mode cluster \
  --executor-memory 20G \
  --num-executors 50 \
  http://path/to/examples.jar \
  1000

Master URLs

The master URL passed to Spark can be in one of the following formats:

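Master URL	Meaning
local	Run Spark locally with one worker thread (i.e. no parallelism at all).
local[K]	Run Spark locally with K worker threads (ideally, set this to the number of cores on your machine).
local[K,F]	Run Spark locally with K worker threads and F maxFailures (see spark.task.maxFailures for an explanation of this variable).
local[*]	Run Spark locally with as many worker threads as logical cores on your machine.
local[*,F]	Run Spark locally with as many worker threads as logical cores on your machine and F maxFailures.
spark://HOST:PORT	Connect to the given Spark standalone cluster master. The port must be whichever one your master is configured to use, which is 7077 by default.
spark://HOST1:PORT1,HOST2:PORT2	Connect to the given Spark standalone cluster with standby masters managed by Zookeeper. The list must contain all the master hosts in the high-availability cluster set up with Zookeeper. The port must be whichever one each master is configured to use, which is 7077 by default.
mesos://HOST:PORT	Connect to the given Mesos cluster. The port must be whichever one you have configured, which is 5050 by default. For a Mesos cluster using ZooKeeper, use mesos://zk://... instead. To submit with --deploy-mode cluster, HOST:PORT should be configured to connect to the MesosClusterDispatcher.
yarn	Connect to a YARN cluster in client or cluster mode, depending on the value of --deploy-mode. The cluster location is found from the HADOOP_CONF_DIR or YARN_CONF_DIR variable.
k8s://HOST:PORT	Connect to a Kubernetes cluster in cluster mode. Client mode is currently unsupported and will be supported in future releases. HOST and PORT refer to the Kubernetes API server (https://kubernetes.io/docs/reference/generated/kube-apiserver/). The connection uses TLS by default. To force an unsecured connection, use k8s://http://HOST:PORT.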
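To make the formats in the table concrete, the sketch below classifies a master URL string by cluster manager. It is an illustration only, not Spark's own parser: describe_master and the keys of the dictionary it returns are hypothetical names, and the regular expression is an assumption modelled on the local, local[K], local[K,F], local[*] and local[*,F] forms listed above.

```python
import re

# Matches local, local[K], local[K,F], local[*] and local[*,F].
_LOCAL = re.compile(r"^local(?:\[(\*|\d+)(?:,\s*(\d+))?\])?$")

# URL prefixes for the cluster managers in the table.
_PREFIXES = (
    ("spark://", "standalone"),
    ("mesos://", "mesos"),
    ("k8s://", "kubernetes"),
)


def describe_master(url):
    """Classify a master URL according to the table above."""
    match = _LOCAL.match(url)
    if match:
        # Plain "local" means a single worker thread.
        threads = match.group(1) or "1"
        return {"manager": "local", "threads": threads,
                "max_failures": match.group(2)}
    if url == "yarn":
        # The cluster location comes from HADOOP_CONF_DIR or YARN_CONF_DIR.
        return {"manager": "yarn"}
    for prefix, manager in _PREFIXES:
        if url.startswith(prefix):
            return {"manager": manager, "address": url[len(prefix):]}
    raise ValueError(f"unrecognized master URL: {url!r}")


for example in ("local[*,4]", "spark://HOST1:PORT1,HOST2:PORT2", "yarn"):
    print(example, "->", describe_master(example))
```

The thread count is kept as a string so that "*" (one thread per logical core) can be represented alongside numeric values.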

Original article: https://www.cnblogs.com/lionjulyy/p/9468900.html