
Big Data Basics: Oozie (2) Common Problems



1 How do I view an Oozie task's logs?

Detailed workflow information can be looked up with the Oozie job ID, using the following command:

oozie job -info 0012077-180830142722522-oozie-hado-W
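
The Oozie CLI can also print the workflow's own log directly (standard oozie job -log call, same job ID):

oozie job -log 0012077-180830142722522-oozie-hado-W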

 

The detailed workflow information from -info looks like this:

Job ID : 0012077-180830142722522-oozie-hado-W

------------------------------------------------------------------------------------------------------------------------------------

Workflow Name :  $workflow_name

App Path      : hdfs://$hdfs_name/oozie/wf/$workflow_name.xml

Status        : KILLED

Run           : 0

User          : hadoop

Group         : -

Created       : 2018-09-25 02:51 GMT

Started       : 2018-09-25 02:51 GMT

Last Modified : 2018-09-25 02:53 GMT

Ended         : 2018-09-25 02:53 GMT

CoordAction ID: -

 

Actions

------------------------------------------------------------------------------------------------------------------------------------

ID                                                                            Status    Ext ID                 Ext Status Err Code 

------------------------------------------------------------------------------------------------------------------------------------

0012077-180830142722522-oozie-hado-W@:start:                                  OK        -                      OK         -        

------------------------------------------------------------------------------------------------------------------------------------

0012077-180830142722522-oozie-hado-W@$action_name                             ERROR     application_1537326594090_5663 FAILED/KILLED JA018

------------------------------------------------------------------------------------------------------------------------------------

0012077-180830142722522-oozie-hado-W@Kill                                     OK        -                      OK         E0729    

------------------------------------------------------------------------------------------------------------------------------------

 

The failed action is defined as follows:

<action name="$action_name"> 

        <spark xmlns="uri:oozie:spark-action:0.1"> 

            <job-tracker>${job_tracker}</job-tracker> 

            <name-node>${name_node}</name-node> 

            <master>${jobmaster}</master> 

            <mode>${jobmode}</mode> 

            <name>${jobname}</name> 

            <class>${jarclass}</class> 

            <jar>${jarpath}</jar> 

            <spark-opts>${sparkopts}</spark-opts> 

        </spark>
        <ok to="end"/>        <!-- target assumed; the original excerpt is truncated here -->
        <error to="Kill"/>    <!-- matches the Kill node in the action list above -->
</action>
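
The ${} parameters are resolved from the properties file supplied at submission time. A hypothetical job.properties might look like the following (every value below is a placeholder, not taken from the original post):

# all values below are hypothetical placeholders
oozie.wf.application.path=hdfs://$hdfs_name/oozie/wf/$workflow_name.xml
job_tracker=$resourcemanager_address
name_node=hdfs://$hdfs_name
jobmaster=yarn
jobmode=cluster
jobname=$app_name
jarclass=com.example.MySparkApp
jarpath=hdfs://$hdfs_name/path/to/app.jar
sparkopts=--executor-memory 2g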

 

On YARN, application_1537326594090_5663 shows up as follows:

application_1537326594090_5663       hadoop oozie:launcher:T=spark:W=$workflow_name:A=$action_name:ID=0012077-180830142722522-oozie-hado-W         Oozie Launcher

 

Inspecting the log of application_1537326594090_5663 reveals:

2018-09-25 10:52:05,237 [main] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl  - Submitted application application_1537326594090_5664

 

On YARN, application_1537326594090_5664 shows up as follows:

application_1537326594090_5664       hadoop    $app_name SPARK

 

So application_1537326594090_5664 is the actual Spark job for the action. Why is there an extra step in between?

In short, when Oozie executes an action it goes through an ActionExecutor (the most important subclass is JavaActionExecutor; the hive, spark and similar actions are all handled by its subclasses). JavaActionExecutor first submits a LauncherMapper (a map-only job) to YARN, which runs a LauncherMain (each concrete action type has its own subclass, such as JavaMain or SparkMain). A spark action runs SparkMain, and SparkMain in turn calls org.apache.spark.deploy.SparkSubmit to submit the real job.
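
With the real application ID in hand, the actual task log can be pulled from YARN (standard YARN CLI, assuming log aggregation is enabled on the cluster):

yarn logs -applicationId application_1537326594090_5664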

 

2 How do I add dependencies when submitting a Spark job through Oozie?

The usual ways to add dependencies to a Spark job are:

when running in local mode, dependencies can be added with --jars;

when running on yarn, dependencies can be added with spark.yarn.jars.

Neither works under Oozie: first, an Oozie-launched job cannot (and should not) run in local mode; second, if you configure spark.yarn.jars you will find it simply never takes effect. Let's see why.

Look at the LauncherMapper's log (see Question 1 above):

 

Spark Version 2.1.1

Spark Action Main class        : org.apache.spark.deploy.SparkSubmit

 

Oozie Spark action configuration

=================================================================

...

                    --conf

                    spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar

                    --conf

                    spark.yarn.jars=hdfs://$hdfs_name/oozie/share/lib/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar

 

So Oozie appends its own spark.yarn.jars entry after the one supplied by the application. How does Spark handle two identical keys?

 

org.apache.spark.deploy.SparkSubmit

    val appArgs = new SparkSubmitArguments(args)

 

org.apache.spark.launcher.SparkSubmitOptionParser

        if (!handle(name, value)) {

 

org.apache.spark.deploy.SparkSubmitArguments

  override protected def handle(opt: String, value: String): Boolean = {

  ...

      case CONF =>

        value.split("=", 2).toSeq match {

          case Seq(k, v) => sparkProperties(k) = v

          case _ => SparkSubmit.printErrorAndExit(s"Spark config without '=': $value")

        }
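
The effect is easy to reproduce outside of Spark. A minimal standalone sketch of the same last-write-wins logic (illustrative code only, not Spark's actual implementation):

import scala.collection.mutable

object ConfOverwriteDemo {
  def main(args: Array[String]): Unit = {
    // mirrors the mutable sparkProperties map in SparkSubmitArguments
    val sparkProperties = mutable.Map[String, String]()
    val confs = Seq(
      "spark.yarn.jars=hdfs://$hdfs_name/spark/sparkjars/*.jar",                // set by the application
      "spark.yarn.jars=hdfs://$hdfs_name/oozie/share/lib/lib_20180801121138/spark/spark-yarn_2.11-2.1.1.jar" // appended by Oozie
    )
    for (conf <- confs) {
      conf.split("=", 2) match {
        case Array(k, v) => sparkProperties(k) = v    // same key: the later value replaces the earlier one
        case _           => sys.error(s"Spark config without '=': $conf")
      }
    }
    // prints Oozie's share-lib path; the application's value is gone
    println(sparkProperties("spark.yarn.jars"))
  }
}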

 

As the code shows, the duplicate key is simply overwritten: the last --conf wins, and that is Oozie's configuration rather than the application's. The application therefore has to bundle its special dependencies into its own jar, using Maven's maven-assembly-plugin and configuring <dependencySets><dependencySet><includes><include> in the descriptor. The full descriptor:

 

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"

          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">

    <!-- TODO: a jarjar format would be better -->

    <id>jar-with-dependencies</id>

    <formats>

        <format>jar</format>

    </formats>

    <includeBaseDirectory>false</includeBaseDirectory>

    <dependencySets>

        <dependencySet>

            <outputDirectory>/</outputDirectory>

            <useProjectArtifact>true</useProjectArtifact>

            <unpack>true</unpack>

            <scope>runtime</scope>

            <includes>

                <include>redis.clients:jedis</include>

                <include>org.apache.commons:commons-pool2</include>

            </includes>

        </dependencySet>

    </dependencySets>

</assembly>

 

This is just the contents of the built-in jar-with-dependencies.xml descriptor, copied out with an <includes> section added.
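
For reference, such a descriptor is typically wired into the build through the maven-assembly-plugin itself; a sketch follows (the descriptor path and plugin version here are assumptions, adjust them for your project):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>3.1.0</version>
    <configuration>
        <descriptors>
            <!-- path to the customized descriptor above (assumed location) -->
            <descriptor>src/main/assembly/assembly.xml</descriptor>
        </descriptors>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <!-- build the fat jar as part of mvn package -->
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>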

 


Original source: https://www.cnblogs.com/barneywill/p/9895317.html
