This article is based on CentOS 6.x + CDH 5.x.
yum install oozie
yum install oozie-client
alternatives --set oozie-tomcat-conf /etc/oozie/tomcat-conf.http
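To double-check which Tomcat configuration is active, alternatives can display the current selection (just a sanity check, not a required step):

$ alternatives --display oozie-tomcat-conf
# the "link currently points to" line should reference /etc/oozie/tomcat-conf.http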
$ mysql -u root -p
Enter password: ******
mysql> create database oozie;
Query OK, 1 row affected (0.03 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
mysql> grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';
Query OK, 0 rows affected (0.03 sec)
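Before pointing Oozie at the database, you can verify the grants by logging in as the new user (a quick check, using the password 'oozie' set above):

$ mysql -u oozie -poozie oozie -e "SELECT 1;"
# a result of 1 means the account and database are reachable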
Edit /etc/oozie/conf/oozie-site.xml and point Oozie at the MySQL database:

<property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://localhost:3306/oozie</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>oozie</value>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>oozie</value>
</property>
$ sudo yum install mysql-connector-java
$ ln -s /usr/share/java/mysql-connector-java.jar /var/lib/oozie/mysql-connector-java.jar

If you have already installed mysql-connector-java, you can skip the first command.
$ sudo -u oozie /usr/lib/oozie/bin/ooziedb.sh create -run
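If the command succeeds, you can double-check that the Oozie schema was actually created (a quick verification, using the credentials from earlier):

$ mysql -u oozie -poozie oozie -e "SHOW TABLES;"
# expect Oozie tables such as WF_JOBS and WF_ACTIONS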
# unzip ext-2.2.zip
# mv ext-2.2 /var/lib/oozie/
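ExtJS is what enables the Oozie web console. If you unpacked the archive as root, it may be worth making sure the oozie user can read it (an optional precaution, not part of the original steps):

# chown -R oozie:oozie /var/lib/oozie/ext-2.2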
Add the following to Hadoop's core-site.xml so that Oozie is allowed to impersonate other users:

<property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
</property>

Then restart the Hadoop services (restarting the NameNode and DataNodes is enough).
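On a CDH 5 package install, the restarts look roughly like this (service names assume the standard CDH packaging; run each command on the nodes that host that role):

$ sudo service hadoop-hdfs-namenode restart
$ sudo service hadoop-hdfs-datanode restart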
$ sudo -u hdfs hadoop fs -mkdir /user/oozie
$ sudo -u hdfs hadoop fs -chown oozie:oozie /user/oozie
$ sudo oozie-setup sharelib create -fs hdfs://mycluster/user/oozie -locallib /usr/lib/oozie/oozie-sharelib-yarn.tar.gz
Replace mycluster here with your own clusterId.
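To confirm the sharelib landed on HDFS, list it (the path follows from the -fs target used above):

$ sudo -u hdfs hadoop fs -ls /user/oozie/share/lib
# expect the component libraries (hive, pig, sqoop, ...), possibly under a timestamped lib_ directory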
$ sudo service oozie start
$ oozie admin -oozie http://host1:11000/oozie -status
System mode: NORMAL

For convenience, so we do not have to type the Oozie server address every time, we can set an environment variable:
$ export OOZIE_URL=http://host1:11000/oozie
$ oozie admin -version
Oozie server build version: 4.0.0-cdh5.0.0
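To make the variable survive new shells, append it to your shell profile (the file is an assumption; adjust for your shell):

$ echo 'export OOZIE_URL=http://host1:11000/oozie' >> ~/.bashrc
$ source ~/.bashrc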
If you use Hue, point it at the Oozie server in hue.ini:

[liboozie]
# The URL where the Oozie service runs on. This is required in order for
# users to submit jobs. Empty value disables the config check.
oozie_url=http://host1:11000/oozie
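For the change to take effect, restart Hue (the service name assumes a CDH package install):

$ sudo service hue restart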
A workflow is described in an XML file. Here is the classic WordCount workflow:

<workflow-app name='wordcount-wf' xmlns="uri:oozie:workflow:0.1">
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.myorg.WordCount.Map</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.myorg.WordCount.Reduce</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to='end'/>
        <error to='kill'/>
    </action>
    <kill name='kill'>
        <message>Something went wrong: ${wf:errorCode('wordcount')}</message>
    </kill>
    <end name='end'/>
</workflow-app>
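Before submitting a workflow, the XML can be checked against the schema with the client (oozie validate is part of the standard CLI):

$ oozie validate workflow.xml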
$ oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
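Submission prints a job ID that you can use to follow progress (the ID below is just a placeholder):

$ oozie job -info 0000000-150210101010101-oozie-oozi-W
$ oozie jobs -len 10    # list the 10 most recent workflow jobs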
Next, build the example jar from the Oozie source. Download and unpack the tarball:

wget http://apache.fayea.com/oozie/4.1.0/oozie-4.1.0.tar.gz
tar xzf oozie-4.1.0.tar.gz
cd oozie-4.1.0
So that failing unit tests do not abort the build, configure maven-surefire-plugin in the examples module to ignore test failures:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.5</version>
    <configuration>
        <skipTests>false</skipTests>
        <testFailureIgnore>true</testFailureIgnore>
        <forkMode>once</forkMode>
    </configuration>
</plugin>

Then run mvn package; you will find oozie-examples-4.1.0.jar under the target folder.
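A quick way to confirm the classes we need made it into the jar (the grep pattern is only illustrative):

$ jar tf oozie-examples/target/oozie-examples-4.1.0.jar | grep Sample
# should list org/apache/oozie/example/SampleMapper.class and SampleReducer.class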
Edit oozie-examples/src/main/apps/map-reduce/job.properties to match your cluster:

nameNode=hdfs://mycluster
jobTracker=host1:8032
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce
outputDir=map-reduce
Create the workflow application directory on HDFS and upload the workflow definition:

hdfs dfs -mkdir -p /user/root/examples/apps/map-reduce
hdfs dfs -put oozie-examples/src/main/apps/map-reduce/workflow.xml /user/root/examples/apps/map-reduce/

Create a lib folder under /user/root/examples/apps/map-reduce/ and upload the packaged oozie-examples-4.1.0.jar into it:
hdfs dfs -mkdir /user/root/examples/apps/map-reduce/lib
hdfs dfs -put oozie-examples/target/oozie-examples-4.1.0.jar /user/root/examples/apps/map-reduce/lib
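The application directory should now have the expected layout; a recursive listing will show it (output paraphrased):

$ hdfs dfs -ls -R /user/root/examples/apps/map-reduce
# .../map-reduce/workflow.xml
# .../map-reduce/lib/oozie-examples-4.1.0.jar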
sudo -u hdfs hdfs dfs -mkdir /examples
hdfs dfs -put examples/src/main/apps /examples
hdfs dfs -mkdir -p /user/root/examples/input-data/text
hdfs dfs -mkdir -p /user/root/examples/output-data
hdfs dfs -put oozie-examples/src/main/data/data.txt /user/root/examples/input-data/text
oozie job -oozie http://host1:11000/oozie -config oozie-examples/src/main/apps/map-reduce/job.properties -run
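Once the job reaches SUCCEEDED, the result can be read straight from HDFS (the job ID below is a placeholder; the output path follows from job.properties):

$ oozie job -info 0000001-150210101010101-oozie-oozi-W
$ hdfs dfs -cat /user/root/examples/output-data/map-reduce/part-00000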
The stock example mapper and reducer simply pass every record through, so the output is each line of data.txt keyed by its byte offset:

0	To be or not to be, that is the question;
42	Whether 'tis nobler in the mind to suffer
84	The slings and arrows of outrageous fortune,
129	Or to take arms against a sea of troubles,
172	And by opposing, end them. To die, to sleep;
217	No more; and by a sleep to say we end
255	The heart-ache and the thousand natural shocks
302	That flesh is heir to ? 'tis a consummation
346	Devoutly to be wish'd. To die, to sleep;
387	To sleep, perchance to dream. Ay, there's the rub,
438	For in that sleep of death what dreams may come,
487	When we have shuffled off this mortal coil,
531	Must give us pause. There's the respect
571	That makes calamity of so long life,
608	For who would bear the whips and scorns of time,
657	Th'oppressor's wrong, the proud man's contumely,
706	The pangs of despised love, the law's delay,
751	The insolence of office, and the spurns
791	That patient merit of th'unworthy takes,
832	When he himself might his quietus make
871	With a bare bodkin? who would fardels bear,
915	To grunt and sweat under a weary life,
954	But that the dread of something after death,
999	The undiscovered country from whose bourn
1041	No traveller returns, puzzles the will,
1081	And makes us rather bear those ills we have
1125	Than fly to others that we know not of?
1165	Thus conscience does make cowards of us all,
1210	And thus the native hue of resolution
1248	Is sicklied o'er with the pale cast of thought,
1296	And enterprises of great pitch and moment
1338	With this regard their currents turn awry,
1381	And lose the name of action.
The reason is in the example's workflow.xml, which wires in the stock SampleMapper and SampleReducer:

<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapred.mapper.class</name>
                    <value>org.apache.oozie.example.SampleMapper</value>
                </property>
                <property>
                    <name>mapred.reducer.class</name>
                    <value>org.apache.oozie.example.SampleReducer</value>
                </property>
                <property>
                    <name>mapred.map.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
To turn the example into a real word count, rewrite SampleMapper against the new MapReduce API:

package org.apache.oozie.example;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SampleMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emit (word, 1) for every token in the input line.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
And have SampleReducer sum the counts for each word:

package org.apache.oozie.example;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SampleReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Add up all counts emitted for the same word.
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
Because these classes are written against the new MapReduce API, the workflow must enable it explicitly and name the mapper, reducer, formats, and output types with the new-API properties:

<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
    <start to="mr-node"/>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapred.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapreduce.inputformat.class</name>
                    <value>org.apache.hadoop.mapreduce.lib.input.TextInputFormat</value>
                </property>
                <property>
                    <name>mapreduce.outputformat.class</name>
                    <value>org.apache.hadoop.mapreduce.lib.output.TextOutputFormat</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapreduce.map.class</name>
                    <value>org.apache.oozie.example.SampleMapper</value>
                </property>
                <property>
                    <name>mapreduce.reduce.class</name>
                    <value>org.apache.oozie.example.SampleReducer</value>
                </property>
                <property>
                    <name>mapred.map.tasks</name>
                    <value>1</value>
                </property>
                <property>
                    <name>mapred.input.dir</name>
                    <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
hdfs dfs -put -f oozie-examples/src/main/apps/map-reduce/workflow.xml /user/root/examples/apps/map-reduce/
$ echo "Hello World Bye World" > file0
$ echo "Hello Hadoop Goodbye Hadoop" > file1
$ hdfs dfs -put file* /user/root/examples/input-data/text

While we are at it, delete the earlier data.txt:
hdfs dfs -rm /user/root/examples/input-data/text/data.txt
oozie job -oozie http://host1:11000/oozie -config oozie-examples/src/main/apps/map-reduce/job.properties -run
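When this run finishes, the word counts are in the output directory (part-r-* is the new-API file naming; the path follows from job.properties):

$ hdfs dfs -cat /user/root/examples/output-data/map-reduce/part-r-*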
Bye	1
Goodbye	1
Hadoop	2
Hello	2
World	2
Alex's Hadoop Tutorial for Beginners, Lesson 20: The Oozie Workflow Engine
Original article (in Chinese): http://blog.csdn.net/nsrainbow/article/details/43746111