- CM and CDH version: 5.13.1
- Kerberos is not enabled on the cluster
1. Environment Preparation
- Upload the jar used by the job to an HDFS directory:

```shell
sudo -u faysontest hadoop fs -mkdir -p /faysontest/jars
sudo -u faysontest hadoop fs -put /opt/cloudera/parcels/CDH/jars/hadoop-mapreduce-examples-2.6.0-cdh5.13.1.jar /faysontest/jars
sudo -u faysontest hadoop fs -ls /faysontest/jars
```
- workflow.xml file for the Java Action:
- The parameters used in the workflow.xml file are configured as dynamic parameters:

```xml
<workflow-app name="JavaWorkflow" xmlns="uri:oozie:workflow:0.5">
    <start to="java-node"/>
    <action name="java-node">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>${mainClass}</main-class>
            <java-opts>${javaOpts}</java-opts>
            <arg>${arg1}</arg>
            <arg>${arg2}</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```
- Upload the workflow.xml file to the /user/faysontest/oozie/javaaction directory on HDFS:
```shell
[root@ip-186-31-6-148 opt]# sudo -u faysontest hadoop fs -mkdir -p /user/faysontest/oozie/javaaction
[root@ip-186-31-6-148 opt]# sudo -u faysontest hadoop fs -put /opt/workflow.xml /user/faysontest/oozie/javaaction
[root@ip-186-31-6-148 opt]# sudo -u faysontest hadoop fs -ls /user/faysontest/oozie/javaaction
```
2. Create the Maven Project
- Create a Java project with Maven
- Contents of the pom.xml file:
```xml
<parent>
    <artifactId>cdh-project</artifactId>
    <groupId>com.cloudera</groupId>
    <version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>oozie-demo</artifactId>
<packaging>jar</packaging>
<name>oozie-demo</name>
<url>http://maven.apache.org</url>
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.4</version>
    </dependency>
    <dependency>
        <groupId>net.sourceforge.spnego</groupId>
        <artifactId>spnego</artifactId>
        <version>7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.oozie</groupId>
        <artifactId>oozie-client</artifactId>
        <version>4.1.0</version>
    </dependency>
</dependencies>
```
3. Write the Oozie Client Code
- Write JavaWorkflowDemo.java as follows:
```java
package com.cloudera.nokerberos;

import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowAction;
import org.apache.oozie.client.WorkflowJob;

import java.util.List;
import java.util.Properties;

public class JavaWorkflowDemo {

    private static String oozieURL = "http://ip-186-31-6-148.fayson.com:11000/oozie";

    public static void main(String[] args) {
        System.setProperty("user.name", "faysontest");
        OozieClient oozieClient = new OozieClient(oozieURL);
        try {
            System.out.println(oozieClient.getServerBuildVersion());
            Properties properties = oozieClient.createConfiguration();
            properties.put("oozie.wf.application.path", "${nameNode}/user/faysontest/oozie/javaaction");
            properties.put("oozie.use.system.libpath", "True");
            properties.put("nameNode", "hdfs://ip-186-31-10-118.fayson.com:8020");
            properties.put("jobTracker", "ip-186-31-6-148.fayson.com:8032");
            properties.put("mainClass", "org.apache.hadoop.examples.QuasiMonteCarlo");
            properties.put("arg1", "10");
            properties.put("arg2", "10");
            properties.put("javaOpts", "-Xmx1000m");
            properties.put("oozie.libpath", "${nameNode}/faysontest/jars/");

            // Run the workflow
            String jobid = oozieClient.run(properties);
            System.out.println(jobid);

            // Wait 10s before querying the job
            Thread.sleep(10000L);

            // Fetch the job's status by workflow id
            WorkflowJob workflowJob = oozieClient.getJobInfo(jobid);
            // Fetch the job log
            System.out.println(oozieClient.getJobLog(jobid));
            // Get all actions in the workflow
            List<WorkflowAction> list = workflowJob.getActions();
            for (WorkflowAction action : list) {
                // Print each action's external id, i.e. the YARN application ID
                System.out.println(action.getExternalId());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
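For comparison, the same parameters set through `createConfiguration()` above could be expressed as a job.properties file and submitted with the standard Oozie CLI. This is a hedged sketch using the hosts and paths from the code above; it is not shown in the original walkthrough:

```properties
# Assumed job.properties equivalent of the Properties built in JavaWorkflowDemo
nameNode=hdfs://ip-186-31-10-118.fayson.com:8020
jobTracker=ip-186-31-6-148.fayson.com:8032
mainClass=org.apache.hadoop.examples.QuasiMonteCarlo
javaOpts=-Xmx1000m
arg1=10
arg2=10
oozie.use.system.libpath=true
oozie.libpath=${nameNode}/faysontest/jars/
oozie.wf.application.path=${nameNode}/user/faysontest/oozie/javaaction
```

It would then be submitted with `oozie job -oozie http://ip-186-31-6-148.fayson.com:11000/oozie -config job.properties -run`, which returns the same workflow job id that the Java client prints.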
Summary
- The workflow.xml file must be defined in advance
- Parameters are passed by calling oozieClient.createConfiguration() in code to create a Properties object, storing the key/value pairs in it, and passing it to oozieClient.run(properties)
- When specifying the path of a jar or workflow on HDFS, include the full HDFS path; otherwise the client will default to looking in a local directory
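The parameter-passing pattern in the second summary point can be sketched with plain java.util.Properties, independent of a running Oozie server. The helper name and host values below are hypothetical, for illustration only:

```java
import java.util.Properties;

public class OoziePropsSketch {

    // Assemble job parameters the way the client code does.
    // Note that HDFS paths are fully qualified with the nameNode
    // prefix, per the summary's last point.
    static Properties buildJobProps(String nameNode, String jobTracker) {
        Properties p = new Properties();
        p.setProperty("oozie.wf.application.path",
                nameNode + "/user/faysontest/oozie/javaaction");
        p.setProperty("oozie.libpath", nameNode + "/faysontest/jars/");
        p.setProperty("nameNode", nameNode);
        p.setProperty("jobTracker", jobTracker);
        return p;
    }

    public static void main(String[] args) {
        Properties p = buildJobProps("hdfs://nn.example.com:8020",
                "rm.example.com:8032");
        // The application path resolves to a fully qualified HDFS URI
        System.out.println(p.getProperty("oozie.wf.application.path"));
    }
}
```

In the real client the Properties object is created by oozieClient.createConfiguration() instead of `new Properties()`, and the ${nameNode} expansion is performed by Oozie itself.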