Add the Maven dependency
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>2.3.0</version>
</dependency>
Extend the abstract class GenericUDF
The fully qualified name of this class is org.apache.hadoop.hive.ql.udf.generic.GenericUDF.
1) The GenericUDF abstract class explained
The GenericUDF class looks like this:
public abstract class GenericUDF implements Closeable {
    ...
    /*
     * After instantiation, initialize is called exactly once.
     * - arguments: the ObjectInspectors corresponding to the UDF's parameter list
     * - the returned ObjectInspector describes the UDF's return value
     * The typical work in initialize is to check whether arguments matches
     * the number and types of parameters your UDF expects.
     */
    public abstract ObjectInspector initialize(ObjectInspector[] arguments)
        throws UDFArgumentException;
    ...
    // The actual UDF logic is implemented here.
    // - arguments: the UDF's input data for one row; this array has the
    //   same length as the one passed to initialize
    public abstract Object evaluate(DeferredObject[] arguments)
        throws HiveException;
}
About ObjectInspector: when Hive passes data around, it hands over both the data itself and a matching ObjectInspector. The ObjectInspector carries the type information, and you use it to decode the raw object and obtain the actual value.
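To make the data-plus-inspector pairing concrete, here is a minimal sketch of a GenericUDF that upper-cases a string by reading its argument directly through a StringObjectInspector. The class name ToUpperUDF and function name to_upper are illustrative, not from the original post:

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
import org.apache.hadoop.io.Text;

public class ToUpperUDF extends GenericUDF {

    private transient StringObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("to_upper() takes exactly 1 argument.");
        }
        if (!(arguments[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentException("to_upper() expects a string argument.");
        }
        // Remember the inspector so evaluate() can decode the raw object later.
        inputOI = (StringObjectInspector) arguments[0];
        // Tell Hive this UDF returns a writable string (Text).
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        Object raw = arguments[0].get();
        if (raw == null) {
            return null;
        }
        // The inspector turns the opaque object into a plain Java String.
        String value = inputOI.getPrimitiveJavaObject(raw);
        return new Text(value.toUpperCase());
    }

    @Override
    public String getDisplayString(String[] children) {
        return getStandardDisplayString("to_upper", children);
    }
}

The inspector captured in initialize is reused for every row in evaluate, which is why it is stored in a field rather than looked up per call.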
2) Example
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

public class DateFeaker extends GenericUDF {

    private static final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    private transient ObjectInspectorConverters.Converter[] converters;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        if (arguments.length != 2) {
            throw new UDFArgumentLengthException(
                "The function date_util(startdate, enddate) takes exactly 2 arguments.");
        }
        // Build a converter per argument so any input type that can be
        // coerced to a string is accepted.
        converters = new ObjectInspectorConverters.Converter[arguments.length];
        for (int i = 0; i < arguments.length; i++) {
            converters[i] = ObjectInspectorConverters.getConverter(arguments[i],
                PrimitiveObjectInspectorFactory.writableStringObjectInspector);
        }
        // The UDF returns an array<string>.
        return ObjectInspectorFactory.getStandardListObjectInspector(
            PrimitiveObjectInspectorFactory.writableStringObjectInspector);
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        if (arguments[0].get() == null || arguments[1].get() == null) {
            return null;
        }
        Text startDate = (Text) converters[0].convert(arguments[0].get());
        Text endDate = (Text) converters[1].convert(arguments[1].get());
        Date start;
        try {
            start = sdf.parse(startDate.toString());
        } catch (ParseException e) {
            throw new UDFArgumentException(
                "The first argument does not match the pattern yyyy-MM-dd: " + arguments[0].get());
        }
        Date end;
        try {
            end = sdf.parse(endDate.toString());
        } catch (ParseException e) {
            throw new UDFArgumentException(
                "The second argument does not match the pattern yyyy-MM-dd: " + arguments[1].get());
        }
        // Emit every date from start to end, inclusive.
        ArrayList<Text> temp = new ArrayList<Text>();
        Calendar c = Calendar.getInstance();
        while (start.getTime() <= end.getTime()) {
            temp.add(new Text(sdf.format(start)));
            c.setTime(start);
            c.add(Calendar.DATE, 1);
            start = c.getTime();
        }
        return temp;
    }

    @Override
    public String getDisplayString(String[] children) {
        assert (children.length == 2);
        return getStandardDisplayString("date_util", children);
    }
}
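Before deploying, the UDF can be smoke-tested without a cluster by driving the same lifecycle Hive uses: call initialize once, then evaluate per row. A minimal sketch follows; GenericUDF.DeferredJavaObject ships with hive-exec, while the test class itself is illustrative:

import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredJavaObject;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

public class DateFeakerSmokeTest {
    public static void main(String[] args) throws Exception {
        DateFeaker udf = new DateFeaker();
        ObjectInspector stringOI =
            PrimitiveObjectInspectorFactory.writableStringObjectInspector;
        // Hive calls initialize exactly once with the argument inspectors.
        udf.initialize(new ObjectInspector[] { stringOI, stringOI });
        // Then evaluate is called once per row.
        Object result = udf.evaluate(new DeferredObject[] {
            new DeferredJavaObject(new Text("2020-09-01")),
            new DeferredJavaObject(new Text("2020-09-03")) });
        // Expected output: [2020-09-01, 2020-09-02, 2020-09-03]
        System.out.println(result);
    }
}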
3) Deploying the UDF
Upload the jar to HDFS (needed for USING JAR below), and/or add it to the current Hive session:
[hadoop@hadoop001 lib]$hadoop fs -put g6-hadoop-1.0.jar hdfs://chd:8020/user
hive> add jar /home/hadoop/data/hive/g6-hadoop-1.0.jar;
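To confirm the jar is on the session classpath, Hive can list the resources it has registered:

hive> list jars;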
1) Create a temporary function — valid only for the current terminal session
CREATE TEMPORARY FUNCTION function_name AS class_name;
function_name: the name of the function
class_name: the fully qualified class name (package + class name), i.e., whatever follows package on the first line of your UDF source file, plus a dot and the class name. An example follows below.
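For instance, to register the DateFeaker class above for the current session (the package com.example.udf is an assumed placeholder; use whatever package your class actually declares):

hive> CREATE TEMPORARY FUNCTION date_util AS 'com.example.udf.DateFeaker';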
2) Create a permanent function — registered in the metastore, with the jar loaded from HDFS, so it survives across sessions

CREATE FUNCTION function_name AS class_name USING JAR path;

function_name: the name of the function
class_name: the fully qualified class name (package + class name)
path: the HDFS path of the jar
Example
CREATE FUNCTION HelloUDF AS 'org.apache.hadoop.hive.ql.udf.HelloUDF'
USING JAR 'hdfs://hadoop001:9000/lib/g6-hadoop-1.0.jar';

# Test
hive> select HelloUDF("17");
OK
hello:17
4) A recommended, fairly complete set of examples
Git repo: https://github.com/tchqiq/HiveUDF/tree/master/src/main/java/cn/com/diditaxi/hive/cf
Original post: https://www.cnblogs.com/yyy-blog/p/13601906.html