标签:sub type arguments info bre agg apach over class
参考文章:
UDF:最简单的自定义,实现一对一,输入一行数据输出一行数据
UDAF:自定义聚合函数,实现多对一,输入多行数据输出一行数
UDTF:用来实现一行输入多行输出,这次先不讲
要点:1.UDF类需要继承org.apache.hadoop.hive.ql.exec.UDF.
2.UDF类需要实现evaluate类.
UDF开发实例:
开发一个udf getdate以返回当前系统时间
package udf.test;
import org.apache.hadoop.hive.ql.exec.UDF;
import java.text.SimpleDateFormat;
import java.util.Date;
public class Getdate extends UDF {
public String evaluate(){
return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
}
}
然后maven打包:mvn clean compile.package
接着把包放到服务器上,比如放到/home/azkaban/UDF/udf-jar.1.1.0
进入hive shell,执行add jar /home/azkaban/UDF/udf-jar.1.1.0
接着执行create tempopary function getdate as ‘udf.test.Getdate‘;
这里的getdate就是function名称。在hive shell中执行select getdate()就会返回当前的系统时间。
待解决:hive中类似于bigint的类型,在udf的evaluate方法中如何返回,改成long?
其实hive就是对MapReduce的一层包装,所以我们写UDAF的时候可以通过对应到Map Reduce进行理解。
在四个阶段中,我们可以得知,需要实现7个方法
table:customers
name | gender | age |
---|---|---|
张三 | 男 | 23 |
李氏 | 男 | 26 |
王婆 | 女 | 54 |
尼古拉斯-赵六 | 男 | 43 |
select wm_concat(name) from customers;
返回的是 "张三,李氏,王婆,尼古拉斯-赵六"
package com.maihaoche.baiyan.UDF;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.parse.SemanticException;
import org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFParameterInfo;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;
public class Wm_concat extends AbstractGenericUDAFResolver{
@Override
public GenericUDAFEvaluator getEvaluator(GenericUDAFParameterInfo info) throws SemanticException {
return new GenericUDAFWmconcatEvaluator();
}
public static class GenericUDAFWmconcatEvaluator extends GenericUDAFEvaluator{
static class stringagg implements AggregationBuffer{
StringBuffer stringBuffer=new StringBuffer();
String flag=null;
boolean empty;
}
@Override
/*
init方法不写的话会报nullpointexception null 的错误
*/
public ObjectInspector init(Mode m, ObjectInspector[] parameters) throws HiveException {
super.init(m, parameters);
if(parameters.length!=1){
throw new UDFArgumentException("Argument Exception");
}
return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
}
/*
获取存放中间结果的对象
*/
public AggregationBuffer getNewAggregationBuffer() throws HiveException {
stringagg sa=new stringagg();
String str=null;
return sa;
}
public void reset(AggregationBuffer aggregationBuffer) throws HiveException {
stringagg sa=(stringagg)aggregationBuffer;
sa.empty=true;
sa.stringBuffer.delete(0,sa.stringBuffer.length());
}
public void iterate(AggregationBuffer aggregationBuffer, Object[] objects) throws HiveException {
if(objects.length !=1 ){
throw new UDFArgumentException("Argument Exception");
}
this.merge(aggregationBuffer,objects[0]);
}
public Object terminatePartial(AggregationBuffer aggregationBuffer) throws HiveException {
return this.terminate(aggregationBuffer);
}
public void merge(AggregationBuffer aggregationBuffer, Object o) throws HiveException {
stringagg sa=(stringagg)aggregationBuffer;
if(o!=null){
sa.stringBuffer.append(o.toString());
sa.empty=false;
}
}
public Object terminate(AggregationBuffer aggregationBuffer) throws HiveException {
stringagg sa=(stringagg)aggregationBuffer;
if(sa.empty==true) return null;
int length=sa.stringBuffer.toString().length();
return new Text(sa.stringBuffer.toString().substring(0,length-1));//通过substring解决最后一个字段跟着的分隔符
}
}
}
很明显,我们可以看出来,AbstractGenericUDAFResolver就是一层皮,我们可以在里面加一写验证条件,比如:
检测下面就进行检测是否有2个参数以及判断数据类型
public GenericUDAFEvaluator getEvaluator(GenericUDAFParameterInfo parameters) throws SemanticException {
if (parameters.length != 2) {
throw new UDFArgumentTypeException(parameters.length - 1,
"Please specify exactly two arguments.");
}
// validate the first parameter, which is the expression to compute over
if (parameters[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
throw new UDFArgumentTypeException(0,
"Only primitive type arguments are accepted but "
+ parameters[0].getTypeName() + " was passed as parameter 1.");
}
switch (((PrimitiveTypeInfo) parameters[0]).getPrimitiveCategory()) {
case BYTE:
case SHORT:
case INT:
case LONG:
case FLOAT:
case DOUBLE:
case TIMESTAMP:
case DECIMAL:
break;
case STRING:
case BOOLEAN:
case DATE:
default:
throw new UDFArgumentTypeException(0,
"Only numeric type arguments are accepted but "
+ parameters[0].getTypeName() + " was passed as parameter 1.");
}
待解决:如何写希望输入的是两个参数的,比如现在希望自己指定wm_concat的分割符。
标签:sub type arguments info bre agg apach over class
原文地址:https://www.cnblogs.com/WinseterCheng/p/9182155.html