Implementing a Hive UDF, with Notes

Posted: 2016-03-21 19:55:39


Hive's own query language, HQL, covers most needs, but special requirements call for writing your own UDF. Below is a complete example.

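1. Writing the UDF in Eclipse

① Add to the project all the jars under Hive's lib directory, plus hadoop-common-2.5.1.jar from Hadoop's share directory (2.5.1 being the latest Hadoop release at the time of writing).
② The UDF class must extend org.apache.hadoop.hive.ql.exec.UDF and implement an evaluate method.

When the custom UDF is used in Hive, Hive calls the class's evaluate method to do the actual work.
③ Export the project as a jar file.
Note: the JDK used for the project must match the JDK on the cluster.

Detailed example: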

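package com.zx.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

// Returns the length of a string; a NULL input yields NULL.
public class UdfTestLength extends UDF {

    // Hive calls evaluate with the argument passed to the function in the query.
    public Integer evaluate(String s) {
        if (s == null) {
            return null;
        }
        return s.length();
    }
}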

Package the class above as a jar. I exported it directly from Eclipse as test-udf.jar and placed it in the /root directory.
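To make the role of evaluate more concrete, here is a minimal sketch of looking up a method named evaluate by reflection and invoking it. It is only an illustration: Hive's real method resolution is more involved, and LengthLogic and EvaluateLookupSketch are hypothetical stand-in names that avoid the Hive dependency so the code runs on its own.

```java
import java.lang.reflect.Method;

// Stand-in for the UDF's logic without the Hive dependency (hypothetical class;
// the real UDF extends org.apache.hadoop.hive.ql.exec.UDF).
class LengthLogic {
    public Integer evaluate(String s) {
        return s == null ? null : s.length();
    }
}

class EvaluateLookupSketch {
    // Finds a public method named "evaluate" that takes one String and invokes it.
    static Object callEvaluate(Object udf, String arg) {
        try {
            Method m = udf.getClass().getMethod("evaluate", String.class);
            return m.invoke(udf, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable evaluate(String) method", e);
        }
    }

    public static void main(String[] args) {
        LengthLogic udf = new LengthLogic();
        System.out.println(callEvaluate(udf, "hive")); // prints 4
        System.out.println(callEvaluate(udf, null));   // prints null
    }
}
```

Running the logic locally like this is also a cheap way to check the NULL handling before exporting the jar.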

2. Calling the custom function:

① Add the jar (run this in the Hive command line)
hive> add jar /root/test-udf.jar;

② Create a temporary function. It stops being available once the Hive command-line session is closed.
hive> create temporary function testlength as 'com.zx.hive.udf.UdfTestLength';

③ Call the function
hive> select id, name, testlength(name) from student;

④ Save the query result to HDFS

hive> create table result row format delimited fields terminated by '\t' as select id,testlength(nation) from student;
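As a rough picture of what the tab-delimited result rows look like, the sketch below formats a few rows the way the statement above would store them. The sample rows and the ResultRowsSketch class are invented for illustration; the \N marker is Hive's default representation of NULL in delimited text files.

```java
class ResultRowsSketch {
    // Same logic as the UDF's evaluate method.
    static Integer testLength(String s) {
        return s == null ? null : s.length();
    }

    // Formats one output row as "id<TAB>length"; a NULL length is written as \N.
    static String formatRow(int id, String nation) {
        Integer len = testLength(nation);
        return id + "\t" + (len == null ? "\\N" : len.toString());
    }

    public static void main(String[] args) {
        // Hypothetical sample rows standing in for the student table.
        int[] ids = {1, 2, 3};
        String[] nations = {"China", null, "Han"};
        for (int i = 0; i < ids.length; i++) {
            System.out.println(formatRow(ids[i], nations[i]));
        }
    }
}
```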

(Please credit the source when reposting. More content at: http://blog.csdn.net/hwwn2009/article/details/41289197)

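3. Problems encountered:

① When the UDF needs third-party packages, there are two options:

1) When running Hive HQL, manually add the jars the UDF needs with the add statement: add jar /root/***.jar (tested, works).
2) Install the Eclipse plugin fatjar (tested, works).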
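With either option, it can help to confirm that a third-party class is actually visible on the classpath. The following is a minimal sketch, assuming the problem shows up as a class that cannot be loaded; ClasspathCheckSketch and the com.example class name are hypothetical.

```java
class ClasspathCheckSketch {
    // Returns true if the named class can be loaded from the current classpath.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoadable("java.lang.String"));              // prints true
        System.out.println(isLoadable("com.example.missing.NoSuchLib")); // prints false
    }
}
```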

Installing fatjar online:
Eclipse menu bar: Help > Software Updates > Search for new features to install > New Update Site >
Fill in the name and url
name: anything will do, for example fat
url: the Fat Jar address, enter http://kurucz-grafika.de/fatjar

How to package with fatjar:
