1006 - Custom UDF Functions in Hive

Date: 2015-06-01 06:13:33

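Hive lets you define your own function: package it, put it on Hive's classpath, then register it inside the Hive shell and call it like any built-in function.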

Scenario: suppose China Mobile users place orders in an online store. The main fields of each order record are:

Order no.   Mobile number   Product code   Quantity   Channel
10000       18810637891     bm0001         1          0001
10001       18710637891     bm0002         2          0002
10002       18710637891     bm0001         1          0001
10003       18610637891     bm0002         2          0003
10004       18610637891     bm0002         5          0001
10005       18610637891     bm0004         2          0005

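Given: the user's province can be derived from the mobile number. Assume the following rule, which maps the first three digits of the mobile number to a province:

188 Beijing (bj)

187 Shanghai (sh)

186 Hebei (hb)

Task: append a province field after the mobile number in each record above, output it together with the remaining fields, and upload the result to HDFS.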
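Before moving to Hive, the intended transformation can be sketched in plain Java. This is only an illustration under assumptions: the class name `OrderEnricher`, the `enrich` method, and the `"unknown"` fallback are hypothetical and do not appear in the original article, and the short province codes (bj, sh, hb) are used instead of the full names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not from the article): applies the prefix-to-province
// rule to a single tab-delimited order record.
final class OrderEnricher {
    private static final Map<String, String> PROVINCES = new HashMap<>();
    static {
        PROVINCES.put("188", "bj"); // Beijing
        PROVINCES.put("187", "sh"); // Shanghai
        PROVINCES.put("186", "hb"); // Hebei
    }

    // Inserts the province right after the mobile number (the second field).
    static String enrich(String line) {
        String[] f = line.split("\t");
        if (f.length != 5 || f[1].length() < 3) {
            throw new IllegalArgumentException("unexpected record: " + line);
        }
        // "unknown" is an assumed fallback for prefixes outside the rule table.
        String province = PROVINCES.getOrDefault(f[1].substring(0, 3), "unknown");
        return String.join("\t", f[0], f[1], province, f[2], f[3], f[4]);
    }

    public static void main(String[] args) {
        System.out.println(enrich("10000\t18810637891\tbm0001\t1\t0001"));
    }
}
```

The Hive solution that follows does the same lookup, but lets Hive handle reading the records and writing the result.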

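Solution:

1. The order records are stored in the log file jforder.log. Create an HDFS directory, define an external table over it, and upload the file: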

[hadoop@cloud01 sbin]$ hadoop fs -mkdir /external
[hadoop@cloud01 sbin]$ hadoop fs -mkdir /external/hive

hive> create external table jf_order (orderNo string ,mobileNo string,wareCode string,amount int ,channel string)
    > row format delimited
    > fields terminated by '\t'
    > location '/external/hive';

[hadoop@cloud01 ~]$ more jforder.log
10000     18810637891     bm0001     1     0001
10001     18710637891     bm0002     2     0002
10002     18710637891     bm0001     1     0001
10003     18610637891     bm0002     2     0003
10004     18610637891     bm0002     5     0001
10005     18610637891     bm0004     2     0005
[hadoop@cloud01 ~]$ hadoop fs -put jforder.log /external/hive

2. Write a Hive function, AreaFunction, that returns the user's province for a given mobile number.

3. Start Eclipse and write the UDF:

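package com.hive;

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Province lookup function. Once it is on Hive's classpath it can be called directly.
 *
 * @author shenfl
 * @version 1.0
 * @date 2015-5-31
 */
public class AreaFunction extends UDF {

    /* Static lookup table: first three digits of the mobile number -> province */
    public static Map<String, String> map = new HashMap<String, String>();

    /* Could be extended to load the mapping from a database */
    static {
        map.put("188", "北京");
        map.put("187", "上海");
        map.put("186", "河北");
    }

    /* Returns the mobile number, a tab, and the province; an unknown prefix yields the text "null" */
    public String evaluate(String mobileNumber) {
        // Guard against NULL or too-short values, which would otherwise throw at query time
        if (mobileNumber == null || mobileNumber.length() < 3) {
            return null;
        }
        return mobileNumber + "\t" + map.get(mobileNumber.substring(0, 3));
    }

    /* Quick local check outside Hive */
    public static void main(String[] args) {
        AreaFunction area = new AreaFunction();
        String mobileNumber = "18810635789";
        String rs = area.evaluate(mobileNumber);
        System.out.println(rs);
    }
}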
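The comment in the UDF notes that the mapping could be loaded from an external source rather than hard-coded. A minimal sketch of that idea follows, assuming the mapping arrives as lines of the form `prefix<TAB>province` (for example read from a file or produced by a database query). The class name `PrefixTableLoader`, the line format, and the `#` comment convention are all assumptions for illustration, not part of the original article.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader (not from the article): builds the prefix table
// from "prefix<TAB>province" lines instead of hard-coded put() calls.
final class PrefixTableLoader {

    // Blank lines and lines starting with '#' are skipped.
    static Map<String, String> load(Iterable<String> lines) {
        Map<String, String> table = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("malformed mapping line: " + raw);
            }
            table.put(parts[0].trim(), parts[1].trim());
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table =
                load(Arrays.asList("# prefix\tprovince", "188\tbj", "187\tsh", "186\thb"));
        System.out.println(table.get("187"));
    }
}
```

In a UDF, the static initializer could call such a loader once, so that updating the rule table does not require recompiling the class.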

4. Upload the function: package it as a jar, add the jar to Hive's classpath, and register it under a name

hive> add jar /home/hadoop/hiveudf.jar;

hive> create temporary function area2 as 'com.hive.AreaFunction';

5. Verify

hive> select orderNo ,area2(mobileno),wareCode,amount ,channel from jf_order;

6. Store the enriched result on HDFS

hive> insert overwrite directory '/hiveout' select orderNo ,area2(mobileno),wareCode,amount ,channel from jf_order;
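When reading the files under /hiveout, note that INSERT OVERWRITE DIRECTORY without a ROW FORMAT clause writes text with columns separated by Ctrl-A (\u0001), while the area2 column itself contains the mobile number, a tab, and the province. The sketch below shows one way to split such a line into six fields. The class name `HiveOutputLine` is hypothetical, and the code assumes the default delimiter is in effect.

```java
// Hypothetical parser (not from the article) for one line of the /hiveout
// files, assuming Hive's default Ctrl-A column delimiter.
final class HiveOutputLine {

    // Returns {orderNo, mobileNo, province, wareCode, amount, channel}.
    static String[] parse(String line) {
        String[] cols = line.split("\u0001", -1);
        if (cols.length != 5) {
            throw new IllegalArgumentException("expected 5 columns, got " + cols.length);
        }
        // The UDF packs "mobile<TAB>province" into a single column.
        String[] mobileAndProvince = cols[1].split("\t", 2);
        if (mobileAndProvince.length != 2) {
            throw new IllegalArgumentException("missing province in: " + cols[1]);
        }
        return new String[] {
            cols[0], mobileAndProvince[0], mobileAndProvince[1], cols[2], cols[3], cols[4]
        };
    }

    public static void main(String[] args) {
        String line = "10000\u000118810637891\tbj\u0001bm0001\u00011\u00010001";
        System.out.println(String.join(" | ", parse(line)));
    }
}
```

If a uniform tab-delimited file is preferred, an alternative design is to have the UDF return only the province and select mobileNo and the province as two separate columns.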


Original article: http://blog.csdn.net/shenfuli/article/details/46295217
