Summary of some Hive functions

Posted: 2016-01-30 13:29:36

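Function categories
Built-in functions: simple functions (map side), aggregate functions (reduce side), collection functions (map side), special functions
Regular expressions
User-defined functions: UDF (map side), UDAF (reduce side)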

1. List the functions available in the current session
show functions;
2. Show a function's description
desc function concat;
3. Show a function's extended description
desc function extended concat;

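Simple functions
Granularity: one record (row) at a time
Relational operators
Arithmetic operators
Logical operators
Numeric functions
Type conversion
Date functions
Conditional functions
String functions
Statistical functions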

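Aggregate functions
Granularity: many records (rows) at a time
sum() - sum
count() - number of rows
avg() - average
count(distinct col) - number of distinct values
min() - minimum
max() - maximum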

Collection functions
Building complex types
Accessing complex types
Length of complex types

Window functions
Use cases:
Ranking within partitions
Dynamic GROUP BY
Top N
Cumulative calculations
Hierarchical queries
Windowing functions:
lead
lag
FIRST_VALUE
LAST_VALUE

AND/OR precedence: AND binds more tightly than OR, so the parentheses below are required
select id,money from winfunc
where (id ='1001' or id ='1002')
and money ='100';
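Python's `and` also binds more tightly than `or`, so the same two predicates can be compared directly. The rows below are hypothetical sample data standing in for `winfunc`. This only models the WHERE logic, not Hive itself.

```python
# Hypothetical (id, money) rows standing in for table winfunc.
rows = [("1001", "100"), ("1001", "300"), ("1002", "200")]

def with_parens(id_, money):
    # where (id='1001' or id='1002') and money='100'
    return (id_ == "1001" or id_ == "1002") and money == "100"

def without_parens(id_, money):
    # where id='1001' or id='1002' and money='100'
    # AND binds tighter, so this is: id='1001' or (id='1002' and money='100')
    return id_ == "1001" or id_ == "1002" and money == "100"

filtered_with = [r for r in rows if with_parens(*r)]
filtered_without = [r for r in rows if without_parens(*r)]
print(filtered_with)     # [('1001', '100')]
print(filtered_without)  # [('1001', '100'), ('1001', '300')]
```

Without the parentheses, the row ('1001', '300') slips through because only the id='1002' branch is checked against money.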

Type conversion

cast(money as int)

if(condition,v1,v2): returns v1 when the condition is true, otherwise v2
select if(2>1,"trueValue","falseValue") from dual;

case when condition1 then value1 when condition2 then value2 else value3 end

All branches must return the same type.
select case when id='1001' then 'V1' when id='1002' then 'V2'
else 'V3' end from tableName;

Get a JSON property value
get_json_object()

select get_json_object('{"name":"jack","age":"20"}','$.name')
from tableName; -- returns jack
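Below is a toy Python model of how a `$.key` path is resolved. It assumes only `$` and dotted object paths such as `$.name` or `$.a.b`. Hive's real JSONPath support is broader (for example, array indexes), so this is a sketch rather than Hive's implementation. Hive's documentation states that invalid JSON yields NULL, and the model mirrors that with `None`.

```python
import json

def get_json_object(json_str, path):
    """Toy model: supports only '$' and dotted object paths like '$.name' or '$.a.b'."""
    if not path.startswith("$"):
        return None
    try:
        node = json.loads(json_str)
    except ValueError:
        return None  # invalid JSON -> NULL
    for key in [k for k in path[1:].split(".") if k]:
        if not isinstance(node, dict) or key not in node:
            return None  # missing key -> NULL
        node = node[key]
    if node is None or isinstance(node, str):
        return node
    # numbers, booleans, objects and arrays come back as JSON text
    return json.dumps(node)

print(get_json_object('{"name":"jack","age":"20"}', "$.name"))  # jack
```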

Parse a URL
select parse_url('http://baidu.com/path1/p.php?k1=v1&k2=v2#Ref1','HOST')
from tableName; -- returns baidu.com
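To see which piece of the URL each parse_url part name selects, Python's standard `urllib.parse` can split the same URL. The mapping from part names (PROTOCOL, HOST, PATH, QUERY, REF) to `urlsplit` fields is an approximation for illustration, not Hive's code. The last line mimics the three-argument form `parse_url(url, 'QUERY', 'k1')`.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://baidu.com/path1/p.php?k1=v1&k2=v2#Ref1"
parts = urlsplit(url)

# Approximate mapping from parse_url part names to urlsplit fields.
parsed = {
    "PROTOCOL": parts.scheme,  # 'http'
    "HOST": parts.hostname,    # 'baidu.com'
    "PATH": parts.path,        # '/path1/p.php'
    "QUERY": parts.query,      # 'k1=v1&k2=v2'
    "REF": parts.fragment,     # 'Ref1'
}
# Like parse_url(url, 'QUERY', 'k1'): the value of one query parameter.
k1 = parse_qs(parts.query)["k1"][0]  # 'v1'
print(parsed["HOST"], k1)  # baidu.com v1
```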

concat(): string concatenation
select concat(type,'123') from tableName;

concat_ws(): string concatenation with a separator
concat_ws(separator, array<string>)
select concat_ws('-',type,'123') from tableName;

select concat_ws('-',split(type,'')) from tableName;
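A minimal Python model of the two call shapes shown above: separate string arguments, or an array of strings. It assumes every argument is non-NULL and deliberately leaves out Hive's NULL handling. List arguments are flattened before joining.

```python
def concat_ws(sep, *args):
    """Toy model: each argument is a string or a list of strings;
    lists are flattened, then everything is joined with sep."""
    parts = []
    for a in args:
        if isinstance(a, list):
            parts.extend(a)
        else:
            parts.append(a)
    return sep.join(parts)

print(concat_ws("-", "A", "123"))       # A-123
print(concat_ws("-", ["x", "y", "z"]))  # x-y-z
```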

collect_set(): collects a column's values into an array, removing duplicates

select collect_set(id) from tableName;

collect_list(): collects a column's values into an array, keeping duplicates

select collect_list(id) from tableName;
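The difference between the two can be modeled in Python with a per-group list, roughly what `select id, collect_list(money), collect_set(money) from winfunc group by id;` does. The rows are hypothetical sample data. Hive does not promise any element order for collect_set, so the first-seen order here is just a convenience of the model.

```python
from collections import defaultdict

# Hypothetical (id, money) rows standing in for table winfunc.
rows = [("1001", 100.0), ("1001", 150.0), ("1001", 150.0), ("1002", 200.0)]

def collect_list(values):
    # Keeps every value, duplicates included.
    return list(values)

def collect_set(values):
    # Drops duplicates; dict.fromkeys keeps first-seen order in this model only.
    return list(dict.fromkeys(values))

groups = defaultdict(list)
for id_, money in rows:
    groups[id_].append(money)

for id_, values in groups.items():
    print(id_, collect_list(values), collect_set(values))
# 1001 [100.0, 150.0, 150.0] [100.0, 150.0]
# 1002 [200.0] [200.0]
```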

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
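Window functions
first_value(): the first value in the window frame. With order by money (ascending) it is the smallest money value in the frame; with order by money desc it would be the largest.
select id,money,first_value(money) over (partition by id order by money
rows between 1 preceding and 1 following) from winfunc;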

Frame: 1 preceding (the row before), the current row, 1 following (the row after)

Third column: the smallest value among the (up to) three rows in the frame
1001 100.0 100.0
1001 150.0 100.0
1001 150.0 150.0
1001 200.0 150.0
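The frame logic can be sketched in Python. Assume the partition's values are already sorted in ORDER BY order; the frame is then a slice clipped at the partition edges. The model reproduces the third column above and also shows last_value and the descending case.

```python
def frame(values, i, preceding, following):
    # Rows in ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING,
    # clipped at the partition edges; values are in ORDER BY order.
    return values[max(0, i - preceding): i + following + 1]

def first_value(values, preceding, following):
    return [frame(values, i, preceding, following)[0] for i in range(len(values))]

def last_value(values, preceding, following):
    return [frame(values, i, preceding, following)[-1] for i in range(len(values))]

money = [100.0, 150.0, 150.0, 200.0]  # partition id=1001, order by money
print(first_value(money, 1, 1))  # [100.0, 100.0, 150.0, 150.0]
print(last_value(money, 1, 1))   # [150.0, 150.0, 200.0, 200.0]
print(first_value(sorted(money, reverse=True), 1, 1))  # [200.0, 200.0, 150.0, 150.0]
```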

lead(): the value from the row N rows after the current row (NULL if there is none)
lead(money,2) over (order by money)
1001 100.0 150.0
1001 150.0 200.0
1001 150.0 NULL
1001 200.0 NULL
lag(): the value from the row N rows before the current row (NULL if there is none)
lag(money,2) over (order by money)
1001 100.0 NULL
1001 150.0 NULL
1001 150.0 100.0
1001 200.0 150.0
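A Python model of lead and lag over values already in ORDER BY order, with `None` standing in for NULL. The optional `default` parameter mirrors the third argument that Hive's lead/lag accept for rows with no neighbor. It reproduces both sample outputs above.

```python
def lead(values, offset=1, default=None):
    # Value offset rows after the current row, or default past the end.
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]

def lag(values, offset=1, default=None):
    # Value offset rows before the current row, or default before the start.
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]

money = [100.0, 150.0, 150.0, 200.0]
print(lead(money, 2))  # [150.0, 200.0, None, None]
print(lag(money, 2))   # [None, None, 100.0, 150.0]
```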

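rank(): works like a row number, except that tied rows share the same rank; a row after a tie gets its own row number, so gaps appear
rank() over(partition by id order by money)
select id,money,rank() over(partition by id order by money) from winfunc;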
1001 100.0 1
1001 150.0 2
1001 150.0 2
1001 200.0 4

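dense_rank(): each new value gets the previous rank plus 1, so there are no gaps
select id,money,dense_rank() over(partition by id order by money) from winfunc;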
1001 100.0 1
1001 150.0 2
1001 150.0 2
1001 200.0 3
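Both ranking rules can be modeled in Python on values already sorted in ORDER BY order. Comparing each value with its predecessor is enough to detect ties. The output matches the two tables above.

```python
def rank(values):
    # Tied values share the rank of their first row; the next value jumps to its row number.
    out = []
    for i, v in enumerate(values):
        if i > 0 and v == values[i - 1]:
            out.append(out[-1])
        else:
            out.append(i + 1)
    return out

def dense_rank(values):
    # Each new distinct value gets the previous rank plus 1, so there are no gaps.
    out = []
    for i, v in enumerate(values):
        if i == 0:
            out.append(1)
        elif v == values[i - 1]:
            out.append(out[-1])
        else:
            out.append(out[-1] + 1)
    return out

money = [100.0, 150.0, 150.0, 200.0]
print(rank(money))        # [1, 2, 2, 4]
print(dense_rank(money))  # [1, 2, 2, 3]
```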

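cume_dist(): the fraction of rows in the partition whose value is less than or equal to the current value
cume_dist() over (partition by id order by money)
(largest row number among rows with the same value) / (number of rows in the partition)
1001 100.0 0.25 1/4
1001 150.0 0.75 3/4
1001 150.0 0.75 3/4
1001 200.0 1.0 4/4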
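A one-line Python model of the formula. It assumes the values are in ascending ORDER BY order, so "rows up to and including the last tie" equals "rows with value <= current value". It reproduces the 0.25 / 0.75 / 0.75 / 1.0 column above.

```python
def cume_dist(values):
    # Ascending order assumed: share of rows whose value is <= the current value.
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]

print(cume_dist([100.0, 150.0, 150.0, 200.0]))  # [0.25, 0.75, 0.75, 1.0]
```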

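percent_rank(): the relative rank of the current row, from 0 to 1
percent_rank() over (partition by id order by money)
(smallest row number among rows with the same value - 1) / (number of rows in the partition - 1)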
1001 100.0 0.0
1001 150.0 0.3333
1001 150.0 0.3333
1001 200.0 1.0
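In Python, the formula becomes (rank - 1) / (rows - 1). On a list already in ORDER BY order, `index` returns the 0-based position of the first tied row, which is exactly rank - 1. The single-row case is special-cased to 0.0 to avoid dividing by zero.

```python
def percent_rank(values):
    # values in ORDER BY order; values.index(v) is the first tied row's 0-based position (rank - 1).
    n = len(values)
    if n == 1:
        return [0.0]
    return [values.index(v) / (n - 1) for v in values]

print(percent_rank([100.0, 150.0, 150.0, 200.0]))
# [0.0, 0.3333333333333333, 0.3333333333333333, 1.0]
```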

ntile(n): splits the ordered rows into n buckets; here, 2 buckets
ntile(2) over (partition by id order by money desc nulls last)

select id,money,ntile(2) over (order by money desc) from winfunc;
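A Python sketch of bucket assignment. It assumes the usual ntile rule: bucket sizes differ by at most one, and the larger buckets come first. It returns a bucket number for each of `count` rows taken in ORDER BY order.

```python
def ntile(n_buckets, count):
    # Bucket numbers for `count` rows in ORDER BY order; the first `extra` buckets get one more row.
    base, extra = divmod(count, n_buckets)
    out = []
    for b in range(1, n_buckets + 1):
        size = base + (1 if b <= extra else 0)
        out.extend([b] * size)
    return out

print(ntile(2, 4))  # [1, 1, 2, 2]
print(ntile(3, 4))  # [1, 1, 2, 3]
```

For the query above, the rows ordered by money desc (200, 150, 150, 100) would land in buckets 1, 1, 2, 2.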

Miscellaneous functions
java_method
reflect
hash

Square root (calls Java's Math.sqrt)
select java_method("java.lang.Math","sqrt",cast(id as double))
from winfunc;

UDTF: split one row into multiple rows

select id,adid from winfunc
lateral view explode(split(type,'B')) tt
as adid;
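The row-multiplying effect of `lateral view explode(split(...))` can be modeled in Python. Each input row is paired with every element of its split array. The rows are hypothetical sample data. Hive's split takes a Java regex, and Python's `re.split` is used here as a stand-in, so edge cases (for example, trailing delimiters) may not match exactly.

```python
import re

# Hypothetical (id, type) rows standing in for table winfunc.
rows = [("1001", "xByBz"), ("1002", "w")]

def lateral_view_explode_split(rows, pattern):
    # For each input row, emit one output row per element of split(type, pattern).
    out = []
    for id_, type_ in rows:
        for adid in re.split(pattern, type_):
            out.append((id_, adid))
    return out

print(lateral_view_explode_split(rows, "B"))
# [('1001', 'x'), ('1001', 'y'), ('1001', 'z'), ('1002', 'w')]
```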

Original post: http://www.cnblogs.com/thinkpad/p/5170700.html
