
Hive: a workaround when count(distinct ...) over() cannot be used

Date: 2020-06-11 14:58:24


While using Hive, I found that count(distinct ...) over() raises an error:

hive> with da as (
    > select 1 a, 'a' b union all
    > select 1 a, 'a' b union all
    > select 2 a, 'a' b union all
    > select 2 a, 'a' b union all
    > select 2 a, 'a' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b union all
    > select 3 a, 'b' b
    > )
    > select
    > a
    > ,b
    > ,sum(a) over(partition by b) 
    > , count(distinct a) over(partition by b) 
    > from da;
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 18:26 Expression not in GROUP BY key 'b'

Testing showed that a reduced version of the query does run:

with da as (
select 1 a, 'a' b union all
select 1 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b
)
select

 count(distinct a) over(partition by b) 
from da

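count(distinct ...) over() works if and only if it is the only column in the select list, as in the query above. The cause may be a bug in the internal implementation of distinct. I do not know whether this depends on the version; the version used here is Hive 1.1.0.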


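Workaround: as shown below, use collect_set(a) over(partition by b) to gather the values of each partition into a single set, then count the elements of that set.

Because collect_set() does not keep duplicate elements, applying size() to the resulting set gives the same result as count(distinct ...).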

with da as (
select 1 a, 'a' b union all
select 1 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b
)
select
a
,b
,sum(a) over(partition by b) 
,size(collect_set(a) over(partition by b))
from da
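
If collecting whole sets per row is undesirable, the same per-group numbers can also be obtained without any window function. The following is only a sketch of that alternative, not the original author's method: it aggregates once per value of b with an ordinary group by and joins the result back to the detail rows. The CTE name agg and the aliases sum_a and distinct_a are hypothetical names introduced here for illustration.

```sql
with da as (
select 1 a, 'a' b union all
select 1 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b
),
-- one row per value of b; count(distinct) is an ordinary aggregate here
agg as (
select
b
,sum(a) as sum_a
,count(distinct a) as distinct_a
from da
group by b
)
-- attach the per-group values to every detail row
select
da.a
,da.b
,agg.sum_a
,agg.distinct_a
from da
join agg on da.b = agg.b
```

One difference to keep in mind: rows whose b is NULL form their own partition in the window version, but they are dropped by this inner equi-join, so the two queries agree only when b is never NULL.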

Result (shown as a screenshot in the original post):
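
For this sample data the per-partition values can be worked out by hand. The sketch below repeats the workaround query with two column aliases (sum_a and distinct_a, names added here for illustration only) and lists the hand-derived values as comments; it assumes nothing beyond the functions already used above.

```sql
with da as (
select 1 a, 'a' b union all
select 1 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 2 a, 'a' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b union all
select 3 a, 'b' b
)
select
a
,b
,sum(a) over(partition by b) as sum_a
,size(collect_set(a) over(partition by b)) as distinct_a
from da
-- Hand-derived values, identical on every row of a partition:
--   b = 'a': a is 1,1,2,2,2 (5 rows)  -> sum_a = 8,  distinct_a = 2
--   b = 'b': a is 3 on all 7 rows     -> sum_a = 21, distinct_a = 1
-- Row order is not guaranteed without an order by.
```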


Original article: https://www.cnblogs.com/luckyfruit/p/13093203.html
