Hive: Ant Forest Case Study

Posted: 2020-07-14 00:35:41

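Ant Forest case background

  • Raw data samples
user_low_carbon.txt records each user's daily log of low-carbon energy collected in Ant Forest.
Sample data
u_001   2017/1/1    10
u_001   2017/1/2    150
u_001   2017/1/2    110

plant_carbon.txt records the carbon reduction required to redeem each plant.

Sample data
p001    梭梭树 17
p002    沙柳  19
p003    樟子树 146
p004    胡杨  215
  • The raw data above maps to the following tables
Table: user_low_carbon
Fields
user_id: user ID
data_dt: date
low_carbon: carbon reduction (g)
Table: plant_carbon
Fields
plant_id: plant ID
plant_name: plant name
low_carbon: carbon required to redeem the plant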

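Create the tables

hive (default)> create table user_low_carbon(
    user_id String,
    data_dt String,
    low_carbon int
)
row format delimited fields terminated by '\t';

-- plant_carbon is loaded below, so it needs a table as well;
-- column types are assumed from the field list above
hive (default)> create table plant_carbon(
    plant_id String,
    plant_name String,
    low_carbon int
)
row format delimited fields terminated by '\t';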

Load the data

load data local inpath "/opt/module/data/user_low_carbon.txt" into table user_low_carbon;
load data local inpath "/opt/module/data/plant_carbon.txt" into table plant_carbon;
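
A quick check after loading can catch a delimiter mismatch before any aggregation is run. The queries below are only a sketch of such a check and are not part of the original walkthrough; they assume both tables exist and were loaded from the paths shown above.

```sql
-- Peek at a few rows; NULLs in low_carbon usually mean the field delimiter did not match
select * from user_low_carbon limit 3;
select * from plant_carbon;

-- Row counts should match the line counts of the two source files
select count(*) from user_low_carbon;
select count(*) from plant_carbon;
```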

Enable local mode

hive (default)> set hive.exec.mode.local.auto=true;

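1 Requirement 1: Ant Forest plant redemption statistics

Assume low-carbon data (user_low_carbon) is recorded starting 2017-01-01, and that every user who met the redemption threshold before 2017-10-01 redeemed one p004-胡杨 (Euphrates poplar) and spent all remaining energy on p002-沙柳 (sand willow). Find the top 10 users by cumulative p002-沙柳 redemptions as of October 1, and for each of them, how many more 沙柳 they redeemed than the next-ranked user.

1.1 Step 1: Total the carbon each user collected before 2017-10-01

hive (default)> select user_id, sum(low_carbon) sum_carbon
from user_low_carbon
where date_format(regexp_replace(data_dt, '/', '-'), 'yyyy-MM-dd') < '2017-10-01'
group by user_id;

Output: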

user_id sum_carbon
u_001   475
u_002   659
u_003   620
u_004   640
u_005   1100
u_006   830
u_007   1470
u_008   1240
u_009   930
u_010   1080
u_011   960
u_012   250
u_013   1430
u_014   1060
u_015   290
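
The filter in step 1 depends on date_format turning a non-padded value such as 2017-1-2 into 2017-01-02. Without that padding a plain string comparison goes wrong: '2017-2-1' sorts after '2017-10-01'. As an alternative sketch (an assumption, not the author's method), the raw value can be parsed with an explicit pattern and compared as a Unix timestamp. The pattern 'yyyy/M/d' is chosen here because it accepts both one-digit and two-digit months and days.

```sql
-- Sketch: parse data_dt with an explicit pattern instead of rewriting the separators.
-- Both sides are bigint seconds, so no string ordering is involved.
select user_id, sum(low_carbon) sum_carbon
from user_low_carbon
where unix_timestamp(data_dt, 'yyyy/M/d') < unix_timestamp('2017-10-01', 'yyyy-MM-dd')
group by user_id;
```

On the sample data this should produce the same totals as the query in step 1.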

1.2 Step 2: Look up the carbon cost of 胡杨 (p004) and 沙柳 (p002)

select low_carbon from plant_carbon where plant_id='p004';
select low_carbon from plant_carbon where plant_id='p002';

1.3 Step 3: Compute how many 沙柳 each user can redeem

hive (default)> select user_id,
       floor((t1.sum_carbon - t2.low_carbon) / t3.low_carbon) count_p002
from (
         select user_id, sum(low_carbon) sum_carbon
         from user_low_carbon
         where date_format(regexp_replace(data_dt, '/', '-'), 'yyyy-MM-dd') < '2017-10-01'
         group by user_id
     ) t1,
     (
         select low_carbon
         from plant_carbon
         where plant_id = 'p004'
     ) t2,
     (
         select low_carbon
         from plant_carbon
         where plant_id = 'p002'
     ) t3;

Output:

user_id count_p002
u_001   13
u_002   23
u_003   21
u_004   22
u_005   46
u_006   32
u_007   66
u_008   53
u_009   37
u_010   45
u_011   39
u_012   1
u_013   63
u_014   44
u_015   3
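
The step 3 formula can be checked by hand for individual users. The sketch below plugs in literal values taken from the tables above: 215 for 胡杨, 19 for 沙柳, and the step 1 totals for u_001 and u_012.

```sql
-- u_001: 475 g collected, 215 g spent on one 胡杨, 19 g per 沙柳
select floor((475 - 215) / 19);   -- 260 / 19 ≈ 13.68, so 13 trees

-- u_012: only slightly above the 胡杨 threshold
select floor((250 - 215) / 19);   -- 35 / 19 ≈ 1.84, so 1 tree
```

A user whose total is below 215 would get a negative count from this formula. The sample data contains no such user, but a filter such as t1.sum_carbon >= t2.low_carbon would guard against that case.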

1.4 Step 4: Sort users by 沙柳 count in descending order and fetch the count from the next row

This yields the top 10 users by cumulative p002-沙柳 redemptions as of October 1. Because the window function is evaluated before limit is applied, the 10th row still sees the count of the 11th user (22).

hive (default)> select user_id,
       count_p002,
       lead(count_p002, 1) over (order by count_p002 desc) lead_1_p002
from (
         select user_id,
                floor((t1.sum_carbon - t2.low_carbon) / t3.low_carbon) count_p002
         from (
                  select user_id, sum(low_carbon) sum_carbon
                  from user_low_carbon
                  where date_format(regexp_replace(data_dt, '/', '-'), 'yyyy-MM-dd') < '2017-10-01'
                  group by user_id
              ) t1,
              (
                  select low_carbon
                  from plant_carbon
                  where plant_id = 'p004'
              ) t2,
              (
                  select low_carbon
                  from plant_carbon
                  where plant_id = 'p002'
              ) t3
     ) t4
order by count_p002 desc
limit 10;

Output:

user_id count_p002      lead_1_p002
u_007   66      63
u_013   63      53
u_008   53      46
u_005   46      45
u_010   45      44
u_014   44      39
u_011   39      37
u_009   37      32
u_006   32      23
u_002   23      22
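
To see what lead does in isolation, here is a minimal sketch on a hypothetical three-row input built with stack(). The names demo, cnt, next_cnt and next_cnt_or_zero are illustrative only, and the query assumes a Hive version that allows select without a from clause.

```sql
-- Hypothetical input, not part of the case data
select user_id,
       cnt,
       lead(cnt, 1) over (order by cnt desc)    next_cnt,         -- NULL on the last row
       lead(cnt, 1, 0) over (order by cnt desc) next_cnt_or_zero  -- 0 on the last row
from (
         select stack(3, 'a', 66, 'b', 63, 'c', 53) as (user_id, cnt)
     ) demo
order by cnt desc;
```

The optional third argument of lead supplies a default for rows that have no following row. This matters if the lowest-ranked user ever appears in the result, because subtracting NULL yields NULL.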

1.5 Step 5: Compute how many more 沙柳 each user redeemed than the next-ranked user

hive (default)> select user_id,
       count_p002,
       (count_p002 - lead_1_p002) diff_count
from (
         select user_id,
                count_p002,
                lead(count_p002, 1) over (order by count_p002 desc) lead_1_p002
         from (
                  select user_id,
                         floor((t1.sum_carbon - t2.low_carbon) / t3.low_carbon) count_p002
                  from (
                           select user_id, sum(low_carbon) sum_carbon
                           from user_low_carbon
                           where date_format(regexp_replace(data_dt, '/', '-'), 'yyyy-MM-dd') < '2017-10-01'
                           group by user_id
                       ) t1,
                       (
                           select low_carbon
                           from plant_carbon
                           where plant_id = 'p004'
                       ) t2,
                       (
                           select low_carbon
                           from plant_carbon
                           where plant_id = 'p002'
                       ) t3
              ) t4
         order by count_p002 desc
         limit 10
     ) t5
order by count_p002 desc;

Output:

user_id count_p002      diff_count
u_007   66      3
u_013   63      10
u_008   53      7
u_005   46      1
u_010   45      1
u_014   44      5
u_011   39      2
u_009   37      5
u_006   32      9
u_002   23      1
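
The nested subqueries can be hard to follow at this depth. As a readability sketch (a restructuring, not the author's original query), the same steps can be written as common table expressions. The names user_total, p002_count and ranked are hypothetical.

```sql
with user_total as (
    -- step 1: carbon collected per user before 2017-10-01
    select user_id, sum(low_carbon) sum_carbon
    from user_low_carbon
    where date_format(regexp_replace(data_dt, '/', '-'), 'yyyy-MM-dd') < '2017-10-01'
    group by user_id
),
p002_count as (
    -- steps 2 and 3: subtract the cost of one p004, divide by the cost of p002
    select u.user_id,
           floor((u.sum_carbon - p4.low_carbon) / p2.low_carbon) count_p002
    from user_total u
    cross join (select low_carbon from plant_carbon where plant_id = 'p004') p4
    cross join (select low_carbon from plant_carbon where plant_id = 'p002') p2
),
ranked as (
    -- step 4: attach the next-ranked user's count to every row
    select user_id,
           count_p002,
           lead(count_p002, 1) over (order by count_p002 desc) lead_1_p002
    from p002_count
)
-- step 5: difference to the next-ranked user, top 10 only
select user_id,
       count_p002,
       count_p002 - lead_1_p002 diff_count
from ranked
order by count_p002 desc
limit 10;
```

Each CTE corresponds to one step above, so an intermediate result can be inspected by selecting from that CTE alone. Like the comma joins in the original, the cross joins are Cartesian products against single-row subqueries, which Hive may reject when strict Cartesian-product checks are enabled.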

Original article: https://www.cnblogs.com/eugene0/p/13296706.html
