
Data Warehouse: The User Behavior Wide Table in the DWS Layer

Date: 2020-07-02 16:41:55


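Why do we need a user behavior wide table? It aggregates each user's activity for a single day into one multi-column wide table, so that the result can later be joined with user dimension data and analyzed from different angles.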

Data source: the relevant business tables in the DWD layer.

Creating the user behavior wide table:

This wide table combines three kinds of behavior: placing orders, making payments, and posting comments.

drop table if exists dws_user_action;
create external table dws_user_action
(
    user_id         string         comment 'user id',
    order_count     bigint         comment 'number of orders',
    order_amount    decimal(16,2)  comment 'order amount',
    payment_count   bigint         comment 'number of payments',
    payment_amount  decimal(16,2)  comment 'payment amount',
    comment_count   bigint         comment 'number of comments'
) COMMENT 'daily user behavior wide table'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_user_action/'
tblproperties ("parquet.compression"="snappy");

Data import script:

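The basic WITH ... AS syntax is shown below. It defines a named temporary table (a common table expression) that can be referenced multiple times in the statement that follows, which makes the SQL easier to read. Note that multiple temporary tables are separated by commas, while nothing separates the last temporary table from the query itself.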

WITH t1 AS (
        SELECT *
        FROM carinfo
    ), 
    t2 AS (
        SELECT *
        FROM car_blacklist
    )
SELECT *
FROM t1, t2

 

#!/bin/bash

# Define variables up front so they are easy to change
APP=gmall
hive=/opt/module/hive/bin/hive

# Use the date passed as the first argument if there is one; otherwise default to yesterday
if [ -n "$1" ] ;then
    do_date=$1
else 
    do_date=`date -d "-1 day" +%F`  
fi 

sql="

with  
tmp_order as
(
    select 
        user_id, 
        sum(oi.total_amount) order_amount, 
        count(*)  order_count
    from ${APP}.dwd_order_info  oi
    where date_format(oi.create_time,'yyyy-MM-dd')='$do_date'
    group by user_id
)  ,
tmp_payment as
(
    select 
        user_id, 
        sum(pi.total_amount) payment_amount, 
        count(*) payment_count 
    from ${APP}.dwd_payment_info pi 
    where date_format(pi.payment_time,'yyyy-MM-dd')='$do_date'
    group by user_id
),
tmp_comment as
(  
    select  
        user_id, 
        count(*) comment_count
    from ${APP}.dwd_comment_log c
    where date_format(c.dt,'yyyy-MM-dd')='$do_date'
    group by user_id 
)

insert overwrite table ${APP}.dws_user_action partition(dt='$do_date')
select 
    user_actions.user_id, 
    sum(user_actions.order_count), 
    sum(user_actions.order_amount),
    sum(user_actions.payment_count), 
    sum(user_actions.payment_amount),
    sum(user_actions.comment_count) 
from
(
    select
        user_id,
        order_count,
        order_amount,
        0 payment_count,
        0 payment_amount,
        0 comment_count
    from tmp_order

    union all
    select
        user_id,
        0,
        0,
        payment_count,
        payment_amount,
        0
    from tmp_payment

    union all
    select
        user_id,
        0,
        0,
        0,
        0,
        comment_count 
    from tmp_comment
 ) user_actions
group by user_id;
"

$hive -e "$sql"
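
The script pastes the date argument straight into the SQL text, so a malformed value (for example 2020-7-1) would silently load an empty or wrong partition. As a minimal sketch, the date handling could be wrapped in a small function that validates its input. The function name resolve_date is hypothetical and not part of the original script, and the sketch assumes GNU date, which the original `date -d "-1 day"` call also relies on.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): print the load date
# as YYYY-MM-DD, or return non-zero if the argument is not a valid date.
# Assumes GNU date, like the original script's `date -d "-1 day" +%F`.
resolve_date() {
    local input="${1:-}"

    if [ -z "$input" ]; then
        # No argument given: default to yesterday
        date -d "-1 day" +%F
        return 0
    fi

    # Accept only the YYYY-MM-DD shape before the value reaches the SQL text
    case "$input" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *)
            echo "invalid date: $input (expected YYYY-MM-DD)" >&2
            return 1
            ;;
    esac

    # Let date(1) confirm that the value is a real calendar day
    date -d "$input" +%F 2>/dev/null || {
        echo "invalid date: $input (not a calendar day)" >&2
        return 1
    }
}
```

In the import script this could replace the if/else block with `do_date=$(resolve_date "$1") || exit 1`, so the job stops before Hive is invoked when the date is unusable.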

 


Original article: https://www.cnblogs.com/noyouth/p/13225215.html
