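II) Hands-on: using dynamic partitions in Hive
1. Create a partitioned table with two partition columns, dt and ht, representing the date and the hour.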
CREATE TABLE partition_table001 (
  name STRING,
  ip STRING
)
PARTITIONED BY (dt STRING, ht STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t";
2. Enable dynamic partitioning. This only requires setting two parameters in the Hive session:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
3. Load the data from one date partition of partition_table001 into the target table partition_table002, using static partition values:
create table if not exists partition_table002 like partition_table001;
insert overwrite table partition_table002 partition (dt='20150617', ht='00')
select name, ip
from partition_table001
where dt='20150617' and ht='00';
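This reveals a problem: to insert the data for all 24 hours of a day, the statement above has to be run 24 times, once per hour. With dynamic partitioning, Hive instead decides from the values returned by the select which partition each row should be loaded into.
4. Use dynamic partitioning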
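insert overwrite table partition_table002 partition (dt, ht)
select * from partition_table001
where dt='20150617';
Hive takes the values in the last two positions of the select list (here dt and ht) and fills them into the two partition variables dt and ht of the insert statement. In other words, dynamic partitioning matches partition values by position. The relationship between the values selected from the source table and the output partition columns is determined purely by position and has nothing to do with names; the names dt and ht play no role in the matching here.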
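Because the mapping is positional, select * works in step 4 only because dt and ht are the last two columns of partition_table001. A minimal sketch that lists the columns explicitly and then checks the result is shown below. The aliases day_col and hour_col are hypothetical and chosen only to show that the names do not matter. It assumes the two session settings from step 2 are in effect.

```sql
-- Assumes hive.exec.dynamic.partition=true and
-- hive.exec.dynamic.partition.mode=nonstrict are already set in this session.
insert overwrite table partition_table002 partition (dt, ht)
select name,
       ip,
       dt as day_col,   -- second-to-last position: fills partition column dt
       ht as hour_col   -- last position: fills partition column ht
from partition_table001
where dt = '20150617';

-- List the partitions of the target table to see which ones were created.
show partitions partition_table002;
```

Swapping the last two columns of the select list would write the hour values into dt and the date values into ht, so the order of the trailing columns should always follow the order in the partition clause.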
1. All partition columns dynamic (DP, dynamic partition)
INSERT OVERWRITE TABLE T PARTITION (ds, hr)
SELECT key, value, ds, hr
FROM srcpart
WHERE ds is not null and hr>10;
2. Combining DP with SP (static partition)
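INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT key, value, /*ds,*/ hr
FROM srcpart
WHERE ds is not null and hr>10;
3. When an SP is a sub-partition of a DP, the following DML fails, because the order of the partition columns determines the directory hierarchy in HDFS, and that cannot be changed.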
-- throws an exception
INSERT OVERWRITE TABLE T PARTITION (ds, hr = 11)
SELECT key, value, ds/*, hr*/
FROM srcpart
WHERE ds is not null and hr=11;
4. Multi-table insert
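-- Each select supplies only the data columns plus the dynamic partition columns, in order.
FROM srcpart
INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
  SELECT key, value, hr WHERE ds is not null and hr>10
INSERT OVERWRITE TABLE R PARTITION (ds='2010-03-03', hr=12)
  SELECT key, value WHERE ds is not null and hr = 12;
5. CTAS (CREATE TABLE AS SELECT). The CTAS syntax differs slightly between DP and SP, because the schema of the target table cannot be fully derived from the select statement. The partition columns therefore have to be specified in the create statement.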
CREATE TABLE T (key int, value string)
PARTITIONED BY (ds string, hr int) AS
SELECT key, value, ds, hr+1 hr1
FROM srcpart
WHERE ds is not null and hr>10;
6. The statement above shows CTAS with DP. To use a constant of your own for a partition column, do this:
CREATE TABLE T (key int, value string)
PARTITIONED BY (ds string, hr int) AS
SELECT key, value, "2010-03-03", hr+1 hr1
FROM srcpart
WHERE ds is not null and hr>10;
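Whether a CTAS statement may carry both a column list and a PARTITIONED BY clause depends on the Hive version, and some releases may reject the two statements above. As a sketch of an equivalent that avoids CTAS, assuming the same T and srcpart as in the examples, the table can be created first and then filled with a dynamic partition insert:

```sql
-- Dynamic partition settings from step 2 of the hands-on section.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Create the partitioned target table explicitly.
CREATE TABLE IF NOT EXISTS T (key int, value string)
PARTITIONED BY (ds string, hr int);

-- DP variant: both ds and hr come from the last two select columns.
INSERT OVERWRITE TABLE T PARTITION (ds, hr)
SELECT key, value, ds, hr+1 AS hr1
FROM srcpart
WHERE ds is not null and hr>10;

-- Constant variant: ds is fixed as a static partition, hr stays dynamic.
INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT key, value, hr+1 AS hr1
FROM srcpart
WHERE ds is not null and hr>10;
```

The two inserts correspond to items 5 and 6 respectively; only one of them would normally be run against a given target table.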
Original article: http://blog.csdn.net/opensure/article/details/46537969