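mobile_log records mobile logs. These logs now need to be stored in a Hive table so that statistics can later be computed by date and by hour. To support this, we need to create a Hive partitioned table with date and hour partitions.
A Hive table can be partitioned on a single column or on several columns. A table can have many partitions, and each partition is stored as its own folder under the table's directory; with several partition columns the folders nest, for example ds=.../hour=.... For details, see the Hive LanguageManual DDL.
Code to create the multi-partition table: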
drop table if exists pms.test_mobile_log;
create table pms.test_mobile_log
(
  id bigint,
  infomation string
)
partitioned by (ds string, hour string)
row format delimited fields terminated by '\t'
lines terminated by '\n';
There are several ways to import data into the multi-partition table:
Insert the result of a query into a specific partition with INSERT OVERWRITE:
insert overwrite table pms.test_mobile_log partition (ds='2015-05-26', hour='13')
select
  id,
  category_name
from category;
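The statement above names its target partition explicitly. When the partition values are already present in the source rows, Hive can also derive the partitions from the data (dynamic partitioning). The following is only a sketch: the staging table pms.mobile_log_staging and its columns are hypothetical and do not come from the original example.

```sql
-- Hypothetical staging table, assumed to carry the partition values as columns:
--   pms.mobile_log_staging(id bigint, infomation string, ds string, hour string)

-- Enable dynamic partitioning and allow all partition keys to be dynamic.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Dynamic partition columns are matched by position, so they must come
-- last in the select list, in the same order as in the partition clause.
insert overwrite table pms.test_mobile_log partition (ds, hour)
select
  id,
  infomation,
  ds,
  hour
from pms.mobile_log_staging;
```

With this form, one statement can fill many ds/hour folders at once, at the cost of less control over which partitions get overwritten.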
Import data with LOAD DATA:
load data inpath '/user/pms/workspace/ouyangyewei/temp2/category.txt' overwrite into table pms.test_mobile_log partition (ds='2015-05-26', hour='15');
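Add a partition that points at an existing location with ALTER TABLE ... ADD PARTITION:
alter table pms.test_mobile_log add partition (ds='2015-05-27', hour='14') location '/user/pms/workspace/ouyangyewei/temp2/category.txt';
Note that Hive treats a partition's LOCATION as a directory, so this path would normally name the directory that holds the data files rather than a single file.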
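After any of the import methods above, it can help to confirm which partitions the metastore actually knows about. A minimal sketch, assuming the table lives in the pms database as in the statements above:

```sql
-- Switch to the database that holds the table.
use pms;

-- List every partition of the table.
show partitions test_mobile_log;

-- List only the hour partitions that belong to one day.
show partitions test_mobile_log partition (ds='2015-05-26');
```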
Table structure
CREATE TABLE pms.test_mobile_log(
  id bigint,
  infomation string)
PARTITIONED BY (
  ds string,
  hour string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://yhd-jqhadoop2.int.yihaodian.com:8020/user/hive/pms/test_mobile'
TBLPROPERTIES (
  'numPartitions'='2',
  'numFiles'='2',
  'transient_lastDdlTime'='1432711793',
  'numRows'='0',
  'totalSize'='3517',
  'rawDataSize'='0')
Table partitions
$ hadoop fs -ls /user/hive/pms/test_mobile_log/ds=2015-05-26
Found 2 items
drwxr-xr-x - pms supergroup 0 2015-05-27 13:53 /user/hive/pms/test_mobile_log/ds=2015-05-26/hour=13
drwxr-xr-x - pms pms 0 2015-05-27 15:29 /user/hive/pms/test_mobile_log/ds=2015-05-26/hour=15
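This directory layout is what makes the per-date, per-hour statistics mentioned at the start cheap to compute. As a rough sketch (the alias cnt is just an illustrative name), a query that filters on a partition column only needs to read the matching folders:

```sql
-- Count rows per hour for a single day. The filter on ds lets Hive
-- read only the directories under ds=2015-05-26.
select
  ds,
  hour,
  count(*) as cnt
from pms.test_mobile_log
where ds = '2015-05-26'
group by ds, hour;
```

Adding a condition such as hour = '13' to the where clause would narrow the scan further, to a single hour folder.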
Original article: http://blog.csdn.net/yeweiouyang/article/details/46050999