
[Hive] Writing data from MapReduce into a Hive partitioned table

Posted: 2015-04-02 18:58:51

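Business requirement:

Write the data generated each day into a Hive partitioned table (partitioned by date).

Analysis:

Writing data into a Hive table with MapReduce really just means writing files into the table's HDFS directory. The catch is that the data must land in the current day's partition, so the problem becomes: how do we create the current day's partition of the Hive table ahead of time?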
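
The mapping from a partition value to an HDFS directory can be sketched as follows. This is a minimal illustration, not the author's script: the `YYYY-MM-DD` date format is an assumption (the post never shows how `$date` and `$yesterday` are set), the helper name `partition_dir` is hypothetical, and the table directory is taken from the `--job2OutputPath` argument in step 3.

```shell
#!/usr/bin/env bash
# Assumption: dates are formatted YYYY-MM-DD; the original post does not say.
date=$(date +%Y-%m-%d)
yesterday=$(date -d "1 day ago" +%Y-%m-%d)   # GNU date syntax

# Hypothetical helper: by default Hive keeps each partition in a
# subdirectory named <partition column>=<value> under the table directory.
partition_dir() {
  local table_dir=$1 ds=$2
  printf '%s/ds=%s\n' "${table_dir%/}" "$ds"
}

# Print the directory the MapReduce job would have to write into today.
partition_dir /user/hive/pms/test_rcmd_valid_path "$date"
```

Writing files into that directory is not enough on its own: the partition also has to be registered in the Hive metastore, which is what step 2 takes care of.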

Solution:

1. Create the Hive table

# First create the partitioned table rcmd_valid_path
hive -e "set mapred.job.queue.name=pms;

drop table if exists pms.test_rcmd_valid_path;
create table if not exists pms.test_rcmd_valid_path 
(
track_id string,
track_time string,
session_id string,
gu_id string,
end_user_id string,
page_category_id bigint,
algorithm_id int,
is_add_cart int,
rcmd_product_id bigint,
product_id bigint,
path_id string,
path_type string,
path_length int,
path_list string,
order_code string,
groupon_id bigint
)
partitioned by (ds string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
LINES TERMINATED BY '\n';"
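
Step 3 writes to a hard-coded HDFS path, so it may be worth confirming where the table actually lives before pointing the job at it. A minimal sketch, assuming the `hive` CLI is on the PATH and that `describe formatted` prints a line whose first field is `Location:`; the helper name `extract_location` is hypothetical:

```shell
# Hypothetical helper: reads `describe formatted` output on stdin
# and prints the second field of the line starting with "Location:".
extract_location() {
  awk '$1 == "Location:" { print $2 }'
}

# Query the table only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -e "describe formatted pms.test_rcmd_valid_path;" | extract_location
fi
```

The printed location, with `/ds=<date>` appended, should match the value passed as `--job2OutputPath`.
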

2. Create the table's partition for the current date (created if it does not exist, overwritten if it does)

# Create the partition directory for the current date ($date) in table rcmd_valid_path
hive -e "set mapred.job.queue.name=pms;

insert overwrite table pms.test_rcmd_valid_path partition(ds='$date')
select track_id,
track_time,
session_id,
gu_id,
end_user_id,
page_category_id,
algorithm_id,
is_add_cart,
rcmd_product_id,
product_id,
path_id,
path_type,
path_length,
path_list,
order_code,
groupon_id 
from pms.test_rcmd_valid_path where ds = '$date';" 
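
The statement above creates the partition by overwriting it with its own contents. A possible alternative, not used in the original post, is Hive's `alter table ... add if not exists partition` statement, which registers the partition without running a query and leaves an existing partition untouched. The sketch below assumes `$date` is formatted `YYYY-MM-DD`; the helper name `add_partition_sql` is hypothetical.

```shell
# Assumption: YYYY-MM-DD date format, used only if $date is not already set.
date=${date:-$(date +%Y-%m-%d)}

# Hypothetical helper: prints the HiveQL that registers one ds partition.
add_partition_sql() {
  local table=$1 ds=$2
  printf "alter table %s add if not exists partition (ds='%s');\n" "$table" "$ds"
}

# Run the statement only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -e "set mapred.job.queue.name=pms; $(add_partition_sql pms.test_rcmd_valid_path "$date")"
fi
```

Note that this variant does not empty a partition that already holds data, so it is not a drop-in replacement if the overwrite behaviour is what you rely on.
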

3. The job can now write straight into the partition directory (note job2OutputPath)

hadoop jar lib/bigdata-datamining-1.1-user-trace-jar-with-dependencies.jar \
  com.yhd.datamining.data.usertrack.offline.job.mapred.TrackPathJob \
  --similarBrandPath /user/pms/recsys/algorithm/schedule/warehouse/relation/brand/$yesterday \
  --similarCategoryPath /user/pms/recsys/algorithm/schedule/warehouse/relation/category/$yesterday \
  --mcSiteCategoryPath /user/hive/warehouse/mc_site_category \
  --extractPreprocess /user/hive/warehouse/test_extract_preprocess \
  --engineMatchRule /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday \
  --artificialMatchRule /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday \
  --category /user/hive/warehouse/category \
  --keywordCategoryTopN 3 \
  --termCategory /user/hive/pms/temp_term_category \
  --extractGrouponInfo /user/hive/pms/extract_groupon_info \
  --extractProductSerial /user/hive/pms/product_serial_id \
  --job1OutputPath /user/pms/workspace/ouyangyewei/testUsertrack/job1Output \
  --job2OutputPath /user/hive/pms/test_rcmd_valid_path/ds=$date
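
Once the job has finished, the result can be checked from both the HDFS side and the Hive side. This is a minimal sketch, assuming the `hadoop` and `hive` CLIs are on the PATH and that `$date` is formatted `YYYY-MM-DD`; the helper name `count_sql` is hypothetical.

```shell
# Assumption: YYYY-MM-DD date format, used only if $date is not already set.
date=${date:-$(date +%Y-%m-%d)}

# Same directory that was passed to the job as --job2OutputPath.
partition_path="/user/hive/pms/test_rcmd_valid_path/ds=$date"

# Hypothetical helper: prints the HiveQL that counts the rows of one partition.
count_sql() {
  printf "select count(*) from %s where ds='%s';\n" "$1" "$2"
}

# Run the checks only where both CLIs are installed.
if command -v hadoop >/dev/null 2>&1 && command -v hive >/dev/null 2>&1; then
  # The job's output files should be listed here.
  hadoop fs -ls "$partition_path"
  # Hive should see the rows through the partition created in step 2.
  hive -e "set mapred.job.queue.name=pms; $(count_sql pms.test_rcmd_valid_path "$date")"
fi
```

If the files are listed but the count is zero, the partition is probably not registered in the metastore or points at a different location.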

Original article: http://blog.csdn.net/yeweiouyang/article/details/44834073
