Exporting Hive Query Results to a File with a Specified Delimiter

Date: 2015-06-27 19:53:56

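Hive 0.11.0 introduced a new feature: when writing query results to a file, you can specify the column delimiter. Earlier versions did not allow the delimiter between columns to be specified.

Before Hive 0.11.0, the export is written as follows. No delimiter can be specified, and the default is \x01: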

hive (hive)> insert overwrite local directory '/home/hadoop/export_hive' select * from a;
Query ID = hadoop_20150627174342_64852f3a-56ed-48d5-a545-fc28f109be74
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1435392961740_0025, Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0025/
Kill Command = /home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0025
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-06-27 17:43:54,395 Stage-1 map = 0%,  reduce = 0%
2015-06-27 17:44:07,615 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.82 sec
MapReduce Total cumulative CPU time: 4 seconds 820 msec
Ended Job = job_1435392961740_0025
Copying data to local directory /home/hadoop/export_hive
Copying data to local directory /home/hadoop/export_hive
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 4.82 sec   HDFS Read: 2416833 HDFS Write: 1188743 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 820 msec
OK
a.key     a.value
Time taken: 26.475 seconds

(If the directory does not exist, it is created automatically.)

 

Inspect the contents of the generated file:

[hadoop@gpmaster export_hive]$ head -n 10 /home/hadoop/export_hive/000000_0 | cat -A
2610^AaAAnz$
32196^AaAAoWnz$
78606^AaAAyXFz$
3804^AaAAz$
30102^AaABEWez$
21744^AaABukz$
39666^AaABz$
1632^AaABz$
82464^AaABz$
88320^AaACCaz$

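I used the -A option of cat to make non-printing characters visible: each line ends with $, and the column delimiter is shown as ^A (that is, \x01).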
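
For downstream processing, a file exported with the default delimiter can be split on \x01 directly. The following is a minimal Python sketch, not part of the original workflow. The helper name parse_ctrl_a_lines is hypothetical, and the two sample lines are copied from the output above.

```python
def parse_ctrl_a_lines(lines, sep="\x01"):
    """Split each exported line into a list of column values."""
    rows = []
    for line in lines:
        # Strip only the trailing newline so that values keep any spaces.
        rows.append(line.rstrip("\n").split(sep))
    return rows


# Two sample lines from the export above; ^A in the cat -A output is \x01.
sample = ["2610\x01aAAnz\n", "32196\x01aAAoWnz\n"]
rows = parse_ctrl_a_lines(sample)
print(rows)
```

To read the real export, the same function could be given an open file object, for example open("/home/hadoop/export_hive/000000_0", encoding="utf-8"), assuming the data is UTF-8 text.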

 

Next, we use the feature introduced in Hive 0.11.0 to specify the delimiter between the output columns:

hive (hive)> insert overwrite local directory '/home/hadoop/export_hive' row format delimited fields terminated by '*' select * from a;
Query ID = hadoop_20150627180045_fced1513-8f1b-44a8-8e88-3cd678552aa5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1435392961740_0028, Tracking URL = http://gpmaster:8088/proxy/application_1435392961740_0028/
Kill Command = /home/hadoop/hadoop-2.6.0/bin/hadoop job -kill job_1435392961740_0028
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-06-27 18:00:57,354 Stage-1 map = 0%,  reduce = 0%
2015-06-27 18:01:10,567 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.68 sec
MapReduce Total cumulative CPU time: 4 seconds 680 msec
Ended Job = job_1435392961740_0028
Copying data to local directory /home/hadoop/export_hive
Copying data to local directory /home/hadoop/export_hive
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 4.68 sec   HDFS Read: 2417042 HDFS Write: 1188743 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 680 msec
OK
a.key     a.value
Time taken: 26.607 seconds

 

With * specified as the delimiter, the exported data looks like this:

[hadoop@gpmaster export_hive]$ head -n 10 /home/hadoop/export_hive/000000_0
2610*aAAnz
32196*aAAoWnz
78606*aAAyXFz
3804*aAAz
30102*aABEWez
21744*aABukz
39666*aABz
1632*aABz
82464*aABz
88320*aACCaz

As you can see, the column delimiter is indeed the * we specified.
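
One thing to watch with a printable delimiter such as *: as far as I know, delimited text output does not quote values, so unless an ESCAPED BY clause is used, a value that itself contains the delimiter will split into extra fields. The Python sketch below is one way to check an exported file for such lines. The function name check_field_count is hypothetical, and the third sample line is made up to show a failure.

```python
def check_field_count(lines, sep, expected):
    """Return the 1-based numbers of lines that do not split into `expected` fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        if len(fields) != expected:
            bad.append(number)
    return bad


# The first two lines come from the export above; the third is a made-up
# line whose value contains the delimiter.
sample = ["2610*aAAnz\n", "3804*aAAz\n", "99*a*b\n"]
bad_lines = check_field_count(sample, "*", 2)
print(bad_lines)
```

If the check reports any lines, choosing a delimiter that cannot occur in the data (or keeping the default \x01) is the simpler fix.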

 

Complex types such as struct and map can also be given their own delimiters.

Here is a worked example:

(1)  Create a table with complex-typed columns

hive (hive)> create table userinfo(id int, name string, job_list array<string>, perf map<int,string>, info struct<address:STRING,size:INT>);

 

(2)  Prepare the data (built with the default delimiters)

1^A小明^AIT工程师^B教师^A10086^C正常^B10010^C不正常^A北京市^B130

2^A小花^A保姆^B护士^A10086^C正常^B10010^C正常^A南京市^B130

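Notes:

^A stands for \001, ^B for \002, and ^C for \003.

To type these control characters in the vi editor, press Ctrl+V followed by Ctrl+A to enter \001. Likewise, Ctrl+V followed by Ctrl+B enters \002, and so on.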
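
As an alternative to typing control characters in vi, the same rows can be generated by a short script. This is a Python sketch, not what the original post did. The function build_row and the constant names are hypothetical. The layout follows the sample rows above: \001 between columns, \002 between array items, map entries and struct fields, and \003 between a map key and its value.

```python
FIELD_SEP = "\x01"  # ^A: separates columns
ITEM_SEP = "\x02"   # ^B: separates array items, map entries and struct fields
KEY_SEP = "\x03"    # ^C: separates a map key from its value


def build_row(user_id, name, jobs, perf, address, size):
    """Build one userinfo line using Hive's default delimiters."""
    job_list = ITEM_SEP.join(jobs)
    perf_map = ITEM_SEP.join(f"{key}{KEY_SEP}{value}" for key, value in perf)
    info = ITEM_SEP.join([address, str(size)])
    return FIELD_SEP.join([str(user_id), name, job_list, perf_map, info])


row1 = build_row(1, "小明", ["IT工程师", "教师"],
                 [(10086, "正常"), (10010, "不正常")], "北京市", 130)
row2 = build_row(2, "小花", ["保姆", "护士"],
                 [(10086, "正常"), (10010, "正常")], "南京市", 130)
print(repr(row1))
```

Writing row1 and row2, each followed by a newline, to a UTF-8 text file should give the same content as the hand-typed file.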

(3)  Load the data

hive (hive)> load data local inpath '/home/hadoop/hivetestdata/userinfo.txt' overwrite into table userinfo;

(4)  Query the data

hive (hive)> select * from userinfo;
OK
userinfo.id   userinfo.name   userinfo.job_list userinfo.perf     userinfo.info
1    小明     ["IT工程师","教师"] {10086:"正常",10010:"不正常"}   {"address":"北京市","size":130}
2    小花     ["保姆","护士"]  {10086:"正常",10010:"正常"} {"address":"南京市","size":130}
Time taken: 0.088 seconds, Fetched: 2 row(s)

(5)  Export the data

We specify the following delimiters:

Column delimiter: \t (tab)

Map key delimiter: : (colon)

Collection item delimiter: , (comma)

Run the export command:

hive (hive)> insert overwrite local directory '/home/hadoop/export_hive'
           > row format delimited fields terminated by '\t'
           > collection items terminated by ','
           > map keys terminated by ':'
           > select * from userinfo;

The exported data looks like this:

[hadoop@gpmaster export_hive]$ cat 000000_0
1    小明     IT工程师,教师         10086:正常,10010:不正常     北京市,130
2    小花     保姆,护士    10086:正常,10010:正常   南京市,130
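
To show how the three delimiters nest, here is a Python sketch that turns one exported line back into structured values. It is only an illustration: the function name parse_userinfo_line is hypothetical, it assumes the columns in the file are separated by real tab characters (the listing above shows them as spaces), and it assumes no value contains a tab, comma or colon.

```python
def parse_userinfo_line(line):
    """Parse one exported userinfo line into a dict of Python values."""
    user_id, name, jobs, perf, info = line.rstrip("\n").split("\t")

    # array<string>: items separated by ','
    job_list = jobs.split(",")

    # map<int,string>: entries separated by ',', key and value by ':'
    perf_map = {}
    for entry in perf.split(","):
        key, value = entry.split(":", 1)
        perf_map[int(key)] = value

    # struct<address:STRING,size:INT>: fields separated by ','
    address, size = info.split(",")

    return {
        "id": int(user_id),
        "name": name,
        "job_list": job_list,
        "perf": perf_map,
        "info": {"address": address, "size": int(size)},
    }


sample = "1\t小明\tIT工程师,教师\t10086:正常,10010:不正常\t北京市,130\n"
record = parse_userinfo_line(sample)
print(record)
```

Note that array items, map entries and struct fields all share the collection items delimiter, so the column's declared type is what tells the parser how to interpret each piece.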

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.

Original article: http://blog.csdn.net/jiangshouzhuang/article/details/46663281