A First Look at Hive, Part 2: The Data Model

Date: 2015-10-03 18:11:50


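1. Hive data types:

Primitive types: tinyint, smallint, int, bigint, float, double, boolean, string

Complex types:

array: an ordered sequence of elements, all of the same type

map: an unordered set of key/value pairs; the key type must be a primitive type

struct: a set of named fields, whose types may differ

Complex types are used as follows: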

create table complex(
  col1 ARRAY<INT>,
  col2 MAP<STRING,INT>,
  col3 STRUCT<a:STRING,b:INT,c:DOUBLE>
);

select col1[0], col2['b'], col3.c from complex;
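
To make the complex types more concrete, here is a minimal sketch of how such a table could be backed by a delimited text file. The table name complex_demo and the choice of delimiters are assumptions made for illustration; they do not come from the original article.

```sql
-- Hypothetical table; the delimiters below are illustrative choices.
CREATE TABLE complex_demo (
  col1 ARRAY<INT>,
  col2 MAP<STRING,INT>,
  col3 STRUCT<a:STRING,b:INT,c:DOUBLE>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'            -- separates the three columns
  COLLECTION ITEMS TERMINATED BY ','   -- separates array elements, map entries, struct fields
  MAP KEYS TERMINATED BY ':';          -- separates a map key from its value

-- A matching input line (columns separated by a tab) could look like:
--   1,2,3<TAB>a:10,b:20<TAB>x,7,1.5

SELECT col1[0],         -- first array element
       size(col1),      -- number of elements in the array
       col2['b'],       -- map lookup by key
       map_keys(col2),  -- all keys of the map
       col3.c           -- struct field access
FROM complex_demo;
```

Array indexes start at 0, and looking up a map key that is absent returns NULL rather than raising an error.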

2. The Hive data model:

The data model consists mainly of: database, table, partition, bucket

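(1) database: the equivalent of a namespace in a relational database. Its purpose is to isolate applications from one another by placing their tables in separate schemas. Hive provides statements such as create database dbname, use dbname, and drop database dbname.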
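
The database statements can be tried out as in the sketch below. The names demo_db and t are hypothetical and used only for illustration.

```sql
-- Hypothetical database and table names.
CREATE DATABASE IF NOT EXISTS demo_db;
SHOW DATABASES;                 -- demo_db should now be listed

USE demo_db;                    -- later statements resolve tables in demo_db
CREATE TABLE t (field STRING);
SHOW TABLES;

USE default;                    -- switch away before dropping
-- CASCADE also drops the tables inside; without it, dropping a
-- non-empty database fails.
DROP DATABASE IF EXISTS demo_db CASCADE;
```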

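(2) table: a table consists of the stored data plus metadata describing the table. The data lives in the distributed file system, and the metadata lives in a relational database. When a table has just been created and no data has been loaded, only a directory exists on HDFS; for a table a, the path is ${hive warehouse directory}/a, where the warehouse directory is set by hive.metastore.warehouse.dir. Loading data places the data file in that directory under its original file name, for example ${hive warehouse directory}/a/empinfo.txt. With load data local inpath the file is copied from the local file system; without local, a file already on HDFS is moved rather than copied.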

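Hive has two kinds of tables:

　　1> Managed tables: the data files are loaded into the warehouse directory configured for Hive

　　2> External tables: the data is kept in some other HDFS directory outside Hive's warehouse directory, although it may also be placed inside the warehouse

Creating a managed table: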

hive> create table tuoguan_tbl (field string);

hive> load data local inpath 'home/hadoop/test.txt' into table tuoguan_tbl;

Creating an external table:

hive> create external table external_tbl(field string)
    > location '/user/username/input/tb_wordcount';

If the location clause is omitted, the data is stored under Hive's warehouse directory.

hive> load data local inpath 'test.txt' into table external_tbl;

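Besides the directory in which loaded data is stored, managed and external tables also differ in how drop behaves. Dropping a managed table deletes both the stored data and the metadata, whereas dropping an external table deletes only the metadata and leaves the stored data in place.

To view the details of a table, use:

desc tableName or desc formatted tableName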
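
The difference in drop behavior can be checked from the Hive CLI roughly as follows. This is a sketch that assumes the two tables created above exist; the comments describe what one would expect to see, not captured output.

```sql
-- In the output, look at the "Table Type:" and "Location:" rows.
DESCRIBE FORMATTED tuoguan_tbl;    -- expected type: MANAGED_TABLE
DESCRIBE FORMATTED external_tbl;   -- expected type: EXTERNAL_TABLE

DROP TABLE tuoguan_tbl;    -- removes the metadata and the table's data directory
DROP TABLE external_tbl;   -- removes the metadata only

-- dfs is a Hive CLI command: the external table's files should still be listed.
dfs -ls /user/username/input/tb_wordcount;
```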

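(3) partition

Hive partitions divide a table coarsely according to the values of certain columns; each partition corresponds to a directory on HDFS. For example:

Suppose there are two directories, /user/username/input/2015/01 and /user/username/input/2015/02, and the table should be partitioned by year and month. The table can be created as: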

create table logs(id int, line string)
partitioned by (year string, month string);

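A query such as select * from logs where month='02' then scans only the /user/username/input/2015/02 directory, provided that directory has been registered as the partition (year='2015', month='02'). Since month is a string column, the value is quoted.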
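
The original does not show how the existing directories become partitions. One common way, sketched here under assumptions, is an external table whose partitions are added with explicit locations. The table name logs_ext is hypothetical; the paths are the ones from the example above.

```sql
-- Hypothetical external table, so that dropping it leaves the files in place.
CREATE EXTERNAL TABLE logs_ext (id INT, line STRING)
PARTITIONED BY (year STRING, month STRING);

-- Register each existing directory as one partition.
ALTER TABLE logs_ext ADD PARTITION (year='2015', month='01')
  LOCATION '/user/username/input/2015/01';
ALTER TABLE logs_ext ADD PARTITION (year='2015', month='02')
  LOCATION '/user/username/input/2015/02';

SHOW PARTITIONS logs_ext;   -- lists the registered partitions

-- Filtering on partition columns lets Hive read only the matching directory.
SELECT * FROM logs_ext WHERE year='2015' AND month='02';
```

Note that year and month are not stored in the data files; Hive derives their values from the partition each file belongs to.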

(4) bucket: to use buckets, first turn on Hive's bucketing enforcement:

hive> set hive.enforce.bucketing = true;

Rows are assigned to buckets by hashing the value of the specified column; each bucket is a file in the table directory.
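
A bucketed table might be defined and filled as in the following sketch. The table name logs_bucketed and the bucket count of 4 are assumptions for illustration, and the logs table from the partition example is assumed to contain data.

```sql
-- Needed on older Hive releases such as the one this article targets.
-- To my knowledge, Hive 2.0 and later removed this setting and always
-- enforce bucketing, so the line should be omitted there.
SET hive.enforce.bucketing = true;

-- Hypothetical table: rows are distributed into 4 buckets by hash of id.
CREATE TABLE logs_bucketed (id INT, line STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;

-- Populate through a query so that Hive performs the bucketing;
-- copying files in with load data does not redistribute rows.
INSERT OVERWRITE TABLE logs_bucketed
SELECT id, line FROM logs;

-- Sample a single bucket instead of scanning the whole table.
SELECT * FROM logs_bucketed TABLESAMPLE(BUCKET 1 OUT OF 4 ON id);
```

After the insert, the table directory should contain one file per bucket, which matches the description above.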


Original article: http://www.cnblogs.com/zhli/p/4853626.html
