Hive Study Series, Part 3: Creating Tables (CREATE TABLE)

Date: 2018-07-24 19:59:29

Creating tables:

Syntax

The first form of table creation:

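Notes:
TEMPORARY: a temporary table is visible only within the current session; when the session ends, the table is dropped automatically.
EXTERNAL: an external table, typically used when the table's files on HDFS are not stored under the default warehouse path.
    Dropping an external table differs from dropping a regular (managed) table: only the metastore metadata is deleted, and the data files stay on HDFS.
    External tables make it easy to share data with other databases and programs, such as Impala.
IF NOT EXISTS: without it, creating a table that already exists raises an error; adding IF NOT EXISTS avoids this.
Note that table names are case-insensitive.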
Examples:
create temporary table my.table1 (id int);
create external table my.table2 (id int);
create table if not exists my.table3 (id int);
-- (Note: TEMPORARY available in Hive 0.14.0 and later)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    
   -- Column definitions, e.g. id INT COMMENT 'index', name STRING COMMENT 'name'
  [(col_name data_type [COMMENT col_comment], ... [constraint_specification])]   
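  [COMMENT table_comment]  -- COMMENT is the table-level comment
  -- Partitioning: partition columns are declared like ordinary columns. Partitioning on fields such as date or city can speed up queries that filter on those fields.
  -- Partition columns split the table coarsely; each partition corresponds to one directory.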
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]  
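  -- Bucketing: within each partition the data can be divided further into buckets. A hash value is computed from the bucketing columns,
  -- and the hash modulo the number of buckets decides which bucket a row goes into.
  -- Buckets are finer-grained than partitions, so with enough data they can make some queries more efficient.
  -- Bucketing also makes sampling more efficient.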
  [CLUSTERED BY (col_name, col_name, ...) 
            [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]  -- bucketing
  -- SKEWED BY helps when the data is skewed; the author has not used it.
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]   -- The file storage format. STORED BY names a custom storage handler; it is rarely used, and the author has not used it.
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]    -- Additional table properties and descriptions.
                                                         -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];  
   -- Creates the table and fills it with the rows selected from other tables in one step.
   -- (Note: as  select_statement Available in Hive 0.5.0 and later; not supported for external tables)
 
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];
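
The two forms above can be sketched with a small example. The database demo_db, the table page_views and its columns are hypothetical names chosen for illustration, not taken from the article, and the statements assume that demo_db already exists.

```sql
-- Hypothetical table: page views partitioned by day and bucketed by user id.
CREATE TABLE IF NOT EXISTS demo_db.page_views (
    user_id   BIGINT COMMENT 'user identifier',
    url       STRING COMMENT 'visited page'
)
COMMENT 'illustrative table'
PARTITIONED BY (dt STRING COMMENT 'day, e.g. 2018-07-24')
CLUSTERED BY (user_id) SORTED BY (user_id ASC) INTO 8 BUCKETS
STORED AS ORC;

-- Second form: copy the definition (not the data) of an existing table.
CREATE TABLE IF NOT EXISTS demo_db.page_views_copy LIKE demo_db.page_views;

-- CREATE TABLE ... AS SELECT: create and populate in one statement.
CREATE TABLE demo_db.page_urls STORED AS ORC AS
SELECT DISTINCT url FROM demo_db.page_views WHERE dt = '2018-07-24';

-- Sampling one of the 8 buckets of a single partition.
SELECT * FROM demo_db.page_views TABLESAMPLE(BUCKET 1 OUT OF 8 ON user_id) s
WHERE dt = '2018-07-24';
```

Each value of dt becomes its own directory under the table path, and within it rows are assigned to one of the 8 buckets by the hash of user_id.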
 
 Notes:
 Data types
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)
 
Primitive data types
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)
 
 Complex data types
array_type
  : ARRAY < data_type >
 
map_type
  : MAP < primitive_type, data_type >
 
struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>
 
union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)
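
As a rough illustration of how the complex types are declared and read back, the sketch below uses a hypothetical table demo_db.complex_demo; the table, its columns and the map key 'height' are assumptions made for this example.

```sql
-- Hypothetical table with one column of each common complex type.
CREATE TABLE IF NOT EXISTS demo_db.complex_demo (
    tags    ARRAY<STRING>,
    attrs   MAP<STRING, INT>,
    owner   STRUCT<name:STRING, age:INT>
);

-- Arrays are indexed from 0, maps are read by key, struct fields use dot notation.
SELECT tags[0], attrs['height'], owner.name
FROM demo_db.complex_demo;
```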
 
## File storage format on HDFS
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
 
file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
 
constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]

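Note

The author does not fully understand some of the table-creation syntax above; corrections and advice are welcome.

Common examples:

Example 1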

create table my.tabelDemo(
    id      int,
    name    string,
    hobby   array<string>,
    add     map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
stored as textfile;

Columns are separated by commas.
The strings inside an array are separated by "-".
The key and value of a map entry are separated by a colon (:).
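
To make the delimiters concrete, here is a hedged sketch of one input row and a query against the table from Example 1. The local file path /tmp/tabel_demo.txt and the sample values are made up for illustration.

```sql
-- Hypothetical contents of /tmp/tabel_demo.txt (a single row):
--   1,zhang,reading-swimming,home:beijing-work:shanghai
-- Fields are split on ',', array items and map entries on '-', map key/value on ':'.
LOAD DATA LOCAL INPATH '/tmp/tabel_demo.txt' INTO TABLE my.tabelDemo;

-- The column name add is quoted with backticks to avoid clashing with the ADD keyword.
SELECT name, hobby[0], `add`['home'] FROM my.tabelDemo;
```

For the sample row, the query should return zhang, reading and beijing.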

Example 2

-- The files are stored in Parquet format
CREATE EXTERNAL TABLE IF NOT EXISTS default.person_table( 
    ftpurl        string, 
    ipcid         string, 
    feature       array<float>, 
    eyeglasses    int, 
    gender        int, 
    haircolor     int, 
    hairstyle     int, 
    hat           int, 
    huzi          int, 
    tie           int, 
    timeslot      int, 
    exacttime     Timestamp, 
    searchtype    string, 
    sharpness     int
) 
-- date is a reserved word in Hive 1.2.0 and later, so it is quoted with backticks
partitioned by (`date` string) 
STORED AS PARQUET 
LOCATION '/user/hive/warehouse/person_table';

Using struct

create table student_test(id INT, info struct<name:STRING, age:INT>)  
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','                         
    COLLECTION ITEMS TERMINATED BY ':';         
    
The data file on HDFS looks roughly like this (the fields of a struct are separated by the delimiter given in COLLECTION ITEMS TERMINATED BY):
1,zhou:30  
2,yan:30  
3,chen:20  
4,li:80
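
Once such a file is loaded, the struct fields can be read with dot notation. The query below is a small sketch against student_test; the age threshold is an arbitrary value chosen for illustration.

```sql
-- info.name and info.age address the two fields of the struct column.
SELECT id, info.name, info.age
FROM student_test
WHERE info.age > 25;
```

With the four sample rows above, this should return the rows for zhou, yan and li.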

TRUNCATE: emptying a table

A handy way to clear the data of a table or of selected partitions:

TRUNCATE TABLE table_name [PARTITION partition_spec];
 
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
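
A short sketch of both variants follows. student_test is the managed table defined earlier; demo_db.page_views is a hypothetical managed table partitioned by a string column dt, assumed here only for illustration.

```sql
-- Remove all rows but keep the table definition.
TRUNCATE TABLE student_test;

-- Remove the rows of a single partition of a hypothetical partitioned table.
TRUNCATE TABLE demo_db.page_views PARTITION (dt = '2018-07-24');
```

TRUNCATE is intended for managed tables; on an external table such as default.person_table it is expected to fail.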

Dropping a table

DROP TABLE [IF EXISTS] table_name [PURGE]; 
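-- PURGE: if the trash is configured, a plain DROP TABLE moves the data to the trash; with PURGE the data is deleted permanently and cannot be recovered from the trash.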
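
The difference between dropping an external and a managed table can be sketched with the two tables defined earlier in this article:

```sql
-- default.person_table is EXTERNAL: only the metastore entry is removed,
-- the Parquet files under its LOCATION stay on HDFS.
DROP TABLE IF EXISTS default.person_table;

-- student_test is a managed table: metadata and data are both removed.
-- PURGE skips the trash, so the data cannot be restored afterwards.
DROP TABLE IF EXISTS student_test PURGE;
```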

Altering tables

Renaming a table

ALTER TABLE table_name RENAME TO new_table_name;

Changing table properties

ALTER TABLE table_name SET TBLPROPERTIES table_properties;
 
table_properties:
  : (property_name = property_value, property_name = property_value, ... )

Changing the table comment

ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);
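
For instance, the following sketch sets the comment and one custom property on student_test and then reads them back. The property name owner_team and both values are hypothetical.

```sql
-- 'comment' is the built-in table comment; 'owner_team' is a made-up custom property.
ALTER TABLE student_test SET TBLPROPERTIES (
    'comment'    = 'students with a struct column',
    'owner_team' = 'data-platform'
);

-- List the properties to check the result.
SHOW TBLPROPERTIES student_test;
```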

Bucketing a table

ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)]
  INTO num_buckets BUCKETS;

Adding partitions

ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location']
    [, PARTITION partition_spec [LOCATION 'location'], ...];
partition_spec:
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
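
Applied to the person_table from Example 2, adding a partition might look like the sketch below. The partition value and the directory under the table location are assumptions for illustration.

```sql
-- Register one day of data; the directory is a hypothetical path under the table location.
ALTER TABLE default.person_table ADD IF NOT EXISTS
    PARTITION (`date` = '2018-07-24')
    LOCATION '/user/hive/warehouse/person_table/date=2018-07-24';

-- Check which partitions the metastore now knows about.
SHOW PARTITIONS default.person_table;
```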

Renaming a partition

ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;

Dropping partitions

ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
  [IGNORE PROTECTION] [PURGE];            
  -- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available 2.0.0 and later)
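
A minimal sketch of dropping a partition, again using person_table from Example 2 with a hypothetical partition value:

```sql
-- IF EXISTS avoids an error when the partition is not present.
-- person_table is external, so only the partition metadata is removed;
-- the files in the partition directory stay on HDFS.
ALTER TABLE default.person_table DROP IF EXISTS PARTITION (`date` = '2018-07-24');
```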

Creating views

CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ]
  [COMMENT view_comment]
  [TBLPROPERTIES (property_name = property_value, ...)]
  AS SELECT ...;
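
As an illustration, the view below exposes a filtered subset of person_table from Example 2. The view name and the filter hat = 1 are hypothetical choices.

```sql
-- A view stores only the query definition, not the data.
CREATE VIEW IF NOT EXISTS default.person_with_hat
COMMENT 'hypothetical view over person_table'
AS SELECT ftpurl, ipcid, exacttime
   FROM default.person_table
   WHERE hat = 1;

-- The view is queried like a table.
SELECT ipcid, exacttime FROM default.person_with_hat;
```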

Reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

Original article: https://www.cnblogs.com/unnunique/p/9362094.html