Programming Hive: Study Notes 01

Date: 2017-06-21 23:00:22


Chapter 4: HiveQL Data Definition
1: Create a database
create database financials;
create database if not exists financials;

2: List databases
show databases;
Filter databases with a pattern
show databases like 'h.*';

3: Create a database and override its default location
create database financials location '/my/preferred/directory';

4: Add a description to a database
create database financials comment 'holds all financials tables';
5: Show a database's description
describe database financials;
6: Attach key-value properties to a database
create database financials
with dbproperties ('creator' = 'Mark Moneybags', 'date' = '2012-12-12');
describe database extended financials;

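7: By default the prompt does not show which database is current, so it is safe to repeat the use command
use financials;
Setting a property makes the prompt display the current database
set hive.cli.print.current.db=true;
set hive.cli.print.current.db=false;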

8: Drop a database
drop database if exists financials;
Hive does not allow dropping a database that still contains tables.
Adding the cascade keyword makes Hive drop the tables in the database first:
drop database if exists financials cascade;

9: Modify a database by setting dbproperties key-value pairs
alter database financials set dbproperties ('edited-by' = 'joe dba');

10: Create a table
create table if not exists employees (
name string comment 'employee name',
salary float comment 'employee salary',
subordinates array<string> comment 'employee name of subordinates',
deductions Map<string,FLOAT>,
address struct<street:string,city:string,state:String,zip:int>
)
comment 'description of the table'
location '/user/hive/warehouse/mydb.db/employees'
tblproperties ('creator' = 'me', 'created_at' = '2012-12-12');

-- tblproperties attaches extra documentation to a table as key-value pairs

11: List a table's tblproperties
show tblproperties employees;

12: Copy a table's schema
create table if not exists mydb.employees2 like mydb.employees;

13: Select a database
use mydb;
List tables
show tables;
show tables in mydb;
14: View a table's detailed structure
describe extended mydb.employees;
Use the formatted keyword in place of extended for more readable output
describe formatted mydb.employees;

15: Managed (internal) tables: dropping the table also deletes its data
Create an external table that reads all comma-delimited data under the /data/stocks directory
create external table if not exists stocks (
exchange string,
symbol string,
ymd String,
price_open float,
price_high float,
price_low float,
price_close float,
volume int,
price_adj_close float)
row format delimited fields terminated by ','
location '/data/stocks';

16: Check whether a table is managed or external
describe extended tablename;
The output includes one of:
tableType:MANAGED_TABLE -- managed table
tableType:EXTERNAL_TABLE -- external table

-- Copy a table's schema without copying its data
-- (employees3 is the new table, employees2 is the source table)
create table if not exists mydb.employees3
like mydb.employees2 location '/data/stocks';

17: Create a partitioned table
create table employees (
name string,
salary float,
subordinates array<string>,
deductions Map<string,FLOAT>,
address struct<street:string,city:string,state:String,zip:int>
)
partitioned by (country String,state string);

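Partition columns:
country and state can be queried like ordinary columns and behave somewhat like index columns.
Filtering on partition columns lets Hive read only the matching partitions, which improves query performance.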
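
Why a partition filter helps can be sketched with a toy Python model. This is not Hive code: the dictionary, the file names, and the function directories_to_scan are assumptions made up for illustration. Each partition is treated as a directory keyed by its (country, state) values, and a filter on those values decides which directories are read at all.

```python
# Toy model: each partition is a directory keyed by its partition-column values.
# The file names are invented for this sketch.
partitions = {
    ("US", "CA"): ["part-0000"],
    ("US", "NY"): ["part-0001"],
    ("CA", "ON"): ["part-0002"],
}

def directories_to_scan(partitions, country=None, state=None):
    """Return the partition keys a query has to read for the given filters."""
    selected = []
    for (p_country, p_state) in partitions:
        if country is not None and p_country != country:
            continue  # pruned: the whole directory is skipped
        if state is not None and p_state != state:
            continue  # pruned as well
        selected.append((p_country, p_state))
    return sorted(selected)

full_scan = directories_to_scan(partitions)                          # no filter
us_only = directories_to_scan(partitions, country="US")              # one filter
us_ca = directories_to_scan(partitions, country="US", state="CA")    # both filters

print(len(full_scan), len(us_only), len(us_ca))
```

With no filter every directory is read; each added partition filter shrinks the set of directories before any row is examined.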

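18: set hive.mapred.mode=strict;
In strict mode, a query on a partitioned table whose where clause has no partition filter
is rejected before the job is submitted.
The mode can be switched back with: set hive.mapred.mode=nonstrict;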
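
The strict-mode rule can be modeled in a few lines of Python. This is only a sketch of the rule as described above, not Hive's implementation: StrictModeViolation and check_query are hypothetical names, and a query is reduced to the list of columns its where clause mentions.

```python
class StrictModeViolation(Exception):
    """Hypothetical error type for this sketch; not a Hive class."""

def check_query(partition_columns, where_columns, mode="strict"):
    """Reject a query on a partitioned table that filters on no partition column."""
    has_partition_filter = bool(set(partition_columns) & set(where_columns))
    if mode == "strict" and partition_columns and not has_partition_filter:
        raise StrictModeViolation("no partition filter in where clause")
    return True

employees_partitions = ["country", "state"]

# Filters on country, so strict mode accepts it.
ok = check_query(employees_partitions, ["country", "salary"])

# No partition filter, but nonstrict mode does not check.
relaxed = check_query(employees_partitions, ["salary"], "nonstrict")

# No partition filter in strict mode: rejected before any work is done.
try:
    check_query(employees_partitions, ["salary"])
    rejected = False
except StrictModeViolation:
    rejected = True
```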

19: List all partitions of a table
show partitions employees;

20: Check whether partitions exist for a specific partition key
show partitions employees partition(country='US');
The describe extended employees command also shows the partition keys

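The most common setup for managing large production datasets is an external partitioned table
21: In a managed table, partitions can be created by loading data:
load data local inpath '/home/hive/California-employees'
into table employees
partition (country='US', state='CA');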

Hive creates the directory for this partition: .../employees/country=US/state=CA
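
The naming scheme of that directory can be reproduced with a small Python helper. This is a sketch only: partition_path is a hypothetical function, the warehouse prefix is an assumed example (the notes elide the real one), and escaping of special characters in partition values is ignored.

```python
def partition_path(table_dir, partition_spec):
    """Build a partition directory from ordered (column, value) pairs."""
    parts = [f"{col}={val}" for col, val in partition_spec]
    return "/".join([table_dir.rstrip("/")] + parts)

# The table directory below is an assumed location, used only for illustration.
path = partition_path(
    "/user/hive/warehouse/employees",
    [("country", "US"), ("state", "CA")],
)
print(path)
```

The order of the pairs follows the order of the columns in the partitioned by clause, which is why country appears above state in the directory tree.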

22: Create an external partitioned table

create external table if not exists log_messages (
hms int,
severity string,
server string,
process_id int,
message string
)
partitioned by (year int, month int, day int)
row format delimited fields terminated by '\t';

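1: order by sorts the entire input globally

2: sort by guarantees that the file produced by each reducer is sorted; merging those sorted files in a second pass yields a global order.

Characteristics of sort by:
1). sort by is largely unaffected by whether hive.mapred.mode is strict or nonstrict, but a partitioned table still needs a partition filter.
2). Within a single reducer, sort by orders the rows by the specified columns.
3). The number of reducers can be set, e.g. set mapred.reduce.tasks=5; a merge sort over the output files then produces the complete sorted result.

Result: in strict mode, sort by runs normally without a limit clause (order by requires one), so hive.mapred.mode=strict has little effect on sort by.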
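
The difference between per-reducer sorting and a global sort can be illustrated with a toy Python model. It is a sketch under stated assumptions, not Hive's execution engine: the rows are plain integers, num_reducers plays the role of mapred.reduce.tasks, and round-robin assignment stands in for however rows actually reach the reducers.

```python
import heapq

rows = [7, 3, 9, 1, 8, 2, 6, 4]
num_reducers = 3  # plays the role of mapred.reduce.tasks

# Round-robin assignment is an assumption standing in for the real distribution.
reducer_inputs = [rows[i::num_reducers] for i in range(num_reducers)]

# sort by: every reducer sorts only its own rows.
reducer_outputs = [sorted(part) for part in reducer_inputs]

# Reading the output files one after another is not globally sorted...
concatenated = [x for part in reducer_outputs for x in part]

# ...but merging the already-sorted files is, and matches what order by returns.
merged = list(heapq.merge(*reducer_outputs))

print(reducer_outputs)
print(concatenated)
print(merged)
```

Each output file is sorted on its own, which is all sort by promises; the extra merge pass is what turns those files into a single ordered result.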

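3: distribute by
distribute by controls how map output is divided among the reducers.

Rows are distributed according to the columns after distribute by and the number of reducers, using a hash algorithm by default. distribute by also accepts an expression: with length(), strings are sent to different reducers according to their length and end up in different output files. length is a built-in function; other built-in or user-defined functions can be used as well.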
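
A toy Python model of distributing by length might look like the following. It is a sketch, not Hive's partitioner: reducer_for is a hypothetical helper, and a plain modulo on the integer key stands in for Hive's own hash function.

```python
def reducer_for(key, num_reducers):
    """Pick a reducer from an integer key; modulo stands in for Hive's hashing."""
    return key % num_reducers

words = ["a", "bb", "cc", "ddd", "eeee", "ff"]
num_reducers = 3

# Similar in spirit to: distribute by length(word)
buckets = {r: [] for r in range(num_reducers)}
for w in words:
    buckets[reducer_for(len(w), num_reducers)].append(w)

for reducer, assigned in buckets.items():
    print(reducer, assigned)
```

All strings of the same length land on the same reducer, and therefore in the same output file, but strings of different lengths may share a reducer when there are more distinct keys than reducers.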

4: cluster by

cluster by does what distribute by does and additionally sorts by the same column, so cluster by = distribute by + sort by on that column.
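
The identity cluster by = distribute by + sort by can be sketched in Python as two steps applied with the same key. This is an illustrative model only: cluster_by is a hypothetical function, the rows are invented, and modulo again stands in for Hive's hash function.

```python
rows = [("bob", 3), ("amy", 1), ("cat", 3), ("dan", 2), ("eve", 1), ("fay", 4)]
num_reducers = 2

def cluster_by(rows, key, num_reducers):
    """Distribute rows by key (modulo stand-in for hashing), then sort each part by key."""
    outputs = [[] for _ in range(num_reducers)]
    for row in rows:
        outputs[key(row) % num_reducers].append(row)    # the distribute by step
    return [sorted(part, key=key) for part in outputs]  # the sort by step

files = cluster_by(rows, key=lambda r: r[1], num_reducers=num_reducers)

for i, part in enumerate(files):
    print(i, part)
```

Every key value ends up in exactly one output file, and each file is ordered by that key, but the files are not ordered relative to one another.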


Original article: http://www.cnblogs.com/wzjhoutai/p/7061856.html
