The help output for the CREATE TABLE command is shown below:
Command:     CREATE TABLE
Description: define a new table
Syntax:
CREATE [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE table_name (   --> table type: global | local temporary
  [ { column_name data_type [ DEFAULT default_expr ]
      [column_constraint [ ... ]
      [ ENCODING ( storage_directive [,...] ) ]                   --> column encoding
      ]
    | table_constraint                                            --> table constraint
    | LIKE other_table [{INCLUDING | EXCLUDING} {DEFAULTS | CONSTRAINTS}] ...}
    [, ... ] ]
  [column_reference_storage_directive [, ...] ]
  )
  [ INHERITS ( parent_table [, ... ] ) ]                          --> inheritance
  [ WITH ( storage_parameter=value [, ... ] )                     --> storage parameters
  [ ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP} ]
  [ TABLESPACE tablespace ]                                       --> tablespace
  [ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ]    --> distribution key
  [ PARTITION BY partition_type (column)                          --> partition column
      [ SUBPARTITION BY partition_type (column) ]                 --> subpartition column
        [ SUBPARTITION TEMPLATE ( template_spec ) ]
      [...]
    ( partition_spec )
      | [ SUBPARTITION BY partition_type (column) ]
        [...]
    ( partition_spec
      [ ( subpartition_spec
          [(...)]
        ) ]
    )

where storage_parameter is:                       --> parameters available when creating a table
   APPENDONLY={TRUE|FALSE}                        --> whether the table is append-only
   BLOCKSIZE={8192-2097152}                       --> block size
   ORIENTATION={COLUMN|ROW}                       --> column- or row-oriented storage
   COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}      --> compression type
   COMPRESSLEVEL={0-9}                            --> compression level
   FILLFACTOR={10-100}                            --> fill factor
   OIDS[=TRUE|FALSE]                              --> whether rows carry object identifiers

where column_constraint is:                       --> column constraints
   [CONSTRAINT constraint_name]                   --> constraint name
   NOT NULL | NULL                                --> whether NULL is allowed
   | UNIQUE [USING INDEX TABLESPACE tablespace]   --> unique [index tablespace]
            [WITH ( FILLFACTOR = value )]
   | PRIMARY KEY [USING INDEX TABLESPACE tablespace]   --> primary key
            [WITH ( FILLFACTOR = value )]
   | CHECK ( expression )                         --> check constraint on an expression

and table_constraint is:                          --> table constraints
   [CONSTRAINT constraint_name]                   --> constraint name
   UNIQUE ( column_name [, ... ] )                --> unique column(s)
          [USING INDEX TABLESPACE tablespace]     --> [index tablespace]
          [WITH ( FILLFACTOR=value )]
   | PRIMARY KEY ( column_name [, ... ] )         --> primary key
          [USING INDEX TABLESPACE tablespace]
          [WITH ( FILLFACTOR=value )]
   | CHECK ( expression )                         --> check constraint on an expression

where partition_type is:                          --> partition type: LIST | RANGE
   LIST | RANGE

where partition_specification is:                 --> partition specification, made up of partition elements
   partition_element [, ...]

and partition_element is:                         --> partition element
   DEFAULT PARTITION name                         --> name of the default partition
   | [PARTITION name] VALUES (list_value [,...] )
   | [PARTITION name]
       START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
       [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
       [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
   | [PARTITION name]
       END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
       [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
   [ WITH ( partition_storage_parameter=value [, ... ] ) ]
   [column_reference_storage_directive [, ...] ]
   [ TABLESPACE tablespace ]

where subpartition_spec or template_spec is:      --> subpartition specification or subpartition template
   subpartition_element [, ...]

and subpartition_element is:
   DEFAULT SUBPARTITION name
   | [SUBPARTITION name] VALUES (list_value [,...] )
   | [SUBPARTITION name]
       START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
       [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
       [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
   | [SUBPARTITION name]
       END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
       [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
   [ WITH ( partition_storage_parameter=value [, ... ] ) ]
   [column_reference_storage_directive [, ...] ]
   [ TABLESPACE tablespace ]

where storage_directive is:                       --> storage directive
   COMPRESSTYPE={ZLIB | QUICKLZ | RLE_TYPE | NONE}
   | COMPRESSLEVEL={0-9}
   | BLOCKSIZE={8192-2097152}

where column_reference_storage_directive is:      --> storage directive that refers to a column
   COLUMN column_name ENCODING ( storage_directive [, ... ] ), ...
   | DEFAULT COLUMN ENCODING ( storage_directive [, ... ] )
Notes on the command above:

Use \d to list the relations in the current database:
testdw=# \d
               List of relations
 Schema |    Name     | Type  |  Owner  | Storage
--------+-------------+-------+---------+---------
 public | tb01        | table | gpadmin | heap
 public | tb02        | table | gpadmin | heap
 public | tb1_test_01 | table | gpadmin | heap
(3 rows)

Use \d followed by a table name to show that table's structure:
testdw=# \d tb01
     Table "public.tb01"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Distributed by: (id)
Tablespace: "testspace"

Create a randomly distributed table and list the relations again:
testdw=# create table tb03(a int, b text) distributed randomly;
CREATE TABLE
testdw=# \d
               List of relations
 Schema |    Name     | Type  |  Owner  | Storage
--------+-------------+-------+---------+---------
 public | tb01        | table | gpadmin | heap
 public | tb02        | table | gpadmin | heap
 public | tb03        | table | gpadmin | heap
 public | tb1_test_01 | table | gpadmin | heap
(4 rows)

testdw=# \d tb03
     Table "public.tb03"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | text    |
Distributed randomly

Create a table with a primary key; the primary key becomes the distribution key:
testdw=# create table tb04(a int, b int primary key, c text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "tb04_pkey" for table "tb04"
CREATE TABLE
testdw=# \d tb04;
     Table "public.tb04"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer | not null
 c      | text    |
Indexes:
    "tb04_pkey" PRIMARY KEY, btree (b)
Distributed by: (b)
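In CREATE TABLE and ALTER TABLE, use DISTRIBUTED BY (hash distribution) or DISTRIBUTED RANDOMLY (random distribution) to decide how the data is distributed. Keep the following points in mind:
Specify the distribution policy when creating or altering the table definition; if none is specified, the system uses the primary key or, failing that, the first column as the hash distribution key (DK); columns of geometric or user-defined types are not suitable as a Greenplum distribution key; if no column of a suitable type can spread the data evenly, use random distribution.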
testdw=# CREATE TABLE products (name varchar(40), prod_id integer, supplier_id integer) DISTRIBUTED BY (prod_id);
CREATE TABLE
testdw=# \d products
            Table "public.products"
   Column    |         Type          | Modifiers
-------------+-----------------------+-----------
 name        | character varying(40) |
 prod_id     | integer               |
 supplier_id | integer               |
Distributed by: (prod_id)

testdw=# CREATE TABLE random_stuff (things text, doodads text, etc text) DISTRIBUTED RANDOMLY;
CREATE TABLE
testdw=# \d random_stuff
 Table "public.random_stuff"
 Column  | Type | Modifiers
---------+------+-----------
 things  | text |
 doodads | text |
 etc     | text |
Distributed randomly

testdw=# \d
                List of relations
 Schema |     Name     | Type  |  Owner  | Storage
--------+--------------+-------+---------+---------
 public | products     | table | gpadmin | heap
 public | random_stuff | table | gpadmin | heap
 public | tb01         | table | gpadmin | heap
 public | tb02         | table | gpadmin | heap
 public | tb03         | table | gpadmin | heap
 public | tb04         | table | gpadmin | heap
 public | tb1_test_01  | table | gpadmin | heap
(7 rows)
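One way to judge whether a distribution key spreads rows evenly is to count the rows stored on each segment. The query below is a minimal sketch that assumes the products table created above has already been loaded with data; gp_segment_id is the Greenplum system column that identifies the segment holding each row.

```sql
-- Rows per segment for a hash-distributed table; large differences
-- between segments suggest a poorly chosen distribution key.
SELECT gp_segment_id, count(*) AS row_count
FROM products
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```

The same query can be run against a randomly distributed table such as random_stuff for comparison.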
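GPDB offers several flexible storage models. Choose between heap storage and append-only storage: heap storage suits small tables whose data changes often, such as dimension tables; append-only storage suits large fact tables in the warehouse, which are usually bulk loaded and then only read, and in the release described here it does not support UPDATE or DELETE.
Points to consider:
1) Updates: if the table's data needs to be updated, only row storage can be used.
2) Frequent INSERTs: if rows are inserted often, consider row storage.
3) Number of columns a query touches: if the SELECT list or WHERE clause involves all or most of the table's columns, consider row storage. Column storage suits queries that aggregate a single column in WHERE or HAVING, or that filter on a single column in WHERE and return few rows;
4) With compressed storage, choose between row orientation and column orientation;
5) Number of columns in the table: row storage is more efficient when many columns are needed at once or the row size is relatively small;
6) Column storage performs better for queries that access only a few columns of a wide table;
7) Compression: column-oriented tables compress better; to create one, specify the storage model in the WITH clause of CREATE TABLE;
8) In theory row storage is more efficient than column storage; use \timing to record how long a query takes and check this for your own workload.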
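To compare the two orientations on a concrete workload, the following sketch builds one row-oriented and one column-oriented append-only table with the same data and times two kinds of query. The table names sales_row and sales_col, their columns, and the row count are hypothetical and chosen only for illustration; timings depend on the cluster, so no expected figures are given.

```sql
-- Hypothetical tables for illustration; not part of the original notes.
CREATE TABLE sales_row (id int, region text, amount numeric(10,2))
WITH (appendonly=true, orientation=row)
DISTRIBUTED BY (id);

CREATE TABLE sales_col (id int, region text, amount numeric(10,2))
WITH (appendonly=true, orientation=column)
DISTRIBUTED BY (id);

-- Load identical data into both tables.
INSERT INTO sales_row
SELECT i, 'r' || (i % 10)::text, (i % 1000)::numeric(10,2)
FROM generate_series(1, 1000000) AS i;

INSERT INTO sales_col SELECT * FROM sales_row;

-- Toggle the timing display in psql.
\timing

-- Aggregate over a single column.
SELECT sum(amount) FROM sales_row;
SELECT sum(amount) FROM sales_col;

-- Fetch every column of a single row.
SELECT * FROM sales_row WHERE id = 42;
SELECT * FROM sales_col WHERE id = 42;
```

Running each query several times and comparing the reported times gives a rough picture of which orientation fits the queries you actually run.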
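Table orientation | Available compression types | Supported algorithms
Row | Table level | ZLIB, QUICKLZ
Column | Column level and table level | RLE_TYPE, ZLIB, QUICKLZ

Note: QUICKLZ has only one compression level, while ZLIB offers levels 1-9.
Greenplum provides built-in functions for checking the compression ratio and the distribution of append-only (AO) tables.

Function | Return type | Description
get_ao_distribution(name), get_ao_distribution(oid) | Set of (dbid, tuplecount) rows | Shows how the rows of an AO table are distributed
get_ao_compression_ratio(name), get_ao_compression_ratio(oid) | float8 | Computes the compression ratio of an AO table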
testdw=# CREATE TABLE tb_zlib_01(a int, b text) WITH (appendonly=true, compresstype=zlib, compresslevel=5);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
testdw=# \d tb_zlib_01
Append-Only Table "public.tb_zlib_01"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | text    |
Compression Type: zlib
Compression Level: 5
Block Size: 32768
Checksum: f
Distributed by: (a)

testdw=# insert into tb_zlib_01 values(1,'abc');
INSERT 0 1
(repeat the self-insert below until the table holds on the order of hundreds of millions of rows)
testdw=# insert into tb_zlib_01 select * from tb_zlib_01;
INSERT 0 134217728
testdw=# select get_ao_distribution('tb_zlib_01');
 get_ao_distribution
---------------------
 (0,134217728)
 (1,134217728)
(2 rows)
Calling the compression-ratio function with a misspelled table name (tb_zlip_01 instead of tb_zlib_01) fails, and looking up the same misspelled name in pg_class returns no rows:
testdw=# select get_ao_compression_ratio('tb_zlip_01');
ERROR:  relation "tb_zlip_01" does not exist
testdw=# select oid from pg_class where relname = 'tb_zlip_01';
 oid
-----
(0 rows)
The table was created as tb_zlib_01, so the function has to be called with that name.
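The error above comes from the table name: the table was created as tb_zlib_01, while the failing calls use tb_zlip_01. A corrected sketch of the same checks follows; the output is not shown because it was not captured in these notes.

```sql
-- Table name spelled as it was created (tb_zlib_01, not tb_zlip_01).
SELECT get_ao_compression_ratio('tb_zlib_01');

-- Look up the table's oid; this returns no rows if the name is misspelled.
SELECT oid FROM pg_class WHERE relname = 'tb_zlib_01';
```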
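Column-level compression is specified with the following clause:
[ ENCODING ( storage_directive [,...] ) ]
Storage directives can be applied to an individual column, or set as the default for all columns:
c1 char ENCODING (compresstype=quicklz, blocksize=65536)
COLUMN c1 ENCODING (compresstype=quicklz, blocksize=65536)
DEFAULT COLUMN ENCODING (compresstype=quicklz)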
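Precedence of compression settings: the lower the level at which a setting is made, the higher its priority:
1) Column compression settings on a subpartition override partition-, column- and table-level settings;
2) Column compression settings on a partition override column- and table-level settings;
3) Column compression settings override table-level settings;
Note: storage settings are not inherited.
4) Examples of storage parameters
Example 1: create table tb_t1 with c1 (int), c2 (text) and c3 (text). Column c1 uses ZLIB compression with the system-defined block size; column c2 uses QUICKLZ compression with a block size of 65536; column c3 is not compressed and uses the system-defined block size.
Example 2: create table tb_t2 with c1 (int), c2 (text) and c3 (text). Column c1 uses ZLIB compression with the system-defined block size; column c2 uses QUICKLZ compression with a block size of 65536; column c3 uses RLE_TYPE compression with the system-defined block size.
Example 3: create partitioned table tb_t3 with c1 (int), c2 (text) and c3 (text). Column c1 uses ZLIB compression with the system-defined block size; column c2 uses QUICKLZ compression with a block size of 65536; column c3 uses RLE_TYPE compression with the system-defined block size. c3 is the range partition key, and within the partition it is stored with ZLIB compression.
Example 4: create partitioned table tb_t4 with c1 (int), c2 (text), c3 (text) and c4 (smallint). Column c1 uses ZLIB compression; a DEFAULT COLUMN ENCODING clause sets QUICKLZ compression and a block size of 65536 for columns without their own setting (c2 here); c3 uses RLE_TYPE compression; c4 explicitly sets the compression type to none. c3 is the range partition key, and within the partition it is stored with ZLIB compression.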
--Example 1
CREATE TABLE TB_T1 (
    c1 int  ENCODING (compresstype=zlib),
    c2 text ENCODING (compresstype=quicklz, blocksize=65536),
    c3 text)
WITH (appendonly=true, orientation=column);

--Example 2
CREATE TABLE TB_T2 (
    c1 int  ENCODING (compresstype=zlib),
    c2 text ENCODING (compresstype=quicklz, blocksize=65536),
    c3 text ENCODING (compresstype=RLE_TYPE))
WITH (appendonly=true, orientation=column);

--Example 3
CREATE TABLE TB_T3 (
    c1 int  ENCODING (compresstype=zlib),
    c2 text ENCODING (compresstype=quicklz, blocksize=65536),
    c3 text ENCODING (compresstype=RLE_TYPE))
WITH (appendonly=true, orientation=column)
PARTITION BY RANGE(c3)
(START ('2000-01-01'::DATE) END ('2013-12-31'::DATE),
 COLUMN C3 ENCODING (compresstype=zlib));

--Example 4
CREATE TABLE TB_T4 (
    c1 int  ENCODING (compresstype=zlib),
    c2 text,
    c3 text ENCODING (compresstype=RLE_TYPE),
    c4 smallint ENCODING (compresstype=none),
    DEFAULT COLUMN ENCODING (compresstype=quicklz, blocksize=65536))
WITH (appendonly=true, orientation=column)
PARTITION BY RANGE(c3)
(START ('2000-01-01'::DATE) END ('2013-12-31'::DATE),
 COLUMN C3 ENCODING (compresstype=zlib));
Use the ALTER TABLE command to change the definition of an existing table. Its help output is shown below; the options largely mirror those of CREATE TABLE.
Command:     ALTER TABLE
Description: change the definition of a table
Syntax:
ALTER TABLE [ONLY] name RENAME [COLUMN] column TO new_column   --> rename a column
ALTER TABLE name RENAME TO new_name                             --> rename the table
ALTER TABLE name SET SCHEMA new_schema                          --> move the table to another schema
ALTER TABLE [ONLY] name SET
     DISTRIBUTED BY (column, [ ... ] )
   | DISTRIBUTED RANDOMLY
   | WITH (REORGANIZE=true|false)
ALTER TABLE [ONLY] name action [, ... ]
ALTER TABLE name
   [ ALTER PARTITION { partition_name | FOR (RANK(number)) | FOR (value) }   --> alter a partition
     partition_action [...] ]
   partition_action

where action is one of:
   ADD [COLUMN] column_name type [ ENCODING ( storage_directive [,...] ) ] [column_constraint [ ... ]]
   DROP [COLUMN] column [RESTRICT | CASCADE]
   ALTER [COLUMN] column TYPE type [USING expression]
   ALTER [COLUMN] column SET DEFAULT expression
   ALTER [COLUMN] column DROP DEFAULT
   ALTER [COLUMN] column { SET | DROP } NOT NULL
   ALTER [COLUMN] column SET STATISTICS integer
   ADD table_constraint
   DROP CONSTRAINT constraint_name [RESTRICT | CASCADE]
   DISABLE TRIGGER [trigger_name | ALL | USER]
   ENABLE TRIGGER [trigger_name | ALL | USER]
   CLUSTER ON index_name
   SET WITHOUT CLUSTER
   SET WITHOUT OIDS
   SET (FILLFACTOR = value)
   RESET (FILLFACTOR)
   INHERIT parent_table
   NO INHERIT parent_table
   OWNER TO new_owner
   SET TABLESPACE new_tablespace

where partition_action is one of:
   ALTER DEFAULT PARTITION
   DROP DEFAULT PARTITION [IF EXISTS]
   DROP PARTITION [IF EXISTS] { partition_name | FOR (RANK(number)) | FOR (value) } [CASCADE]
   TRUNCATE DEFAULT PARTITION
   TRUNCATE PARTITION { partition_name | FOR (RANK(number)) | FOR (value) }
   RENAME DEFAULT PARTITION TO new_partition_name
   RENAME PARTITION { partition_name | FOR (RANK(number)) | FOR (value) } TO new_partition_name
   ADD DEFAULT PARTITION name [ ( subpartition_spec ) ]
   ADD PARTITION [name] partition_element [ ( subpartition_spec ) ]
   EXCHANGE PARTITION { partition_name | FOR (RANK(number)) | FOR (value) }
       WITH TABLE table_name [ WITH | WITHOUT VALIDATION ]
   EXCHANGE DEFAULT PARTITION WITH TABLE table_name [ WITH | WITHOUT VALIDATION ]
   SET SUBPARTITION TEMPLATE (subpartition_spec)
   SPLIT DEFAULT PARTITION
       { AT (list_value)
       | START([datatype] range_value) [INCLUSIVE | EXCLUSIVE]
         END([datatype] range_value) [INCLUSIVE | EXCLUSIVE] }
       [ INTO ( PARTITION new_partition_name, PARTITION default_partition_name ) ]
   SPLIT PARTITION { partition_name | FOR (RANK(number)) | FOR (value) } AT (value)
       [ INTO (PARTITION partition_name, PARTITION partition_name)]

where partition_element is:
   VALUES (list_value [,...] )
   | START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
     [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
   | END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
   [ WITH ( partition_storage_parameter=value [, ... ] ) ]
   [ TABLESPACE tablespace ]

where subpartition_spec is:
   subpartition_element [, ...]

and subpartition_element is:
   DEFAULT SUBPARTITION subpartition_name
   | [SUBPARTITION subpartition_name] VALUES (list_value [,...] )
   | [SUBPARTITION subpartition_name]
       START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
       [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
       [ EVERY ( [number | datatype] 'interval_value') ]
   | [SUBPARTITION subpartition_name]
       END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
       [ EVERY ( [number | datatype] 'interval_value') ]
   [ WITH ( partition_storage_parameter=value [, ... ] ) ]
   [ TABLESPACE tablespace ]

where column_reference_storage_directive is:
   COLUMN column_name ENCODING ( storage_directive [, ... ] ), ...
   | DEFAULT COLUMN ENCODING ( storage_directive [, ... ] )
Notes on the command above:

testdw=# \d tb04
     Table "public.tb04"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer | not null
 c      | text    |
Indexes:
    "tb04_pkey" PRIMARY KEY, btree (b)
Distributed by: (b)

Add a NOT NULL constraint to a column (the first attempt targets tb01, which has no column a):
testdw=# ALTER TABLE tb01 ALTER COLUMN a SET NOT NULL;
ERROR:  column "a" of relation "tb01" does not exist
testdw=# ALTER TABLE tb04 ALTER COLUMN a SET NOT NULL;
ALTER TABLE
testdw=# \d tb04
     Table "public.tb04"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null
 b      | integer | not null
 c      | text    |
Indexes:
    "tb04_pkey" PRIMARY KEY, btree (b)
Distributed by: (b)

Change the distribution key:
testdw=# ALTER TABLE tb04 SET DISTRIBUTED BY (a);
ALTER TABLE
testdw=# \d tb04
     Table "public.tb04"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null
 b      | integer | not null
 c      | text    |
Indexes:
    "tb04_pkey" PRIMARY KEY, btree (b)
Distributed by: (a)

Switch a table to random distribution:
testdw=# ALTER TABLE tb01 SET DISTRIBUTED RANDOMLY;
ALTER TABLE
testdw=# \d tb01;
     Table "public.tb01"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Distributed randomly
Tablespace: "testspace"

Redistribute the existing data under the current policy:
testdw=# ALTER TABLE tb01 SET WITH (REORGANIZE=TRUE);
ALTER TABLE
testdw=# \d tb01;
     Table "public.tb01"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Distributed randomly
Tablespace: "testspace"

Create a partitioned column-oriented table and then add a partition:
testdw=# CREATE TABLE tb_cp_01 (a int, b int, c int, d int)
testdw-# WITH (APPENDONLY = TRUE, ORIENTATION=COLUMN)
testdw-# PARTITION BY range(b)
testdw-# SUBPARTITION BY list (c)
testdw-# SUBPARTITION template(
testdw(# SUBPARTITION sp1 values(1, 2, 3, 4, 5),
testdw(# COLUMN a ENCODING(COMPRESSTYPE=ZLIB),
testdw(# COLUMN b ENCODING(COMPRESSTYPE=QUICKLZ),
testdw(# COLUMN c ENCODING(COMPRESSTYPE=ZLIB),
testdw(# COLUMN d ENCODING(COMPRESSTYPE=ZLIB))
testdw-# (PARTITION p1 START(1) END(10),
testdw(# PARTITION p2 START(10) END(20));
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'a' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
NOTICE:  CREATE TABLE will create partition "tb_cp_01_1_prt_p1" for table "tb_cp_01"
NOTICE:  CREATE TABLE will create partition "tb_cp_01_1_prt_p2" for table "tb_cp_01"
NOTICE:  CREATE TABLE will create partition "tb_cp_01_1_prt_p1_2_prt_sp1" for table "tb_cp_01_1_prt_p1"
NOTICE:  CREATE TABLE will create partition "tb_cp_01_1_prt_p2_2_prt_sp1" for table "tb_cp_01_1_prt_p2"
CREATE TABLE
testdw=# \d tb_cp_01;
Append-Only Columnar Table "public.tb_cp_01"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer |
 c      | integer |
 d      | integer |
Checksum: f
Number of child tables: 2 (Use \d+ to list them.)
Distributed by: (a)

testdw=# ALTER TABLE tb_cp_01 ADD PARTITION p3 START(20) END(30);
NOTICE:  CREATE TABLE will create partition "tb_cp_01_1_prt_p3" for table "tb_cp_01"
NOTICE:  CREATE TABLE will create partition "tb_cp_01_1_prt_p3_2_prt_sp1" for table "tb_cp_01_1_prt_p3"
ALTER TABLE
testdw=# \d tb_cp_01;
Append-Only Columnar Table "public.tb_cp_01"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer |
 b      | integer |
 c      | integer |
 d      | integer |
Checksum: f
Number of child tables: 3 (Use \d+ to list them.)
Distributed by: (a)
Use the DROP TABLE command to drop a table. Its help output is shown below:
Command:     DROP TABLE
Description: remove a table
Syntax:
DROP TABLE [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]

To remove the rows of a table, use the DELETE or TRUNCATE command. To drop a table together with the views that depend on it, you must specify CASCADE.
Command:     DELETE
Description: delete rows of a table
Syntax:
DELETE FROM [ ONLY ] table [ [ AS ] alias ]
    [ USING usinglist ]
    [ WHERE condition ]
    [ RETURNING * | output_expression [ AS output_name ] [, ...] ]

Command:     TRUNCATE
Description: empty a table or set of tables
Syntax:
TRUNCATE [ TABLE ] name [, ...] [ CASCADE | RESTRICT ]
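The three commands can be compared side by side. The sketch below reuses the tb03 table (a int, b text) created earlier; the view v_tb03 is hypothetical and exists only to show why CASCADE is needed.

```sql
-- Remove selected rows; the table and the remaining rows stay in place.
DELETE FROM tb03 WHERE a = 1;

-- Remove all rows at once.
TRUNCATE TABLE tb03;

-- Hypothetical view that depends on tb03.
CREATE VIEW v_tb03 AS SELECT a, b FROM tb03;

-- A plain DROP TABLE is refused while v_tb03 depends on the table;
-- CASCADE drops the table together with the dependent view.
DROP TABLE IF EXISTS tb03 CASCADE;
```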
1) Define a date-range partitioned table, giving each partition its own name:
CREATE TABLE tb_cp_02 (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( PARTITION Jan13 START (date '2013-01-01') INCLUSIVE ,
  PARTITION Feb13 START (date '2013-02-01') INCLUSIVE ,
  PARTITION Mar13 START (date '2013-03-01') INCLUSIVE ,
  PARTITION Apr13 START (date '2013-04-01') INCLUSIVE ,
  PARTITION May13 START (date '2013-05-01') INCLUSIVE ,
  PARTITION Jun13 START (date '2013-06-01') INCLUSIVE ,
  PARTITION Jul13 START (date '2013-07-01') INCLUSIVE ,
  PARTITION Aug13 START (date '2013-08-01') INCLUSIVE ,
  PARTITION Sep13 START (date '2013-09-01') INCLUSIVE ,
  PARTITION Oct13 START (date '2013-10-01') INCLUSIVE ,
  PARTITION Nov13 START (date '2013-11-01') INCLUSIVE ,
  PARTITION Dec13 START (date '2013-12-01') INCLUSIVE
                  END (date '2014-01-01') EXCLUSIVE );
[Screenshot: the created table as shown in pgAdmin]
2) Define a numeric-range partitioned table, using a single numeric column as the partition key:
CREATE TABLE tb_cp_03 (id int, rank int, year int, gender char(1), count int)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
( START (2010) END (2014) EVERY (1),
  DEFAULT PARTITION extra );
[Screenshot: the created table as shown in pgAdmin]
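3) Create a list partitioned table. A column of any data type can serve as the partition key, and several columns can be combined into one key; a primary key or unique constraint must include all of the table's partition key columns.
CREATE TABLE tb_cp_04 (id int, rank int, year int, gender char(1), count int )
DISTRIBUTED BY (id)
PARTITION BY LIST (gender)
( PARTITION girls VALUES ('F'),
  PARTITION boys VALUES ('M'),
  DEFAULT PARTITION other );
[Screenshot: the created table as shown in pgAdmin]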
4) Define a multi-level partitioned table, for cases where subpartitions are needed:
-- Subpartition by region
CREATE TABLE tb_cp_05 (trans_id int, date date, amount decimal(9,2), region text)
DISTRIBUTED BY (trans_id)
PARTITION BY RANGE (date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'),
  SUBPARTITION europe VALUES ('europe'),
  DEFAULT SUBPARTITION other_regions)
( START (date '2013-09-01') INCLUSIVE
  END (date '2014-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 month'),
  DEFAULT PARTITION outlying_dates );
[Screenshot: the created table as shown in pgAdmin]
5) Create a three-level partitioned table, partitioned by year, then month, then region:
CREATE TABLE tb_cp_06 (id int, year int, month int, day int, region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
SUBPARTITION BY RANGE (month)
SUBPARTITION TEMPLATE
( START (1) END (3) EVERY (1),
  DEFAULT SUBPARTITION other_months )
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'),
  SUBPARTITION europe VALUES ('europe'),
  DEFAULT SUBPARTITION other_regions )
( START (2012) END (2014) EVERY (1),
  DEFAULT PARTITION outlying_years );
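1) In a partitioned table the top-level table is empty; the data is stored in the lowest-level child tables.
2) To keep data loads from failing on rows that match no partition, you can define a default partition.
3) When a partitioned table is queried, the default partition is always scanned; if it contains data, query performance suffers.
4) When data is loaded into the parent table with COPY or INSERT, each row is automatically routed to the correct partition.
5) For better performance, consider loading data straight into a child table by exchanging partitions.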
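The routing described in point 4 can be observed with the list-partitioned table tb_cp_04 defined above. This is a minimal sketch; the inserted values are made up, and the child table names assume the naming scheme <parent>_<level>_prt_<partition> that Greenplum reports in its NOTICE messages.

```sql
-- Insert through the parent table; each row is routed by its gender value.
INSERT INTO tb_cp_04 VALUES (1, 1, 2013, 'F', 100);
INSERT INTO tb_cp_04 VALUES (2, 1, 2013, 'M', 120);
-- 'X' matches no list value, so this row goes to the default partition.
INSERT INTO tb_cp_04 VALUES (3, 1, 2013, 'X', 5);

-- Query the child tables directly to see where the rows landed.
SELECT count(*) FROM tb_cp_04_1_prt_girls;
SELECT count(*) FROM tb_cp_04_1_prt_boys;
SELECT count(*) FROM tb_cp_04_1_prt_other;
```

Without the default partition, the third INSERT would be rejected, which is the load failure that point 2 guards against.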
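If the query plan shows that a partitioned table is not being scanned selectively, one of the following limitations may be the cause:
1) The planner can scan partitions selectively only when the restriction uses immutable comparison operators such as =, <, <=, >, >=.
2) The planner does not recognize volatile functions when deciding on a selective scan.
For example, a WHERE clause such as date > CURRENT_DATE lets the planner scan partitions selectively, whereas time > TIMEOFDAY does not.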
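Whether partitions are eliminated can be checked with EXPLAIN. The sketch below uses the tb_cp_05 table defined earlier, which is range partitioned by month on its date column; the exact plan text varies by Greenplum release, so none is reproduced here.

```sql
-- Constant bounds on the partition key: the plan should list only the
-- child tables covering October 2013, plus the default partition.
EXPLAIN
SELECT * FROM tb_cp_05
WHERE date >= date '2013-10-01' AND date < date '2013-11-01';

-- No restriction on the partition key: every child table is scanned.
EXPLAIN
SELECT * FROM tb_cp_05 WHERE amount > 100;
```

Comparing the number of child tables named in the two plans shows how much work the restriction on the partition key saves.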
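-- Show the partitions of a table (replace ? with the table name)
SELECT partitionboundary, partitiontablename, partitionname, partitionlevel, partitionrank
FROM pg_partitions
WHERE tablename = '?';
-- Show the templates used to create the (sub)partitions
SELECT * FROM pg_partition_templates;
-- Show the partition keys of partitioned tables
SELECT * FROM pg_partition_columns;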
1) Add a new partition
Use ALTER TABLE to add a new partition to an existing partitioned table. If the table has a default partition, new partitions can only be split off from the default partition.
-- The table was created with a subpartition template:
ALTER TABLE tb_cp_05 DROP DEFAULT PARTITION;
ALTER TABLE tb_cp_05 ADD PARTITION
    START (date '2014-01-01') INCLUSIVE
    END (date '2014-02-01') EXCLUSIVE;
-- The table was created without a subpartition template:
ALTER TABLE tb_cp_05 ADD PARTITION
    START (date '2014-02-01') INCLUSIVE
    END (date '2014-03-01') EXCLUSIVE
    ( SUBPARTITION usa VALUES ('usa'),
      SUBPARTITION asia VALUES ('asia'),
      SUBPARTITION europe VALUES ('europe') );
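2) Rename a partition
Object names in GP are limited to 63 characters and must be unique.
Child tables are named <parent table name>_<partition level>_prt_<partition name>, for example tb_cp_04_1_prt_boys.
Range partitions created without an explicit name get a generated name such as tb_cp_05_1_prt_5. Renaming the parent table renames all of its child tables: after ALTER TABLE tb_cp_05 RENAME TO tbcp05; that child table becomes tbcp05_1_prt_5.
To rename only a partition: ALTER TABLE tbcp05 RENAME PARTITION FOR ('2013-06-01') TO Jun13; the child table then becomes tbcp05_1_prt_jun13.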
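3) Drop a partition: use ALTER TABLE to drop a partition of a partitioned table.
Drop the default partition: ALTER TABLE tbcp05 DROP DEFAULT PARTITION;
For a multi-level partitioned table, drop the default partition from every partition at the same level:
ALTER TABLE tb_cp_06 ALTER PARTITION FOR (RANK(1)) DROP DEFAULT PARTITION;
ALTER TABLE tb_cp_06 ALTER PARTITION FOR (RANK(2)) DROP DEFAULT PARTITION;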
4) Add a default partition: use ALTER TABLE to add a default partition: ALTER TABLE tbcp05 ADD DEFAULT PARTITION other;
In a multi-level partitioned table, every partition at the same level needs its own default partition:
ALTER TABLE tb_cp_06 ALTER PARTITION FOR (RANK(1)) ADD DEFAULT PARTITION other;
ALTER TABLE tb_cp_06 ALTER PARTITION FOR (RANK(2)) ADD DEFAULT PARTITION other;
5) Empty a partition: use ALTER TABLE to truncate a partition: ALTER TABLE table_name TRUNCATE PARTITION partition_name;
6) Exchange a partition: exchanging a partition swaps an ordinary table with an existing partition. Use ALTER TABLE to exchange partitions. Only partitions at the lowest level can be exchanged.
CREATE TABLE jan13 (LIKE tb_cp_02) WITH (appendonly=true);
INSERT INTO jan13 VALUES (1, '2013-01-15', 123.45);
ALTER TABLE tb_cp_02 EXCHANGE PARTITION FOR (date '2013-01-01') WITH TABLE jan13;
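7) Split a partition: use ALTER TABLE to split one existing partition into two.
For example, split a monthly partition into one partition for days 1-15 and another for days 16-31:
ALTER TABLE tb_cp_02 SPLIT PARTITION FOR ('2013-01-01') AT ('2013-01-16') INTO (PARTITION jan131to15, PARTITION jan1316to31);
If the partitioned table has a default partition, a new partition can only be added by splitting the default partition:
ALTER TABLE tb_cp_03 SPLIT DEFAULT PARTITION START (2014) INCLUSIVE END (2015) EXCLUSIVE INTO (PARTITION y2014, DEFAULT PARTITION);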
8) Modify the subpartition template: use ALTER TABLE ... SET SUBPARTITION TEMPLATE to change the subpartition template of an existing partitioned table.
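Steps 5) and 8) are described without a concrete statement, so here is a minimal sketch of both. It assumes tb_cp_04 and tb_cp_05 still carry the names and partitions they were created with (before the rename shown in step 2); the added asia subpartition is only an illustration.

```sql
-- 5) Empty a single partition of the list-partitioned table.
ALTER TABLE tb_cp_04 TRUNCATE PARTITION boys;

-- 8) Replace the subpartition template. Partitions added afterwards are
--    created with this set of subpartitions.
ALTER TABLE tb_cp_05 SET SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'),
  SUBPARTITION europe VALUES ('europe'),
  SUBPARTITION asia VALUES ('asia'),
  DEFAULT SUBPARTITION other_regions );
```

As far as I know, changing the template does not alter partitions that already exist, so it is worth checking pg_partition_templates and pg_partitions afterwards.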
Greenplum+Hadoop Study Notes 14: Defining Database Objects: Creating and Managing Tables
Original article: http://blog.csdn.net/mavs41/article/details/44907407