Even if you are already familiar with SELECT and other database query statements, queries in HAWQ have characteristics of their own that are worth studying.

SELECT customer, amount FROM sales JOIN customer USING (cust_id) WHERE dateCol = '04-30-2016';

Figure 2 shows the three slices generated for this query. Each segment receives a copy of the query plan, and the plan executes on multiple segments in parallel.
db1=# create table sales (order_id int, item_id int, amount numeric(15,2), date date, yr_qtr int)
db1-# partition by range (yr_qtr)
db1-# ( partition p201701 start (201701) inclusive ,
db1(# partition p201702 start (201702) inclusive ,
db1(# partition p201703 start (201703) inclusive ,
db1(# partition p201704 start (201704) inclusive ,
db1(# partition p201705 start (201705) inclusive ,
db1(# partition p201706 start (201706) inclusive ,
db1(# partition p201707 start (201707) inclusive ,
db1(# partition p201708 start (201708) inclusive ,
db1(# partition p201709 start (201709) inclusive ,
db1(# partition p201710 start (201710) inclusive ,
db1(# partition p201711 start (201711) inclusive ,
db1(# partition p201712 start (201712) inclusive
db1(# end (201801) exclusive );
...
CREATE TABLE
db1=#

GPORCA improves the following types of queries on partitioned tables:
db1=# explain select * from sales;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=24)
   ->  Sequence  (cost=0.00..431.00 rows=1 width=24)
         ->  Partition Selector for sales (dynamic scan id: 1)  (cost=10.00..100.00 rows=100 width=4)
               Partitions selected: 12 (out of 12)
         ->  Dynamic Table Scan on sales (dynamic scan id: 1)  (cost=0.00..431.00 rows=1 width=24)
 Settings:  default_hash_table_bucket_number=24
 Optimizer status: PQO version 1.684
(7 rows)
db1=# explain select * from sales where yr_qtr = 201706;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=24)
   ->  Sequence  (cost=0.00..431.00 rows=1 width=24)
         ->  Partition Selector for sales (dynamic scan id: 1)  (cost=10.00..100.00 rows=100 width=4)
               Filter: yr_qtr = 201706
               Partitions selected: 1 (out of 12)
         ->  Dynamic Table Scan on sales (dynamic scan id: 1)  (cost=0.00..431.00 rows=1 width=24)
               Filter: yr_qtr = 201706
 Settings:  default_hash_table_bucket_number=24
 Optimizer status: PQO version 1.684
(9 rows)
db1=# explain select * from sales where yr_qtr between 201701 and 201704 ;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=24)
   ->  Sequence  (cost=0.00..431.00 rows=1 width=24)
         ->  Partition Selector for sales (dynamic scan id: 1)  (cost=10.00..100.00 rows=100 width=4)
               Filter: yr_qtr >= 201701 AND yr_qtr <= 201704
               Partitions selected: 4 (out of 12)
         ->  Dynamic Table Scan on sales (dynamic scan id: 1)  (cost=0.00..431.00 rows=1 width=24)
               Filter: yr_qtr >= 201701 AND yr_qtr <= 201704
 Settings:  default_hash_table_bucket_number=24
 Optimizer status: PQO version 1.684
(9 rows)
db1=# explain select * from sales where yr_qtr = (select 201701);
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Hash Join  (cost=0.00..431.00 rows=1 width=24)
   Hash Cond: "outer"."?column?" = sales.yr_qtr
   ->  Result  (cost=0.00..0.00 rows=1 width=4)
         ->  Result  (cost=0.00..0.00 rows=1 width=1)
   ->  Hash  (cost=431.00..431.00 rows=1 width=24)
         ->  Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=24)
               ->  Sequence  (cost=0.00..431.00 rows=1 width=24)
                     ->  Partition Selector for sales (dynamic scan id: 1)  (cost=10.00..100.00 rows=100 width=4)
                           Partitions selected: 12 (out of 12)
                     ->  Dynamic Table Scan on sales (dynamic scan id: 1)  (cost=0.00..431.00 rows=1 width=24)
 Settings:  default_hash_table_bucket_number=24; optimizer=on
 Optimizer status: PQO version 1.684
(12 rows)
SELECT * FROM part WHERE price > (SELECT avg(price) FROM part);

GPORCA also handles correlated subqueries (CSQs) efficiently. A correlated subquery references values from the outer query inside the subquery, as in the following example.
SELECT * FROM part p1 WHERE price > (SELECT avg(price) FROM part p2 WHERE p2.brand = p1.brand);

GPORCA generates more efficient query plans for the following types of correlated subqueries:
SELECT *, (SELECT min(price) FROM part p2 WHERE p1.brand = p2.brand) AS foo FROM part p1;
SELECT * FROM part p1 WHERE p_size > 40 OR p_retailprice > (SELECT avg(p_retailprice) FROM part p2 WHERE p2.p_brand = p1.p_brand);
SELECT * FROM part p1 WHERE p1.p_partkey IN (SELECT p_partkey FROM part p2 WHERE p2.p_retailprice = (SELECT min(p_retailprice) FROM part p3 WHERE p3.p_brand = p1.p_brand));
SELECT * FROM part p1 WHERE p1.p_retailprice = (SELECT min(p_retailprice) FROM part p2 WHERE p2.p_brand <> p1.p_brand);
SELECT p_partkey, (SELECT p_retailprice FROM part p2 WHERE p2.p_brand = p1.p_brand ) FROM part p1;
db1=# create table t (a int,b int,c int);
CREATE TABLE
db1=# insert into t values (1,1,1), (2,2,2);
INSERT 0 2
db1=# with v as (select a, sum(b) as s from t where c < 10 group by a)
db1-# select * from v as v1 , v as v2
db1-# where v1.a <> v2.a and v1.s < v2.s;
 a | s | a | s
---+---+---+---
 1 | 1 | 2 | 2
(1 row)

As part of query optimization, GPORCA can push predicate filters down into a CTE, as in the following query.
db1=# explain
db1-# with v as (select a, sum(b) as s from t group by a)
db1-# select *
db1-# from v as v1, v as v2, v as v3
db1-# where v1.a < v2.a
db1-# and v1.s < v3.s
db1-# and v1.a = 10
db1-# and v2.a = 20
db1-# and v3.a = 30;
                               QUERY PLAN
-------------------------------------------------------------------------
...
        ->  Table Scan on t  (cost=0.00..431.00 rows=2 width=8)
              Filter: a = 10 OR a = 20 OR a = 30
...
 Settings:  default_hash_table_bucket_number=24
 Optimizer status: PQO version 1.684
(34 rows)

GPORCA can handle the following types of CTEs:
db1=# with cte1 as (select a, sum(b) as s from t
db1(# where c < 10 group by a),
db1-# cte2 as (select a, s from cte1 where s > 1)
db1-# select *
db1-# from cte1 as v1, cte2 as v2, cte2 as v3
db1-# where v1.a < v2.a and v1.s < v3.s;
 a | s | a | s | a | s
---+---+---+---+---+---
 1 | 1 | 2 | 2 | 2 | 2
(1 row)
db1=# with v as (with w as (select a, b from t
db1(# where b < 5)
db1(# select w1.a, w2.b
db1(# from w as w1, w as w2
db1(# where w1.a = w2.a and w1.a > 1)
db1-# select v1.a, v2.a, v2.b
db1-# from v as v1, v as v2
db1-# where v1.a <= v2.a;
 a | a | b
---+---+---
 2 | 2 | 2
(1 row)
db1=# drop table t;
DROP TABLE
db1=# create table t (a int not null, b int, c int);
CREATE TABLE
db1=# explain insert into t values (1,1,1);
                            QUERY PLAN
------------------------------------------------------------------
 Insert  (cost=0.00..0.08 rows=1 width=12)
   ->  Result  (cost=0.00..0.00 rows=1 width=20)
         ->  Assert  (cost=0.00..0.00 rows=1 width=20)
               Assert Cond: NOT a IS NULL
               ->  Result  (cost=0.00..0.00 rows=1 width=20)
                     ->  Result  (cost=0.00..0.00 rows=1 width=1)
 Settings:  default_hash_table_bucket_number=24
 Optimizer status: PQO version 1.684
(8 rows)
db1=# explain select count(distinct b) from t;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=0.00..431.00 rows=1 width=8)
   ->  Gather Motion 1:1  (slice2; segments: 1)  (cost=0.00..431.00 rows=2 width=4)
         ->  GroupAggregate  (cost=0.00..431.00 rows=2 width=4)
               Group By: b
               ->  Sort  (cost=0.00..431.00 rows=2 width=4)
                     Sort Key: b
                     ->  Redistribute Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=2 width=4)
                           Hash Key: b
                           ->  GroupAggregate  (cost=0.00..431.00 rows=2 width=4)
                                 Group By: b
                                 ->  Sort  (cost=0.00..431.00 rows=2 width=4)
                                       Sort Key: b
                                       ->  Table Scan on t  (cost=0.00..431.00 rows=2 width=4)
 Settings:  default_hash_table_bucket_number=24
 Optimizer status: PQO version 1.684
(15 rows)

The optimizer_prefer_scalar_dqa_multistage_agg configuration parameter controls how DQA (distinct qualified aggregate) queries are handled; it is enabled by default.
[gpadmin@hdp3 ~]$ hawq config -s optimizer_prefer_scalar_dqa_multistage_agg
GUC     : optimizer_prefer_scalar_dqa_multistage_agg
Value   : on
[gpadmin@hdp3 ~]$

Enabling this parameter forces GPORCA to use a three-stage DQA plan, which gives DQA queries predictable performance. If the parameter is disabled, GPORCA instead generates the execution plan using its cost model.
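The two behaviors can be compared side by side in a session. The sketch below reuses the table t and the parameter name from the examples above; the session itself is hypothetical, so the plans you actually see may differ.

```sql
-- Default behavior: force the three-stage DQA plan for predictable performance
SET optimizer_prefer_scalar_dqa_multistage_agg = on;
EXPLAIN SELECT count(DISTINCT b) FROM t;

-- Let GPORCA pick the DQA plan purely by cost instead
SET optimizer_prefer_scalar_dqa_multistage_agg = off;
EXPLAIN SELECT count(DISTINCT b) FROM t;
```

Comparing the two EXPLAIN outputs shows whether the cost-based choice differs from the forced three-stage plan for your data.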
[gpadmin@hdp3 ~]$ source /usr/local/hawq/greenplum_path.sh
[gpadmin@hdp3 ~]$ hawq config -c optimizer_analyze_root_partition -v on
[gpadmin@hdp3 ~]$ hawq stop cluster -u
[gpadmin@hdp3 ~]$ source /usr/local/hawq/greenplum_path.sh
[gpadmin@hdp3 ~]$ hawq config -c optimizer -v on
[gpadmin@hdp3 ~]$ hawq stop cluster -u
db1=# alter database db1 set optimizer = on ;
ALTER DATABASE

(4) Enabling GPORCA at the session level
db1=# set optimizer = on ;
SET

To use GPORCA for a specific query, run this SET command before executing the query.
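As a quick sanity check before running the query, you can confirm which optimizer is active (a hypothetical session; SHOW is standard PostgreSQL syntax and EXPLAIN is used throughout this article):

```sql
SHOW optimizer;    -- displays the current session setting
EXPLAIN SELECT 1;  -- the "Optimizer status" line names the optimizer that built this plan
```

Note that, as the next example shows, GPORCA may still fall back to the legacy optimizer for statements it does not handle, even when optimizer is on.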
db1=# explain select * from pg_attribute;
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on pg_attribute  (cost=0.00..62.70 rows=104880 width=103)
 Settings:  default_hash_table_bucket_number=24; optimizer=on
 Optimizer status: legacy query optimizer
(3 rows)
db1=# explain select * from sales where yr_qtr = 201706;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
...
 Settings:  default_hash_table_bucket_number=24; optimizer=on
 Optimizer status: PQO version 1.684
(9 rows)

If the plan for a query was generated by the legacy optimizer, the last line of the output reads "legacy query optimizer" instead. For example:
db1=# explain select 1;
                          QUERY PLAN
--------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0)
 Settings:  default_hash_table_bucket_number=24; optimizer=on
 Optimizer status: legacy query optimizer
(3 rows)

The following operations appear only in execution plans generated by GPORCA; the legacy optimizer does not support them.
Minidump_date_time.mdp

The following is an example of generating a minidump file.
[gpadmin@hdp3 ~]$ psql -d db1
psql (8.2.15)
Type "help" for help.
db1=# set optimizer_minidump=always;
SET
db1=# select * from t;
 a | b | c
---+---+---
 1 | 1 | 1
 1 | 2 | 2
(2 rows)
[gpadmin@hdp3 ~]$ ls -l /data/hawq/master/minidumps/
total 12
-rw------- 1 gpadmin gpadmin 8949 Apr 11 17:07 Minidump_20170411_170712_72720_2.mdp
[gpadmin@hdp3 ~]$
[gpadmin@hdp3 ~]$ xmllint --format /data/hawq/master/minidumps/Minidump_20170411_170712_72720_2.mdp > /data/hawq/master/minidumps/MyTest.xml
[gpadmin@hdp3 ~]$ cat /data/hawq/master/minidumps/MyTest.xml
db1=# explain select a, percentile_cont (0.5) within group (order by b desc)
db1-# from t group by a;
                          QUERY PLAN
--------------------------------------------------------------
...
 Settings:  default_hash_table_bucket_number=24; optimizer=on
 Optimizer status: legacy query optimizer
(24 rows)
db1=# explain select count(*) from t group by cube(a,b);
                          QUERY PLAN
--------------------------------------------------------------
...
 Settings:  default_hash_table_bucket_number=24; optimizer=on
 Optimizer status: legacy query optimizer
(27 rows)
HAWQ allocates resources for queries dynamically. Factors such as where the data resides, the number of virtual segments the query uses, and the overall health of the cluster all affect query performance.
1. Common optimization techniques
(1) Dynamic partition elimination.

[gpadmin@hdp3 ~]$ hawq config -s gp_dynamic_partition_pruning
GUC     : gp_dynamic_partition_pruning
Value   : on
[gpadmin@hdp3 ~]$

(2) Memory optimization.
[gpadmin@hdp3 ~]$ hawq config -s hawq_re_memory_overcommit_max
GUC     : hawq_re_memory_overcommit_max
Value   : 8192
[gpadmin@hdp3 ~]$ hawq config -s runaway_detector_activation_percent
GUC     : runaway_detector_activation_percent
Value   : 95
[gpadmin@hdp3 ~]$

When the virtual memory used by a physical segment exceeds this threshold, HAWQ terminates queries, starting with the one consuming the most memory, until virtual memory usage falls below the specified percentage. Suppose HAWQ's resource manager computes a virtual memory quota of 9GB for a physical segment, hawq_re_memory_overcommit_max is set to 1GB, and runaway_detector_activation_percent is 95. Then HAWQ starts terminating queries once virtual memory usage exceeds 9.5GB.
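The threshold arithmetic in that example can be sketched as follows. The 9GB quota is the assumed value from the text above, not something read from a real cluster:

```shell
# runaway detection threshold =
#   (segment vmem quota + hawq_re_memory_overcommit_max) * runaway_detector_activation_percent / 100
quota_mb=9216         # assumed 9 GB virtual memory quota computed by the resource manager
overcommit_mb=1024    # hawq_re_memory_overcommit_max = 1 GB
percent=95            # runaway_detector_activation_percent
threshold_mb=$(( (quota_mb + overcommit_mb) * percent / 100 ))
echo "${threshold_mb} MB"   # 9728 MB, i.e. about 9.5 GB
```

So with a 10GB effective limit (quota plus overcommit), the 95% trigger lands at 9.5GB, matching the figure in the text.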
db1=# explain analyze select * from t;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
...
 Data locality statistics:
   data locality ratio: 1.000; virtual segment number: 1; different host number: 1; virtual segment number per host(avg/min/max): (1/1/1); segment size(avg/min/max): (56.000 B/56 B/56 B); segment size with penalty(avg/min/max): (56.000 B/56 B/56 B); continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 0.138 ms; resource allocation: 1.159 ms; datalocality calculation: 0.252 ms.
 Total runtime: 8.205 ms
(17 rows)

Table 1 explains the data locality metrics. Use this information to check for potential query performance problems.
Metric | Description
data locality ratio | The overall ratio of locally read data for the query. The lower the ratio, the more data is read from remote nodes. Remote HDFS reads require network I/O and may increase query execution time. For a hash-distributed table, all blocks of a file are processed by one segment, so if the data on HDFS is redistributed (for example by an HDFS Rebalance), the locality ratio drops. In that case, you can run a CREATE TABLE AS SELECT statement to rebuild the table and redistribute the data manually.
number of virtual segments | The number of virtual segments the query uses. Usually, the more virtual segments, the faster the query runs. If there are too few, check whether default_hash_table_bucket_number, hawq_rm_nvseg_perquery_limit, or the bucket number of a hash-distributed table is too small.
different host number | How many hosts are used to run the query. When the number of virtual segments is greater than or equal to the total number of hosts in the HAWQ cluster, all hosts should be used. For a large query, a value smaller than the host count usually means some hosts are down. In that case, run "select * from gp_segment_configuration" to check node status.
segment size and segment size with penalty | "segment size" is the amount of data (avg/min/max, in bytes) processed by a virtual segment. "segment size with penalty" is that amount including remote reads, where the remote read volume is computed as "net_disk_ratio" * block size. A virtual segment with remote reads should process less data than one with only local reads. The "net_disk_ratio" configuration parameter measures how much slower remote reads are than local reads; its default is 1.01, and it can be tuned for different network environments.
continuity | Reading an HDFS file discontinuously introduces extra seeks and slows down the table scan. A low continuity value indicates that the file's blocks are not laid out contiguously on the DataNode.
DFS metadatacache | The metadata cache time for the query. HDFS block information is cached by HAWQ's DFS Metadata Cache process; on a cache miss, this time increases.
resource allocation | The time spent acquiring resources from the resource manager.
datalocality calculation | The time spent running the algorithm that assigns HDFS blocks to virtual segments and computing the data locality ratio.
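For example, if the data locality ratio of the sales table dropped after an HDFS Rebalance, a manual redistribution via CREATE TABLE AS SELECT could be sketched like this. The table name comes from the earlier examples; the exact steps are an assumption, and a partitioned or hash-distributed table would need its partition and distribution clauses restated in the CREATE statement:

```sql
-- Rebuild the table so HAWQ rewrites its HDFS files, restoring local reads
CREATE TABLE sales_rebuilt AS SELECT * FROM sales;
DROP TABLE sales;
ALTER TABLE sales_rebuilt RENAME TO sales;
```

Afterwards, rerun EXPLAIN ANALYZE and check that the data locality ratio has returned to a value near 1.000.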
db1=# set optimizer=on;
SET
db1=# explain select * from t;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=2 width=12)
   ->  Table Scan on t  (cost=0.00..431.00 rows=2 width=12)
 Settings:  default_hash_table_bucket_number=24; optimizer=on
 Optimizer status: PQO version 1.684
(4 rows)
db1=# set optimizer=off;
SET
db1=# explain select * from t;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..1.02 rows=2 width=12)
   ->  Append-only Scan on t  (cost=0.00..1.02 rows=2 width=12)
 Settings:  default_hash_table_bucket_number=24; optimizer=off
 Optimizer status: legacy query optimizer
(4 rows)
db1=# explain select * from t where b=1;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=12)
   ->  Table Scan on t  (cost=0.00..431.00 rows=1 width=12)
         Filter: b = 1
 Settings:  default_hash_table_bucket_number=24; optimizer=on
 Optimizer status: PQO version 1.684
(5 rows)

The EXPLAIN output for this plan has only five lines. The last line names the optimizer that generated the plan (GPORCA), and the line before it shows basic settings such as the hash bucket number and the optimizer switch; these two lines are not part of the plan tree.
db1=# select * from t;
 a | b | c
---+---+---
(0 rows)
db1=# explain insert into t values (1,1,1);
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Insert  (slice0; segments: 1)  (rows=1 width=0)
   ->  Redistribute Motion 1:1  (slice1; segments: 1)  (cost=0.00..0.01 rows=1 width=0)
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
 Settings:  default_hash_table_bucket_number=24; optimizer=off
 Optimizer status: legacy query optimizer
(5 rows)
db1=# select * from t;
 a | b | c
---+---+---
(0 rows)
db1=# explain analyze insert into t values (1,1,1);
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Insert  (slice0; segments: 1)  (rows=1 width=0)
   ->  Redistribute Motion 1:1  (slice1; segments: 1)  (cost=0.00..0.01 rows=1 width=0)
         Rows out:  Avg 1.0 rows x 1 workers at destination.  Max/Last(seg0:hdp3/seg0:hdp3) 1/1 rows with 14/14 ms to end, start offset by 161/161 ms.
         ->  Result  (cost=0.00..0.01 rows=1 width=0)
               Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(seg0:hdp3/seg0:hdp3) 1/1 rows with 0.004/0.004 ms to first row, 0.005/0.005 ms to end, start offset by 176/176 ms.
...
 Total runtime: 210.536 ms
(18 rows)
db1=# select * from t;
 a | b | c
---+---+---
 1 | 1 | 1
(1 row)

EXPLAIN ANALYZE shows the optimizer's estimated cost alongside the query's actual execution cost, so you can see how close the estimates come to reality. EXPLAIN ANALYZE output also shows the following:
Work_mem used: 64K bytes avg, 64K bytes max (seg0).
Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to lessen workfile I/O affecting 2 workers.
db1=# explain analyze select * from t where b=1;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=12)
   Rows out:  Avg 1.0 rows x 1 workers at destination.  Max/Last(seg-1:hdp3/seg-1:hdp3) 1/1 rows with 11/11 ms to end, start offset by 1.054/1.054 ms.
   ->  Table Scan on t  (cost=0.00..431.00 rows=1 width=12)
         Filter: b = 1
         Rows out:  Avg 1.0 rows x 1 workers.  Max/Last(seg0:hdp3/seg0:hdp3) 1/1 rows with 2.892/2.892 ms to first row, 2.989/2.989 ms to end, start offset by 8.579/8.579 ms.
 Slice statistics:
   (slice0)    Executor memory: 163K bytes.
   (slice1)    Executor memory: 279K bytes (seg0:hdp3).
 Statement statistics:
   Memory used: 262144K bytes
 Settings:  default_hash_table_bucket_number=24; optimizer=on
 Optimizer status: PQO version 1.684
 Dispatcher statistics:
   executors used(total/cached/new connection): (1/1/0); dispatcher time(total/connection/dispatch data): (0.342 ms/0.000 ms/0.095 ms).
   dispatch data time(max/min/avg): (0.095 ms/0.095 ms/0.095 ms); consume executor data time(max/min/avg): (0.020 ms/0.020 ms/0.020 ms); free executor time(max/min/avg): (0.000 ms/0.000 ms/0.000 ms).
 Data locality statistics:
   data locality ratio: 1.000; virtual segment number: 1; different host number: 1; virtual segment number per host(avg/min/max): (1/1/1); segment size(avg/min/max): (24.000 B/24 B/24 B); segment size with penalty(avg/min/max): (24.000 B/24 B/24 B); continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 0.092 ms; resource allocation: 0.911 ms; datalocality calculation: 0.221 ms.
 Total runtime: 13.304 ms
(18 rows)

Compared with EXPLAIN, this output is much longer: 18 lines. Line 11 names the optimizer that generated the plan (GPORCA), and line 12 shows basic settings such as the hash bucket number; these two lines are the same as in the EXPLAIN output. The first five lines are the plan tree, which contains two lines (lines 2 and 5) that EXPLAIN does not print. They record each node's actual execution: rows returned, time to first and last row, and the Max/Last segment statistics. In the Table Scan operation, only one segment (seg0) returned rows, and it returned only one row; the Max and Last statistics are identical because only one segment returned rows. It took 2.892 milliseconds to find the first row and 2.989 milliseconds to return all rows. Note the "start offset by" value: it is the time from when the dispatcher started the operation until the segment returned its first row, 8.579 milliseconds. The actual number of rows returned matches the estimated number. The Gather Motion operation received one row and passed it to the master; its time statistics include those of its child Table Scan operation. The last line shows the total execution time of the query, 13.304 milliseconds.
Work_mem used: 23430K bytes avg, 23430K bytes max (seg0).
Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2 workers.
db1=# create or replace function explain_plan_func() returns varchar as $$
declare
    a varchar;
    b int;
    c varchar;
begin
    a = '';
    b = 1;
    for c in execute 'explain select * from t where b=' || cast(b as varchar) loop
        a = a || e'\n' || c;
    end loop;
    return a;
end;
$$ language plpgsql volatile;
CREATE FUNCTION
db1=# select explain_plan_func();
                 explain_plan_func
----------------------------------------------------
 Seq Scan on t  (cost=0.00..34.25 rows=10 width=12)
  Filter: (b = 1)
(1 row)
Original post: http://blog.csdn.net/wzy0623/article/details/70167990