在数据仓库中,有事实表、维度表两个概念。
事实表是数据仓库结构中的中央表,它包含联系事实与维度表的数字度量值和键。事实数据表包含描述业务(例如产品销售)内特定事件的数据。
维度表是维度属性的集合。是分析问题的一个窗口。是人们观察数据的特定角度,是考虑问题时的一类属性,属性的集合构成一个维。
如图示
我们以sh用户下的sales表和times表来看,
SALES为事实表
SQL> desc sales Name Null? Type ----------------------------------------- -------- ---------------------------- PROD_ID NOT NULL NUMBER CUST_ID NOT NULL NUMBER TIME_ID NOT NULL DATE CHANNEL_ID NOT NULL NUMBER PROMO_ID NOT NULL NUMBER QUANTITY_SOLD NOT NULL NUMBER(10,2) AMOUNT_SOLD NOT NULL NUMBER(10,2)
TIMES为维度表
SQL> desc times Name Null? Type ----------------------------------------- -------- ---------------------------- TIME_ID NOT NULL DATE DAY_NAME NOT NULL VARCHAR2(9) DAY_NUMBER_IN_WEEK NOT NULL NUMBER(1) DAY_NUMBER_IN_MONTH NOT NULL NUMBER(2) CALENDAR_WEEK_NUMBER NOT NULL NUMBER(2) FISCAL_WEEK_NUMBER NOT NULL NUMBER(2) WEEK_ENDING_DAY NOT NULL DATE WEEK_ENDING_DAY_ID NOT NULL NUMBER CALENDAR_MONTH_NUMBER NOT NULL NUMBER(2) FISCAL_MONTH_NUMBER NOT NULL NUMBER(2) CALENDAR_MONTH_DESC NOT NULL VARCHAR2(8) CALENDAR_MONTH_ID NOT NULL NUMBER FISCAL_MONTH_DESC NOT NULL VARCHAR2(8) FISCAL_MONTH_ID NOT NULL NUMBER DAYS_IN_CAL_MONTH NOT NULL NUMBER DAYS_IN_FIS_MONTH NOT NULL NUMBER END_OF_CAL_MONTH NOT NULL DATE END_OF_FIS_MONTH NOT NULL DATE CALENDAR_MONTH_NAME NOT NULL VARCHAR2(9) FISCAL_MONTH_NAME NOT NULL VARCHAR2(9) CALENDAR_QUARTER_DESC NOT NULL CHAR(7) CALENDAR_QUARTER_ID NOT NULL NUMBER FISCAL_QUARTER_DESC NOT NULL CHAR(7) FISCAL_QUARTER_ID NOT NULL NUMBER DAYS_IN_CAL_QUARTER NOT NULL NUMBER DAYS_IN_FIS_QUARTER NOT NULL NUMBER END_OF_CAL_QUARTER NOT NULL DATE END_OF_FIS_QUARTER NOT NULL DATE CALENDAR_QUARTER_NUMBER NOT NULL NUMBER(1) FISCAL_QUARTER_NUMBER NOT NULL NUMBER(1) CALENDAR_YEAR NOT NULL NUMBER(4) CALENDAR_YEAR_ID NOT NULL NUMBER FISCAL_YEAR NOT NULL NUMBER(4) FISCAL_YEAR_ID NOT NULL NUMBER DAYS_IN_CAL_YEAR NOT NULL NUMBER DAYS_IN_FIS_YEAR NOT NULL NUMBER END_OF_CAL_YEAR NOT NULL DATE END_OF_FIS_YEAR NOT NULL DATE
如果我们创建一个物化视图
create materialized view sales_month_sum enable query rewrite as SELECT t.calendar_month_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id GROUP BY prod_id, channel_id, promo_id, t.calendar_month_id;
如果我们做如下按月的分组查询
SQL> alter session set query_rewrite_enabled=true; SQL> alter session set query_rewrite_integrity=trusted; SQL> set autotrace traceonly SQL> set line 200 SQL> SELECT t.calendar_month_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id GROUP BY prod_id, channel_id, promo_id, t.calendar_month_id; 9068 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3287305789 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 9068 | 690K| 13 (0)| 00:00:01 | | 1 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------
可见查询使用的是物化视图,但是如果我需要按年、季度对数据做分组查询呢?
SELECT t.calendar_quarter_id,prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id GROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id;
这个查看肯定是不能使用物化视图的,执行计划如下
Execution Plan ---------------------------------------------------------- Plan hash value: 3221963832 --------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | --------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2037 | 79443 | 569 (6)| 00:00:07 | | | | 1 | HASH GROUP BY | | 2037 | 79443 | 569 (6)| 00:00:07 | | | |* 2 | HASH JOIN | | 918K| 34M| 546 (2)| 00:00:07 | | | | 3 | PART JOIN FILTER CREATE | :BF0000 | 1826 | 21912 | 18 (0)| 00:00:01 | | | | 4 | TABLE ACCESS FULL | TIMES | 1826 | 21912 | 18 (0)| 00:00:01 | | | | 5 | PARTITION RANGE JOIN-FILTER| | 918K| 23M| 525 (2)| 00:00:07 |:BF0000|:BF0000| | 6 | TABLE ACCESS FULL | SALES | 918K| 23M| 525 (2)| 00:00:07 |:BF0000|:BF0000| ---------------------------------------------------------------------------------------------------------
Oracle为了是查询重写更加的智能,引入了Dimension的概念。Dimension我们称之为维,它是基于维度表的,用来描述维度表的维度之间的层级关系。
CREATE DIMENSION SH.TIMES_DIM LEVEL DAY IS (SH.TIMES.TIME_ID) LEVEL MONTH IS (SH.TIMES.CALENDAR_MONTH_ID) LEVEL QUARTER IS (SH.TIMES.CALENDAR_QUARTER_ID) LEVEL YEAR IS (SH.TIMES.CALENDAR_YEAR_ID) HIERARCHY CAL_ROLLUP (DAY CHILD OF MONTH CHILD OF QUARTER CHILD OF YEAR);
LEVEL定义等级,基于维度表,HIERARCHY关键字定义层级关系。由层级关系,我们知道quarter是由month组成的。
我们再次查询
SQL> SELECT t.calendar_quarter_id,prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id GROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id; 3375 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3397140165 -------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 1720 | 36 (14)| 00:00:01 | | 1 | HASH GROUP BY | | 20 | 1720 | 36 (14)| 00:00:01 | |* 2 | HASH JOIN | | 128K| 10M| 33 (7)| 00:00:01 | | 3 | VIEW | | 849 | 6792 | 19 (6)| 00:00:01 | | 4 | HASH UNIQUE | | 849 | 6792 | 19 (6)| 00:00:01 | | 5 | TABLE ACCESS FULL | TIMES | 1826 | 14608 | 18 (0)| 00:00:01 | | 6 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 | --------------------------------------------------------------------------------------------------
这次是使用物化视图与times表做关联,性能更高了。
我们对比如下两个查询
SQL> SELECT t.calendar_quarter_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id AND t.calendar_quarter_id = 1769 GROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id; 168 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3397140165 -------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 86 | 33 (7)| 00:00:01 | | 1 | HASH GROUP BY | | 1 | 86 | 33 (7)| 00:00:01 | |* 2 | HASH JOIN | | 6423 | 539K| 32 (4)| 00:00:01 | | 3 | VIEW | | 34 | 272 | 19 (6)| 00:00:01 | | 4 | HASH UNIQUE | | 34 | 272 | 19 (6)| 00:00:01 | |* 5 | TABLE ACCESS FULL | TIMES | 90 | 720 | 18 (0)| 00:00:01 | | 6 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 | --------------------------------------------------------------------------------------------------
使用了物化视图
SQL>SELECT t.calendar_quarter_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = ‘1998-01‘ GROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id; 168 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3221963832 ----------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 8146 | 373K| | 848 (2)| 00:00:11 | | | | 1 | HASH GROUP BY | | 8146 | 373K| 3632K| 848 (2)| 00:00:11 | | | |* 2 | HASH JOIN | | 57459 | 2637K| | 546 (2)| 00:00:07 | | | | 3 | PART JOIN FILTER CREATE | :BF0000 | 91 | 1820 | | 18 (0)| 00:00:01 | | | |* 4 | TABLE ACCESS FULL | TIMES | 91 | 1820 | | 18 (0)| 00:00:01 | | | | 5 | PARTITION RANGE JOIN-FILTER| | 918K| 23M| | 525 (2)| 00:00:07 |:BF0000|:BF0000| | 6 | TABLE ACCESS FULL | SALES | 918K| 23M| | 525 (2)| 00:00:07 |:BF0000|:BF0000| -----------------------------------------------------------------------------------------------------------------
没有使用物化视图。
其实条件实质上是一样的,因为t.calendar_quarter_desc = ‘1998-01‘ 和t.calendar_quarter_id = 1769 在times表中表示相同的数据。
但是Oracle不知道CALENDAR_QUARTER_DESC与CALENDAR_QUARTER_ID的关系。
我们在创建Dimension时,可以为LEVEL指定属性值。
如下
CREATE DIMENSION SH.TIMES_DIM LEVEL DAY IS (SH.TIMES.TIME_ID) LEVEL MONTH IS (SH.TIMES.CALENDAR_MONTH_ID) LEVEL QUARTER IS (SH.TIMES.CALENDAR_QUARTER_ID) LEVEL YEAR IS (SH.TIMES.CALENDAR_YEAR_ID) HIERARCHY CAL_ROLLUP (DAY CHILD OF MONTH CHILD OF QUARTER CHILD OF YEAR) ATTRIBUTE QUARTER DETERMINES (SH.TIMES.CALENDAR_QUARTER_DESC, SH.TIMES.DAYS_IN_CAL_QUARTER, SH.TIMES.END_OF_CAL_QUARTER, SH.TIMES.CALENDAR_QUARTER_NUMBER) ATTRIBUTE YEAR DETERMINES (SH.TIMES.CALENDAR_YEAR, SH.TIMES.DAYS_IN_CAL_YEAR, SH.TIMES.END_OF_CAL_YEAR);
我们再次查询
SQL> SELECT t.calendar_quarter_id, prod_id, channel_id, promo_id, SUM (quantity_sold) quantity_sold, SUM (amount_sold) amount_sold FROM sales s, times t WHERE s.time_id = t.time_id AND t.calendar_quarter_desc = ‘1998-01‘ GROUP BY prod_id, channel_id, promo_id, t.calendar_quarter_id; 168 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 3290467316 -------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 20 | 2240 | 33 (7)| 00:00:01 | | 1 | HASH GROUP BY | | 20 | 2240 | 33 (7)| 00:00:01 | |* 2 | HASH JOIN | | 17191 | 1880K| 32 (4)| 00:00:01 | | 3 | VIEW | VW_GBF_5 | 91 | 3094 | 19 (6)| 00:00:01 | | 4 | HASH GROUP BY | | 91 | 728 | 19 (6)| 00:00:01 | | 5 | VIEW | | 91 | 728 | 19 (6)| 00:00:01 | | 6 | HASH UNIQUE | | 91 | 1456 | 19 (6)| 00:00:01 | |* 7 | TABLE ACCESS FULL | TIMES | 91 | 1456 | 18 (0)| 00:00:01 | | 8 | MAT_VIEW REWRITE ACCESS FULL| SALES_MONTH_SUM | 9068 | 690K| 13 (0)| 00:00:01 | --------------------------------------------------------------------------------------------------
这次就使用了物化视图。
本文出自 “叮咚” 博客,请务必保留此出处http://lqding.blog.51cto.com/9123978/1694413
Oracle 学习之 数据仓库(二) Dimension 的理解
原文地址:http://lqding.blog.51cto.com/9123978/1694413