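After the first two rounds of optimization, saiku went from unusable to usable, but it still feels sluggish when analyzing large volumes of log data. After looking further at the SQL it executes behind the scenes, I decided to turn my attention to indexes.
The main use case for the logs is data analysis along a fixed date dimension; in other words, the where clause always contains a condition of the form "date equals some day". The dilemma is whether to build a separate index on every field or to build composite indexes together with the date. It boils down to comparing the efficiency of single-column indexes with that of composite indexes.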
PostgreSQL table: saiku_search_detail
Table structure:
CREATE TABLE test.saiku_search_detail (
    rpt_date date,
    from_area_id bigint,
    from_value_id bigint,
    in_track_id bigint,
    gid character varying,
    current_city_id bigint,
    dist_city_id bigint,
    category_name_id bigint,
    page_id bigint,
    utmr_page_id bigint,
    num bigint,
    id bigint,
    partner smallint
)
Row count: 8,510,490 (about 8.51 million).
Queries against a single date:
1.1 Single condition
select count(1) from test.saiku_search_detail where rpt_date = '2016-05-13'
Result: 1110 ms
Aggregate  (cost=160934.85..160934.86 rows=1 width=0)
  ->  Seq Scan on saiku_search_detail  (cost=0.00..160816.78 rows=47230 width=0)
        Filter: (rpt_date = '2016-05-13'::date)
1.2 Two conditions
select count(1) from test.saiku_search_detail where rpt_date = '2016-05-13' and from_area_id = 135
Result: 1782 ms
Aggregate  (cost=184432.32..184432.33 rows=1 width=0)
  ->  Seq Scan on saiku_search_detail  (cost=0.00..184431.73 rows=236 width=0)
        Filter: ((rpt_date = '2016-05-13'::date) AND (from_area_id = 135))
As expected, no index is involved: both queries run as sequential scans.
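The plans quoted in this post are the kind of output that EXPLAIN produces. As a minimal sketch, the plan and the measured run time of the two-condition query can be obtained in one step with EXPLAIN ANALYZE, which actually executes the statement and prints real row counts and timings next to the planner's estimates:

```sql
-- EXPLAIN ANALYZE runs the query and reports actual times,
-- so the estimate and the measurement can be compared directly.
EXPLAIN ANALYZE
SELECT count(1)
FROM test.saiku_search_detail
WHERE rpt_date = '2016-05-13'
  AND from_area_id = 135;
```

Because the statement really runs, repeated executions can be faster than the first one once the data is cached, so it is worth running each variant more than once before comparing numbers.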
-- B-tree index
CREATE INDEX saiku_search_detail_from_area_id_idx ON saiku_search_detail USING btree (from_area_id);
-- hash index
CREATE INDEX saiku_search_detail_rpt_date_idx ON saiku_search_detail USING hash (rpt_date);
2.1 Single condition
select count(1) from saiku_search_detail where rpt_date = '2016-05-13'
Result: 83 ms
Aggregate  (cost=8.02..8.03 rows=1 width=0)
  ->  Index Scan using saiku_search_detail_rpt_date_idx on saiku_search_detail  (cost=0.00..8.02 rows=1 width=0)
        Index Cond: (rpt_date = '2016-05-13'::date)
The index is used.
2.2 Two conditions
select count(1) from saiku_search_detail where rpt_date = '2016-05-13' and from_area_id = 135
Result: 149 ms
Aggregate  (cost=8.02..8.03 rows=1 width=0)
  ->  Index Scan using saiku_search_detail_rpt_date_idx on saiku_search_detail  (cost=0.00..8.02 rows=1 width=0)
        Index Cond: (rpt_date = '2016-05-13'::date)
        Filter: (from_area_id = 135)
Only one index is used; the second one does not take effect. I tried changing the order of the conditions in the SQL:
select count(1) from saiku_search_detail where from_area_id = 135 and rpt_date = '2016-05-13'
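The result is the same, which is expected, since the order of conditions in the where clause does not influence the planner. For this query, with two separate single-column indexes, only one of them takes effect. This is not a general rule: PostgreSQL can combine several indexes with a bitmap AND, but it does so only when it estimates that to be cheaper, and here it estimated that the rpt_date index alone would return a single row.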
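One detail in the plans of this step is worth checking. Both estimate rows=1 for the rpt_date condition, whereas the sequential-scan plan in 1.1 estimated 47230 rows for the same date, which hints that the planner statistics may have been out of date when these plans were produced. The following is a minimal sketch under that assumption: refresh the statistics and look at the plan again. PostgreSQL is able to combine single-column indexes through a BitmapAnd node when it considers that cheaper, so the plan may change after ANALYZE; whether it does on this data is not something the measurements above show.

```sql
-- Assumption: the table statistics were stale when the plans above were taken.
-- ANALYZE recollects them so the planner works from current row estimates.
ANALYZE saiku_search_detail;

-- Re-check which index (or combination of indexes) the planner now picks.
EXPLAIN
SELECT count(1)
FROM saiku_search_detail
WHERE rpt_date = '2016-05-13'
  AND from_area_id = 135;
```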
-- composite index covering both columns
CREATE INDEX saiku_search_detail_rpt_date_from_area_idx ON test.saiku_search_detail USING btree (rpt_date, from_area_id);
3.1 Single-condition query on the first column of the index
select count(1) from test.saiku_search_detail where rpt_date = '2016-05-13'
Result: 66 ms
Aggregate  (cost=47843.00..47843.01 rows=1 width=0)
  ->  Bitmap Heap Scan on saiku_search_detail  (cost=2220.63..47362.94 rows=192025 width=0)
        Recheck Cond: (rpt_date = '2016-05-13'::date)
        ->  Bitmap Index Scan on saiku_search_detail_rpt_date_from_area_idx  (cost=0.00..2172.62 rows=192025 width=0)
The composite index is used here, through its leading column rpt_date only.
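The query in 3.1 can use the composite index because it constrains rpt_date, the leading column. A multicolumn B-tree index is most effective when its leading columns are constrained; a condition on the second column alone generally cannot narrow the scan in the same way. The sketch below is an experiment for that opposite case, not a measurement from this post. If the single-column index on from_area_id from step 2 still exists, the planner may simply choose that one instead.

```sql
-- Filter only on the second column of (rpt_date, from_area_id).
-- The plan shows whether the composite index is still considered useful.
EXPLAIN
SELECT count(1)
FROM test.saiku_search_detail
WHERE from_area_id = 135;
```

Since every query in this workload filters on the date, putting rpt_date first in the composite index matches the access pattern described at the start.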
3.2 Two-condition query
select count(1) from test.saiku_search_detail where rpt_date = '2016-05-13' and from_area_id = 135
Result: 65 ms
Aggregate  (cost=46124.99..46125.00 rows=1 width=0)
  ->  Bitmap Heap Scan on saiku_search_detail  (cost=1509.67..45857.37 rows=107047 width=0)
        Recheck Cond: ((rpt_date = '2016-05-13'::date) AND (from_area_id = 135))
        ->  Bitmap Index Scan on saiku_search_detail_rpt_date_from_area_idx  (cost=0.00..1482.90 rows=107047 width=0)
The composite index is used, with both conditions applied.
Original article: http://www.cnblogs.com/liqiu/p/5494967.html