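When MySQL paginates a large data set with limit, queries get slower as the page number grows.
This becomes a real problem once a table holds several million rows.
For example, select * from table limit 0,10 is fine, but with limit 200000,10 reading the data becomes very slow.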
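The underlying cause:
1) The query time of a limit statement grows in proportion to the position of the starting record (the offset).
2) MySQL's limit statement is convenient, but it is not suitable for direct use on tables with many rows.
For example, limit 10000,20 means: scan the 10,020 rows that satisfy the condition, throw away the first 10,000, and return the last 20. That is exactly where the problem lies.
LIMIT 2000000, 30 scans 2,000,000 + 30 rows, so it is no wonder that it is slow enough to block everything behind it; it can even drive disk I/O to 100%.
By contrast, a statement like limit 30 scans only 30 rows.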
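The scan-and-discard behaviour described above can be sketched in a few lines of Python. This is only a model of the idea, not MySQL's actual executor; the function name limit_scan and the in-memory range standing in for the matching rows are assumptions made for illustration.

```python
def limit_scan(rows, offset, size):
    """Emulate `LIMIT offset, size` over an iterator of matching rows.

    Returns (page, scanned): the rows kept, and how many rows had to be
    read before the page was complete.
    """
    scanned = 0
    page = []
    for row in rows:
        scanned += 1
        if scanned > offset:      # rows up to the offset are read, then discarded
            page.append(row)
            if len(page) == size:
                break
    return page, scanned


# Hypothetical stand-in for the rows that satisfy the WHERE clause.
matching_rows = range(1, 100_001)

page, scanned = limit_scan(iter(matching_rows), 10_000, 20)
print(scanned)            # 10020 rows read to return 20

first_page, first_scanned = limit_scan(iter(matching_rows), 0, 30)
print(first_scanned)      # 30 rows read to return 30
```

The cost is driven by offset + size, not by size alone, which is why the later optimizations try to get rid of the offset.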
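Optimization approach: eliminate, or make better use of, the offset in limit offset,size.
Instead of using limit directly, first obtain the id at the offset position, then fetch the data with a plain limit size.
Use a covering index on the table to speed up the paginated query.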
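Covering index:
A covering index means the selected columns can be obtained from the index alone, without reading the data rows. MySQL can return the fields in the select list straight from the index instead of going back to the data file after the index lookup. In other words, the queried columns must be covered by the index that was created.
Index lookups use optimized search algorithms, and the requested data sits in the index itself, so there is no need to follow the index entry to the row's location. This saves a lot of time. MySQL also caches index data in memory, and under high concurrency that cache makes the effect even better.
In our example, the id field is the primary key, so it naturally comes with the default primary key index.
This time we query the last page of data directly (using a covering index, selecting only the id column), as follows: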
# Selecting only the id column (served by a covering index) is clearly faster than select *
select * from table where company_id = 1 and mark = 0 order by id desc limit 200000, 20;
select id from table where company_id = 1 and mark = 0 order by id desc limit 200000, 20;
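Whether a query is actually answered from the index alone can be checked in the query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in, because it needs no server; SQLite is not MySQL, and the table name orders, the payload column, and the index idx_company_mark are all hypothetical. In MySQL itself the equivalent check is to run EXPLAIN and look for "Using index" in the Extra column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " company_id INTEGER NOT NULL,"
    " mark INTEGER NOT NULL,"
    " payload TEXT)"              # a column that is NOT part of the index
)
# Secondary index on the filter columns; like an InnoDB secondary index,
# it also carries the primary key, so `SELECT id` needs nothing else.
conn.execute("CREATE INDEX idx_company_mark ON orders (company_id, mark)")


def plan(sql):
    """Return the planner's description of how `sql` would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)   # column 3 holds the detail text


tail = ("FROM orders WHERE company_id = 1 AND mark = 0 "
        "ORDER BY id DESC LIMIT 20 OFFSET 200000")

plan_ids = plan("SELECT id " + tail)   # only indexed columns requested
plan_all = plan("SELECT * " + tail)    # needs payload, so rows must be read

print(plan_ids)
print(plan_all)
```

The plan for the id-only query should mention a covering index, while the select * plan should not, because the payload column forces a visit to the row data.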
If we also need all columns, there are two approaches: one uses the id >= form, the other uses a join. Here is how they look in practice:
# Both rely on the same principle, so they perform about the same
SELECT * FROM xxx WHERE ID >= (select id from xxx limit 1000000, 1) limit 20;
SELECT * FROM xxx a JOIN (select id from xxx limit 1000000, 20) b ON a.ID = b.id;
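As a sanity check that the two rewrites return the same page as a plain offset query, here is a minimal sketch using Python's sqlite3 module in place of MySQL. The items table, its payload column, and the small row counts are assumptions chosen so the example runs instantly; an explicit ORDER BY id is added to every query so the page is well defined, and LIMIT ... OFFSET ... is used because both SQLite and MySQL accept that spelling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO items (id, payload) VALUES (?, ?)",
    ((i * 3, f"row-{i}") for i in range(1, 5001)),   # ids with gaps: 3, 6, 9, ...
)

OFFSET, SIZE = 4000, 20

# Baseline: plain offset pagination over full rows.
plain = conn.execute(
    "SELECT * FROM items ORDER BY id LIMIT ? OFFSET ?", (SIZE, OFFSET)
).fetchall()

# Form 1: find the id at the offset using ids only, then read SIZE rows from there.
by_subquery = conn.execute(
    "SELECT * FROM items "
    "WHERE id >= (SELECT id FROM items ORDER BY id LIMIT 1 OFFSET ?) "
    "ORDER BY id LIMIT ?",
    (OFFSET, SIZE),
).fetchall()

# Form 2: page over ids only, then join back to fetch the full rows.
by_join = conn.execute(
    "SELECT a.* FROM items a "
    "JOIN (SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?) b ON a.id = b.id "
    "ORDER BY a.id",
    (SIZE, OFFSET),
).fetchall()

print(plain == by_subquery == by_join)   # True
```

In both rewrites the expensive offset walk touches only the id column, and full rows are read only for the final page.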
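xxx_dev.table: 3 million rows
xxx_ys.table: 50 million rows
Environment difference: the second table has several additional indexes on top of those in the first. Because the two tables differ in the number of indexes, the same query over the first 200,000 rows runs somewhat faster on xxx_ys than on xxx_dev (the benefit of indexes is evident here; this article does not discuss it further).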
# Export matching rows 200,000 to 400,000
# With offset -> average time: about 9.958s
select * from table where company_id = 1 and mark = 0 order by id desc limit 200000, 200000;
# Split into two queries: first find the max id, then run id <= max
# Average time: about 7.505s
select id from table where company_id = 1 and mark = 0 order by id desc limit 200000, 1;
# Average time: about 9.092s
select * from table where company_id = 1 and mark = 0 and id <= 12559073 order by id desc limit 200000;
# Covering index to get max + id <= max -> average time: about 17.576s
select * from table where company_id = 1 and mark = 0 and id <= (select id from table where company_id = 1 and mark = 0 order by id desc limit 200000, 1) order by id desc limit 200000;
# Covering index + join -> average time: about 11.325s
select p.* from table p join (select id from table where company_id = 1 and mark = 0 order by id desc limit 200000, 200000) a on a.id = p.id;
#----------------------------------------------------------------------------------------------------------------------------------------------
# Export matching rows 600,000 to 800,000
# With offset -> average time: about 11.307s
select * from table where company_id = 1 and mark = 0 order by id desc limit 600000, 200000;
# Split into two queries: first find the max id, then run id <= max
# Average time: about 7.754s
select id from table where company_id = 1 and mark = 0 order by id desc limit 600000, 1;
# Average time: about 7.623s
select * from table where company_id = 1 and mark = 0 and id <= 12159073 order by id desc limit 200000;
# Covering index to get max + id <= max -> average time: about 16.67s
select * from table where company_id = 1 and mark = 0 and id <= (select id from table where company_id = 1 and mark = 0 order by id desc limit 600000, 1) order by id desc limit 200000;
# Covering index + join -> average time: about 8.823s
select p.* from table p join (select id from table where company_id = 1 and mark = 0 order by id desc limit 600000, 200000) a on a.id = p.id;
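Queries below the million-row level:
Rewriting limit offset,size as limit size makes no noticeable difference, so the optimization is not worth doing.
Both queries and exports can simply use limit offset,size.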
# Export matching rows 1,600,000 to 1,800,000
# With offset -> average time: about 15.13s
select * from table where company_id = 1 and mark = 0 order by id desc limit 1600000, 200000;
# Split into two queries: first find the max id, then run id <= max
# Average time: about 6.977s
select id from table where company_id = 1 and mark = 0 order by id desc limit 1600000, 1;
# Average time: 5.453s
select * from table where company_id = 1 and mark = 0 and id <= 11159073 order by id desc limit 200000;
# Covering index to get max + id <= max -> average time: about 17.049s
select * from table where company_id = 1 and mark = 0 and id <= (select id from table where company_id = 1 and mark = 0 order by id desc limit 1600000, 1) order by id desc limit 200000;
# Covering index + join -> average time: about 9.618s
select p.* from table p join (select id from table where company_id = 1 and mark = 0 order by id desc limit 1600000, 200000) a on a.id = p.id;
#----------------------------------------------------------------------------------------------------------------------------------------------
# Export matching rows 2,600,000 to 2,800,000
# With offset -> average time: about 20.864s
select * from table where company_id = 1 and mark = 0 order by id desc limit 2600000, 200000;
# Split into two queries: first find the max id, then run id <= max
# Average time: about 7.213s
select id from table where company_id = 1 and mark = 0 order by id desc limit 2600000, 1;
# Average time: about 1.691s
select * from table where company_id = 1 and mark = 0 and id <= 10158757 order by id desc limit 200000;
# Covering index to get max + id <= max -> average time: about 8.748s
select * from table where company_id = 1 and mark = 0 and id <= (select id from table where company_id = 1 and mark = 0 order by id desc limit 2600000, 1) order by id desc limit 200000;
# Covering index + join -> average time: about 9.11s
select p.* from table p join (select id from table where company_id = 1 and mark = 0 order by id desc limit 2600000, 200000) a on a.id = p.id;
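Queries at the million-row level:
Rewriting limit offset,size as limit size has a clear effect.
Both queries and exports can use limit size.
Note that the time needed to obtain the starting id through the covering index (select id from table where xxx limit 2600000, 1;) grows as the offset grows. This approach can usually finish in about 10s when querying within roughly the first 2 million rows, but querying the range from 5 million to 6 million rows takes dramatically longer.
ps: the covering index + join approach is worth adopting for data volumes of around three to four million rows.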
# Export matching rows 200,000 to 400,000
# With offset -> average time: about 2.663s
select * from table where company_id = 1 and mark = 0 order by id desc limit 200000, 200000;
# Split into two queries: first find the max id, then run id <= max
# Average time: about 0.128s
select id from table where company_id = 1 and mark = 0 order by id desc limit 200000, 1;
# Average time: about 1.693s
select * from table where company_id = 1 and mark = 0 and id <= 82878478 order by id desc limit 200000;
# Covering index to get max + id <= max -> average time: about 1.922s
select * from table where company_id = 1 and mark = 0 and id <= (select id from table where company_id = 1 and mark = 0 order by id desc limit 200000, 1) order by id desc limit 200000;
# Covering index + join -> average time: about 4.628s
select p.* from table p join (select id from table where company_id = 1 and mark = 0 order by id desc limit 200000, 200000) a on a.id = p.id;
#----------------------------------------------------------------------------------------------------------------------------------------------
# Export matching rows 600,000 to 800,000
# With offset -> average time: about 3.364s
select * from table where company_id = 1 and mark = 0 order by id desc limit 600000, 200000;
# Split into two queries: first find the max id, then run id <= max
# Average time: about 0.377s
select id from table where company_id = 1 and mark = 0 order by id desc limit 600000, 1;
# Average time: about 1.665s
select * from table where company_id = 1 and mark = 0 and id <= 82284594 order by id desc limit 200000;
# Covering index to get max + id <= max -> average time: about 2.02s
select * from table where company_id = 1 and mark = 0 and id <= (select id from table where company_id = 1 and mark = 0 order by id desc limit 600000, 1) order by id desc limit 200000;
# Covering index + join -> average time: about 2.648s
select p.* from table p join (select id from table where company_id = 1 and mark = 0 order by id desc limit 600000, 200000) a on a.id = p.id;
# Query only the id
# limit 1000000, 1 -> 0.671s
select id from table where company_id = 1 and mark = 0 order by id desc limit 1000000, 1;
# limit 2000000, 1 -> 600.948s
select id from table where company_id = 1 and mark = 0 order by id desc limit 2000000, 1;
# limit 3000000 and beyond: not considered further, it already takes more than 650s; strongly discouraged
select id from table where company_id = 1 and mark = 0 order by id desc limit 3000000, 1;
# Method 1: use only limit size
# Before each query, take the id just past the previous page and use it as the max id of the next page
# Boundary ids: 82878478 82543981 82284594 82043968 81822598 (100) 81596439 81361098 81136212 80906192 ......
# First page
select * from table where company_id = 1 and mark = 0 order by id desc limit 200000;
select id from table where company_id = 1 and mark = 0 order by id desc limit 200000, 1;
#--------------------------------------------------------------------------------
# Later pages
# Find the id just past the current page (the next page's max id) -> average time: 0.15s
select id from table where company_id = 1 and mark = 0 and id <= 82543981 order by id desc limit 200000, 1;
# Average time: 1.539s
select * from table where company_id = 1 and mark = 0 and id <= 82543981 order by id desc limit 200000;

# Method 2: use an id range, min < id <= max
# Before each query, take the id just past the previous page
# Boundary ids: 82878478 82543981 82284594 82043968 81822598 (100) 81596439 81361098 81136212 80906192 ......
# First page
select * from table where company_id = 1 and mark = 0 order by id desc limit 200000;
# Find the id just past the current page (the next page's max id)
select id from table where company_id = 1 and mark = 0 order by id desc limit 200000, 1;
#--------------------------------------------------------------------------------
# Later pages
# Find the id just past the current page (the next page's max id) -> average time: 0.17s
select id from table where company_id = 1 and mark = 0 and id <= 82878478 order by id desc limit 200000, 1;
# Average time: 1.66s
# The lower bound is strict (>) because the boundary row 82543981 is the first row of the next page;
# with >= it would be returned by both pages
select * from table where company_id = 1 and mark = 0 and id > 82543981 and id <= 82878478 order by id desc;
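The offset-free idea behind Method 1 can be sketched as an export loop. This is a variation rather than the exact queries above: instead of a separate limit size,1 lookup for the boundary, it reuses the last id of the page just read with a strict id < comparison. Python's sqlite3 module stands in for MySQL, and the records table, its payload column, and the helper export_pages are hypothetical names.

```python
import sqlite3


def export_pages(conn, page_size):
    """Yield pages in descending id order without ever using an offset."""
    upper = None   # exclusive upper bound for the next page; None means first page
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM records "
            "WHERE company_id = 1 AND mark = 0 "
            "AND (:upper IS NULL OR id < :upper) "
            "ORDER BY id DESC LIMIT :size",
            {"upper": upper, "size": page_size},
        ).fetchall()
        if not rows:
            return
        yield rows
        upper = rows[-1][0]   # smallest id on this page bounds the next one


# Small in-memory data set: odd ids belong to company 1, even ids to company 2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    " id INTEGER PRIMARY KEY, company_id INTEGER, mark INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    ((i, 1 if i % 2 else 2, 0, f"row-{i}") for i in range(1, 1001)),
)

pages = list(export_pages(conn, 200))
print([len(p) for p in pages])   # [200, 200, 100]
```

Each page costs one index seek plus page_size rows regardless of how deep into the table it is, but the loop can only move forward, which matches the trade-off described for these methods.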
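Queries at the ten-million-row level:
When the rewrite from limit offset,size to limit size still depends on an offset to find the starting id, it makes little difference for roughly the first 2 million rows. Once the offset runs into the millions, both limit offset,size and limit size with an offset lookup become very slow.
Here the query can be optimized without involving the offset at all (Methods 1 and 2).
Advantage: neither queries nor exports are affected by the offset any more; fetching any range from N to N + size takes almost the same time (about 1.8s + 0.2s).
Drawback: this works for exports but is limited for interactive queries. It is not possible to jump to an arbitrary page or go back to the previous page; the user can only step through to the next page in order.
ps: Methods 1 and 2 are suitable for exports but not for interactive queries.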
Original article: https://www.cnblogs.com/weixiaotao/p/10646666.html