The InnoDB architecture is shown in the diagram below:
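I. Background threads
1. Master Thread: The Master Thread is the core background thread. Its main job is to flush data from the buffer pool to disk asynchronously so that the data stays consistent. This includes flushing dirty pages, merging the insert buffer (INSERT BUFFER), and reclaiming undo pages (UNDO PAGE).
2. IO Thread: InnoDB makes heavy use of AIO (Async IO) to handle IO requests, which can greatly improve database performance. The IO Threads (insert buffer thread, log thread, read thread, write thread) mainly handle the callbacks for these IO requests. They are listed in the FILE I/O section of the output below.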
mysql> show engine innodb status \G;
*************************** 1. row ***************************
Type: InnoDB
Name:
Status:
=====================================
2018-07-20 06:36:22 0x7f5414ab7700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 58 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 126 srv_active, 0 srv_shutdown, 3465 srv_idle
srv_master_thread log flush and writes: 3591
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 1142
OS WAIT ARRAY INFO: signal count 783
RW-shared spins 0, rounds 833, OS waits 278
RW-excl spins 0, rounds 10438, OS waits 220
RW-sx spins 31, rounds 930, OS waits 28
Spin rounds per wait: 833.00 RW-shared, 10438.00 RW-excl, 30.00 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2018-07-20 05:43:02 0x7f541496d700
*** (1) TRANSACTION:
TRANSACTION 87051, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 25, OS thread handle 139998959662848, query id 429537 localhost root updating
UPDATE sbtest set k=k+1 where id=4965
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 62 page no 100 n bits 144 index PRIMARY of table `andyhsi`.`sbtest` trx id 87051 lock_mode X locks rec but not gap waiting
Record lock, heap no 39 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 00001365; asc e;;
1: len 6; hex 000000015401; asc T ;;
2: len 7; hex 66000002253047; asc f %0G;;
3: len 4; hex 00000001; asc ;;
4: len 30; hex 202020202020202020202020202020202020202020202020202020202020; asc ; (total 120 bytes);
5: len 30; hex 616161616161616161616666666666666666666672727272727272727272; asc aaaaaaaaaaffffffffffrrrrrrrrrr; (total 60 bytes);
*** (2) TRANSACTION:
TRANSACTION 87041, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 3
MySQL thread id 23, OS thread handle 139999099410176, query id 429563 localhost root update
INSERT INTO sbtest values(4965,0,' ','aaaaaaaaaaffffffffffrrrrrrrrrreeeeeeeeeeyyyyyyyyyy')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 62 page no 100 n bits 144 index PRIMARY of table `andyhsi`.`sbtest` trx id 87041 lock_mode X locks rec but not gap
Record lock, heap no 39 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 00001365; asc e;;
1: len 6; hex 000000015401; asc T ;;
2: len 7; hex 66000002253047; asc f %0G;;
3: len 4; hex 00000001; asc ;;
4: len 30; hex 202020202020202020202020202020202020202020202020202020202020; asc ; (total 120 bytes);
5: len 30; hex 616161616161616161616666666666666666666672727272727272727272; asc aaaaaaaaaaffffffffffrrrrrrrrrr; (total 60 bytes);
Record lock, heap no 40 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 4; hex 00001366; asc f;;
1: len 6; hex 0000000153d8; asc S ;;
2: len 7; hex 510000017a0dd4; asc Q z ;;
3: len 4; hex 00000002; asc ;;
4: len 30; hex 3330353431393839362d3330353431393839362d3330353431393839362d; asc 305419896-305419896-305419896-; (total 120 bytes);
5: len 30; hex 616161616161616161616666666666666666666672727272727272727272; asc aaaaaaaaaaffffffffffrrrrrrrrrr; (total 60 bytes);
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 62 page no 100 n bits 144 index PRIMARY of table `andyhsi`.`sbtest` trx id 87041 lock mode S waiting
Record lock, heap no 39 PHYSICAL RECORD: n_fields 6; compact format; info bits 32
0: len 4; hex 00001365; asc e;;
1: len 6; hex 000000015401; asc T ;;
2: len 7; hex 66000002253047; asc f %0G;;
3: len 4; hex 00000001; asc ;;
4: len 30; hex 202020202020202020202020202020202020202020202020202020202020; asc ; (total 120 bytes);
5: len 30; hex 616161616161616161616666666666666666666672727272727272727272; asc aaaaaaaaaaffffffffffrrrrrrrrrr; (total 60 bytes);
*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 87072
Purge done for trx's n:o < 87072 undo n:o < 0 state: running but idle
History list length 1284
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421474457131744, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421474457130832, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
537 OS file reads, 16854 OS file writes, 14507 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
insert 0, delete mark 0, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 26041, node heap has 1 buffer(s)
Hash table size 26041, node heap has 0 buffer(s)
Hash table size 26041, node heap has 0 buffer(s)
Hash table size 26041, node heap has 0 buffer(s)
Hash table size 26041, node heap has 1 buffer(s)
Hash table size 26041, node heap has 0 buffer(s)
Hash table size 26041, node heap has 4 buffer(s)
Hash table size 26041, node heap has 10 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 354009487
Log flushed up to 354009487
Pages flushed up to 354009487
Last checkpoint at 354009478
0 pending log flushes, 0 pending chkp writes
14125 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 107380736
Dictionary memory allocated 151814
Buffer pool size 6400
Free buffers 5457
Database pages 927
Old database pages 343
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 504, created 423, written 2492
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 927, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=8062, Main thread ID=139998976448256, state: sleeping
Number of rows inserted 40000, updated 51849, deleted 20002, read 8599948
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
1 row in set (0.01 sec)
ERROR:
No query specified
3. Purge Thread: After a transaction commits, the undo log it used may no longer be needed, so the Purge Thread reclaims undo pages that have been allocated and used.
mysql> show variables like 'innodb_purge_threads';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| innodb_purge_threads | 1 |
+----------------------+-------+
1 row in set (0.00 sec)
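4. Page Cleaner Thread: The Page Cleaner Thread was introduced in InnoDB 1.2.x. It moves the dirty-page flushing that earlier versions did in the Master Thread into a separate thread. The aim is to lighten the Master Thread's workload and reduce blocking of user query threads, further improving the performance of the InnoDB storage engine.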
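II. InnoDB in-memory data objects
Because of the gap between CPU speed and disk speed, disk-based database systems usually use a buffer pool to improve overall performance.
Put simply, the buffer pool is a region of memory.
When the database reads a page, it first places the page read from disk into the buffer pool; this is called "FIXing" the page in the buffer pool.
The next time the same page is read, the database first checks whether the page is in the buffer pool. If it is, the page is said to be hit in the buffer pool and is read directly from memory. Otherwise, the page is read from disk.
When a page is modified, the copy in the buffer pool is changed first and is flushed to disk later at a certain frequency. The flush is not triggered every time a page is updated; instead, pages are written back to disk through a mechanism called Checkpoint.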
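The read and modify paths described above can be sketched in a few lines of Python. This is a toy model rather than InnoDB code: the `BufferPool` class, the dictionary that stands in for the disk, and all method names are hypothetical, and the checkpoint here simply flushes every dirty page at once.

```python
class BufferPool:
    """Toy buffer pool: caches pages read from a dict that stands in for the disk."""

    def __init__(self, disk):
        self.disk = disk      # page_no -> page contents (hypothetical "disk")
        self.pages = {}       # pages currently held in memory
        self.dirty = set()    # pages modified in memory but not yet on disk
        self.hits = 0
        self.misses = 0

    def read(self, page_no):
        if page_no in self.pages:
            self.hits += 1            # page found in the pool: a hit
        else:
            self.misses += 1          # not cached: read from disk and keep it
            self.pages[page_no] = self.disk[page_no]
        return self.pages[page_no]

    def write(self, page_no, data):
        self.read(page_no)            # make sure the page is in the pool
        self.pages[page_no] = data    # modify only the in-memory copy
        self.dirty.add(page_no)

    def checkpoint(self):
        # Simplified: flush every dirty page back to disk in one go.
        for page_no in sorted(self.dirty):
            self.disk[page_no] = self.pages[page_no]
        self.dirty.clear()


disk = {1: "a", 2: "b"}
pool = BufferPool(disk)
pool.read(1)                 # miss: page 1 is loaded from disk
pool.read(1)                 # hit: served from memory
pool.write(2, "B")           # page 2 is loaded, then changed in memory only
on_disk_before = disk[2]     # still "b"
pool.checkpoint()
on_disk_after = disk[2]      # now "B"
```

The disk copy of page 2 changes only when `checkpoint()` runs, which is the point of the last paragraph above.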
mysql> show variables like 'innodb_buffer_pool_size';
+-------------------------+-----------+
| Variable_name | Value |
+-------------------------+-----------+
| innodb_buffer_pool_size | 134217728 |
+-------------------------+-----------+
1 row in set (0.00 sec)
InnoDB in-memory data objects: data pages, index pages, undo pages, the insert buffer, the adaptive hash index, lock info, data dictionary information, the redo log buffer, and the additional memory pool (innodb_additional_mem_pool_size).
mysql> show engine innodb status\G;
*************************** 1. row ***************************
Type: InnoDB
Name:
Status:
=====================================
140506 23:37:52 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 10 seconds
……
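Buffer pool size 8191 // total size of the buffer pool: 8191 pages of 16K each
Free buffers 7785 // number of pages in the Free list
Database pages 405 // number of pages in the LRU list
Old database pages 0
Modified db pages 0 // number of dirty pages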
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 405, created 0, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
……
----------------------------
END OF INNODB MONITOR OUTPUT
============================
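The annotation "8191 pages of 16K each" can be checked with a little arithmetic. The sketch below is one possible way to pull the counters out of the status text and convert them to bytes; the function name `parse_pool_stats` and the `PAGE_SIZE` constant are hypothetical, and the 16K page size is taken from the annotation above rather than read from the server.

```python
import re

PAGE_SIZE = 16 * 1024  # bytes per page; assumes the 16K page size noted above


def parse_pool_stats(status_text):
    """Pull page counters out of SHOW ENGINE INNODB STATUS text."""
    stats = {}
    for name in ("Buffer pool size", "Free buffers", "Database pages", "Modified db pages"):
        match = re.search(rf"^{name}\s+(\d+)", status_text, re.MULTILINE)
        if match:
            stats[name] = int(match.group(1))
    return stats


sample = """\
Buffer pool size 8191
Free buffers 7785
Database pages 405
Modified db pages 0
"""
stats = parse_pool_stats(sample)
pool_bytes = stats["Buffer pool size"] * PAGE_SIZE
free_ratio = stats["Free buffers"] / stats["Buffer pool size"]
print(pool_bytes)   # 134201344 bytes, just under 128 MiB
```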
Reference: 《MySQL技术内幕:InnoDB存储引擎》 (MySQL Internals: InnoDB Storage Engine)
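InnoDB architecture
- The InnoDB storage engine consists mainly of a memory pool and background threads.
- Memory pool: several memory blocks make up one memory pool. It maintains the internal data that processes and threads need, caches data from disk so that changes are made in memory before the files are modified, and holds the redo log buffer.
- Background threads: flush the data held in the memory pool to disk.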
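Memory
Buffer pool
- InnoDB stores data on disk in pages, so it uses memory to cache page data.
- When a page is read, it is first "FIXed" from disk into the buffer pool; later reads of that page are served directly from the buffer pool.
- When data is modified, the page in the buffer pool is changed first and flushed to disk afterwards. The flush does not happen on every change; it is driven by the Checkpoint mechanism.
- Contents of the buffer pool: index pages, data pages, undo pages, the insert buffer, the adaptive hash index, lock info, data dictionary information, and so on.
- The buffer pool is managed with an LRU algorithm.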
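LRU, Free List, Flush List
- Plain LRU: the most frequently used pages sit at the head of the list and the least recently used at the tail; pages at the tail are freed first.
- InnoDB LRU: the LRU list has a midpoint position, by default 5/8 of the way from the head, and newly read pages are inserted there rather than at the head. The part of the list after the midpoint is called the old list and the part before it the new list. In other words, the last 37% of the list (the tail end) is the old list and the rest is the new list, which holds the active data.
- Free List: when the database starts, the LRU list is empty and all pages are on the Free List. Pages are taken from the Free List when they are needed.
- Flush List: tracks the pages in the buffer pool that have been modified (dirty pages).
- unzip_LRU: compressed pages come in 1, 2, 4, and 8 KB sizes and are still managed by the LRU list. The unzip_LRU list manages each page size separately and allocates memory with a buddy algorithm.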
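The midpoint insertion described in the list above can be illustrated with a small model. This is a sketch under simplifying assumptions, not InnoDB's implementation: the `MidpointLRU` class is hypothetical, `old_pct=37` mirrors the 37% figure in the text, and a page is promoted to the head on its first re-access, whereas a real server can delay that promotion.

```python
class MidpointLRU:
    """Toy LRU list with midpoint insertion; index 0 is the head (hottest page)."""

    def __init__(self, capacity, old_pct=37):
        self.capacity = capacity
        self.old_pct = old_pct    # share of the list, counted from the tail, treated as old
        self.pages = []

    def _midpoint(self):
        # Index at which the old list begins.
        return len(self.pages) - (len(self.pages) * self.old_pct) // 100

    def access(self, page_no):
        if page_no in self.pages:
            # Already cached: move the page to the head of the new list.
            self.pages.remove(page_no)
            self.pages.insert(0, page_no)
            return
        if len(self.pages) >= self.capacity:
            self.pages.pop()      # evict from the tail of the old list
        self.pages.insert(self._midpoint(), page_no)


lru = MidpointLRU(capacity=8)
lru.pages = [1, 2, 3, 4, 5, 6, 7, 8]   # pretend the list is already full
for page_no in (9, 10, 11):            # a scan that touches each page once
    lru.access(page_no)
after_scan = list(lru.pages)           # [1, 2, 3, 4, 5, 11, 10, 9]
lru.access(4)                          # a genuinely hot page moves to the head
after_hit = list(lru.pages)            # [4, 1, 2, 3, 5, 11, 10, 9]
```

Pages 1 to 5 in the new list survive the scan because the scanned pages only ever displace each other in the old list, which is the reason for inserting at the midpoint instead of the head.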
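redo log buffer
Redo log records are first written to this buffer and then flushed to disk at a regular interval (once per second). The buffer is 8 MB by default. It is flushed to disk mainly in the following cases:
- The Master Thread flushes it once per second.
- When a transaction commits.
- When less than 1/2 of the redo log buffer is free.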
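The three flush conditions above can be written as a single decision function. This is only an illustration: `should_flush_log_buffer` and its parameters are hypothetical names, the 8 MB default comes from the text, and commit-time flushing in a real server also depends on the `innodb_flush_log_at_trx_commit` setting, which this sketch ignores.

```python
def should_flush_log_buffer(seconds_since_flush, committing, used_bytes,
                            buffer_bytes=8 * 1024 * 1024):
    """Return the reason for flushing the redo log buffer, or None if no flush is due."""
    if committing:
        return "transaction commit"
    if buffer_bytes - used_bytes < buffer_bytes / 2:
        return "less than half of the buffer is free"
    if seconds_since_flush >= 1.0:
        return "Master Thread one-second flush"
    return None


MB = 1024 * 1024
print(should_flush_log_buffer(0.2, False, 1 * MB))   # None
print(should_flush_log_buffer(0.2, True, 1 * MB))    # transaction commit
print(should_flush_log_buffer(0.2, False, 5 * MB))   # less than half of the buffer is free
print(should_flush_log_buffer(1.5, False, 1 * MB))   # Master Thread one-second flush
```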
Additional memory pool
Memory for some of InnoDB's own data structures is allocated from the additional memory pool.
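Threads
Master Thread
Flushes data from the buffer pool to disk asynchronously. This includes flushing dirty pages, merging the insert buffer (INSERT BUFFER), and reclaiming undo pages.
IO Thread
InnoDB makes heavy use of AIO to handle IO requests. The IO Threads mainly handle the callbacks for these requests; there are write, read, insert buffer, and log IO Threads.
Purge Thread
Mainly reclaims undo log pages that are no longer needed. Before InnoDB 1.1 this was done by the Master Thread.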
Page Cleaner Thread
Flushes dirty pages to disk in a separate thread, work that the Master Thread did in earlier versions.
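Checkpoint
Transactional databases generally use a Write Ahead Log strategy: when a transaction commits, the redo log is written first and the page is modified afterwards. If the database crashes, changes that had not yet reached disk can be recovered from the redo log. A Checkpoint guarantees that every page modified before that point has already been flushed to disk, so the redo log before the Checkpoint is no longer needed for recovery.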
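The role of the Checkpoint in recovery can be shown with log sequence numbers (the LSN values that appear in the LOG section of the status output). The sketch below is a simplified model with hypothetical names (`records_to_replay`, `reclaimable`) and made-up LSN values; it only shows which redo records still matter after a checkpoint.

```python
def records_to_replay(redo_log, checkpoint_lsn):
    """Redo records that crash recovery still has to apply."""
    return [rec for rec in redo_log if rec["lsn"] > checkpoint_lsn]


def reclaimable(redo_log, checkpoint_lsn):
    """Redo records whose pages are already on disk; their log space can be reused."""
    return [rec for rec in redo_log if rec["lsn"] <= checkpoint_lsn]


redo_log = [
    {"lsn": 100, "page": 3},
    {"lsn": 200, "page": 5},
    {"lsn": 300, "page": 3},
    {"lsn": 400, "page": 9},
]
checkpoint_lsn = 200   # every change up to LSN 200 has been flushed to disk
to_replay = records_to_replay(redo_log, checkpoint_lsn)
print([rec["lsn"] for rec in to_replay])                          # [300, 400]
print([rec["lsn"] for rec in reclaimable(redo_log, checkpoint_lsn)])   # [100, 200]
```

The further the checkpoint advances, the shorter the replay list, which is why a checkpoint shortens recovery time and frees redo log space.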
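Sharp Checkpoint
Happens when the database shuts down: all dirty pages are written to disk. It is generally not used while the database is running.
Fuzzy Checkpoint
Flushes only some of the dirty pages.
- Master Thread Checkpoint: the Master Thread asynchronously flushes a certain proportion of the dirty pages at a regular interval.
- FLUSH_LRU_LIST Checkpoint: to keep a certain number of free pages available in the LRU list, the Page Cleaner Thread removes pages from the tail of the LRU list; any of those pages that are dirty are flushed first.
- Async/Sync Flush Checkpoint: the redo log files are reused in a circular fashion, so when the part about to be overwritten is still needed by dirty pages, those dirty pages are flushed to disk to make that redo log space reusable.
- Dirty Page too much Checkpoint: triggered when the proportion of dirty pages in the buffer pool is too high (the threshold is innodb_max_dirty_pages_pct).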
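How the Master Thread works
Before InnoDB 1.2.x
The Master Thread consists of the main loop, the background loop, the flush loop, and the suspend loop. Some of the parameters involved are configurable.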
while (true) { ... }
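InnoDB 1.2.x
Dirty-page flushing is no longer done by the Master Thread; it is handled entirely by the Page Cleaner Thread.
if InnoDB is idle
    perform the once-every-10-seconds operations
else
    perform the once-per-second operations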
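Key features of InnoDB
Insert buffer
- When an insert has to update a nonclustered index, updating the index page every time would require many random IOs. Instead, the entries are written to a buffer, and entries for the same index page are merged into one operation, which improves IO performance.
- On an insert into a nonclustered index, InnoDB first checks whether the target index page is in the buffer pool. If it is, the entry is inserted directly; otherwise it is written to the Insert Buffer.
- Conditions: the index is a secondary index and it is not unique. (A unique index has to enforce uniqueness, which means reading the index pages to check, and that is random IO again.)
- Change Buffer: comprises the Insert Buffer, Delete Buffer, and Purge Buffer. An update consists of marking the record as deleted and then actually deleting it; these two steps correspond to the Delete Buffer and the Purge Buffer.
- Internally the Insert Buffer is a B+ tree.
- The Insert Buffer is merged in three cases:
  - The corresponding index page is read into the buffer pool.
  - The corresponding index page would have less than 1/32 of its space free, which forces a merge.
  - The Master Thread merges the insert buffer as part of its regular work.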
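The buffering and merge behavior in the list above can be modeled as follows. This is a toy sketch with hypothetical names (`InsertBuffer`, `cached_pages`, `pending`); it uses a plain dictionary where InnoDB uses a B+ tree and covers only the first merge trigger, the page being read into the buffer pool.

```python
from collections import defaultdict


class InsertBuffer:
    """Toy insert buffer for secondary index entries, keyed by index page number."""

    def __init__(self, cached_pages):
        self.cached_pages = set(cached_pages)   # index pages already in the buffer pool
        self.pending = defaultdict(list)        # page_no -> entries waiting to be merged

    def insert(self, page_no, entry, unique=False):
        if unique or page_no in self.cached_pages:
            # Unique indexes are never buffered: the page must be read to check
            # uniqueness. Cached pages are simply updated in place.
            return "direct"
        self.pending[page_no].append(entry)
        return "buffered"

    def merge(self, page_no):
        """Called when page_no is read into the buffer pool: apply all buffered entries at once."""
        self.cached_pages.add(page_no)
        return self.pending.pop(page_no, [])


ibuf = InsertBuffer(cached_pages={1})
r1 = ibuf.insert(1, "a")                 # page 1 is cached       -> "direct"
r2 = ibuf.insert(2, "b")                 # page 2 is not cached   -> "buffered"
r3 = ibuf.insert(2, "c")                 # same page, buffered together with "b"
r4 = ibuf.insert(3, "d", unique=True)    # unique index           -> "direct"
merged = ibuf.merge(2)                   # ["b", "c"] applied with one page read
r5 = ibuf.insert(2, "e")                 # page 2 is now cached   -> "direct"
```

Two inserts to page 2 cost a single page read at merge time, which is the IO saving the first bullet describes.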
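Doublewrite
If the server crashes while a dirty page is only partly written to disk, the page on disk is corrupted and cannot be recovered through redo alone. InnoDB provides the doublewrite mechanism for this. Dirty pages are flushed as follows:
1. Copy the dirty pages into the doublewrite buffer (2 MB of memory).
2. Write the doublewrite buffer to the doublewrite area on disk (2 MB, in the shared tablespace) in two passes of 1 MB each.
3. Sync immediately, then write the dirty pages to their own data files. During recovery, a copy of any corrupted page can be read from the doublewrite area in the shared tablespace and written back to its data file.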
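The three steps above, and the recovery they enable, can be simulated with a small model. Everything here is hypothetical and simplified: a page is a `(head_stamp, payload, tail_stamp)` tuple, a torn write is represented by a head stamp that does not match the tail stamp, and dictionaries stand in for the doublewrite area and the data file.

```python
def write_page(store, page_no, payload, version, torn=False):
    """Store a page as (head_stamp, payload, tail_stamp).

    A torn write updates the head stamp but leaves the old tail stamp,
    which is how this toy model represents a half-written page.
    """
    old_tail = store.get(page_no, (0, b"", 0))[2]
    store[page_no] = (version, payload, old_tail if torn else version)


def is_intact(page):
    return page[0] == page[2]


def flush(dirty, doublewrite, datafile, version, crash_on=None):
    # Steps 1-2: copy every dirty page to the doublewrite area first.
    doublewrite.clear()
    for page_no, payload in dirty.items():
        write_page(doublewrite, page_no, payload, version)
    # Step 3: write each page to its real location in the data file.
    for page_no, payload in dirty.items():
        if page_no == crash_on:
            write_page(datafile, page_no, payload, version, torn=True)
            return  # simulated crash in the middle of this page
        write_page(datafile, page_no, payload, version)


def recover(doublewrite, datafile):
    """Replace torn pages in the data file with their doublewrite copies."""
    repaired = []
    for page_no, page in list(datafile.items()):
        if not is_intact(page) and page_no in doublewrite:
            datafile[page_no] = doublewrite[page_no]
            repaired.append(page_no)
    return repaired


datafile = {7: (1, b"old7", 1), 8: (1, b"old8", 1)}
doublewrite = {}
flush({7: b"new7", 8: b"new8"}, doublewrite, datafile, version=2, crash_on=8)
torn_before = not is_intact(datafile[8])    # True: page 8 was half written
repaired = recover(doublewrite, datafile)   # [8]
```

Because the doublewrite copy is complete before any data file page is touched, a torn page always has an intact copy to restore from; redo can then be applied to the restored page.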
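Adaptive hash index
InnoDB monitors lookups on the indexes of a table. If it observes that building a hash index would speed things up, it builds one, which is why the index is called adaptive. The adaptive hash index is built from the B+ tree pages already in the buffer pool, so it is fast to build. There is also no need to hash the whole table: InnoDB automatically builds hash indexes for certain pages based on how often and in what pattern they are accessed.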
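The "observe, then build" idea can be sketched as a cache that adds a hash entry only after a key has been looked up several times. This is a loose illustration, not InnoDB's heuristic: the `AdaptiveHashIndex` class, the per-key counting, and the `threshold` of 3 are all assumptions made for the example. The two counters are named after the "hash searches/s" and "non-hash searches/s" figures in the status output.

```python
class AdaptiveHashIndex:
    """Toy adaptive index: hash entries are added only for frequently looked-up keys."""

    def __init__(self, btree_lookup, threshold=3):
        self.btree_lookup = btree_lookup   # slow path, stands in for the B+ tree search
        self.threshold = threshold         # hypothetical: index a key after this many lookups
        self.counts = {}
        self.hash_index = {}
        self.hash_searches = 0
        self.non_hash_searches = 0

    def get(self, key):
        if key in self.hash_index:
            self.hash_searches += 1
            return self.hash_index[key]
        self.non_hash_searches += 1
        value = self.btree_lookup(key)
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] >= self.threshold:
            self.hash_index[key] = value   # hot key: later lookups skip the B+ tree
        return value


table = {1: "row-1", 2: "row-2"}
ahi = AdaptiveHashIndex(table.__getitem__)
for _ in range(5):
    ahi.get(1)      # becomes hot after the third lookup
ahi.get(2)          # looked up once, never indexed
```

After this run `hash_searches` is 2 and `non_hash_searches` is 4, and only key 1 has a hash entry.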
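Asynchronous IO
Both Linux and Windows provide asynchronous IO. With AIO, IO requests for contiguous pages can be merged into a single request, turning several random IOs into one sequential IO.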
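The merging of requests for contiguous pages can be illustrated without any real IO. The helper below is hypothetical (`merge_contiguous` is not an InnoDB or operating system function); it only shows how a batch of page numbers collapses into fewer, larger requests.

```python
def merge_contiguous(page_requests):
    """Group page numbers into (first_page, page_count) runs of adjacent pages."""
    runs = []
    for page_no in sorted(set(page_requests)):
        if runs and page_no == runs[-1][0] + runs[-1][1]:
            # The page directly follows the current run: extend it.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((page_no, 1))
    return runs


print(merge_contiguous([8, 6, 7, 20, 21, 3]))   # [(3, 1), (6, 3), (20, 2)]
```

Six single-page requests become three IOs, and pages 6 to 8 are read in one sequential pass.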
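Flushing neighbor pages
When a dirty page is flushed, InnoDB checks whether the neighboring pages are dirty too; if they are, they are flushed together with it.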
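A minimal sketch of that check follows. It assumes that "neighboring" means pages in the same extent and that an extent holds 64 pages; both are assumptions for illustration, and `pages_to_flush` is a hypothetical helper rather than an InnoDB function.

```python
def pages_to_flush(page_no, dirty_pages, extent_size=64):
    """Return page_no plus every dirty page that shares its extent, in page order."""
    first = (page_no // extent_size) * extent_size
    return [p for p in range(first, first + extent_size)
            if p == page_no or p in dirty_pages]


dirty = {3, 65, 70, 71, 127, 128}
print(pages_to_flush(70, dirty))   # [65, 70, 71, 127]
```

Pages 3 and 128 are dirty as well, but they belong to other extents and are left for a later flush.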