{{ it.name }}
{{ it.text }}
一、背景
生产环境有一套 MySQL 集群,架构为一主两从,其中一个从库设置了延迟复制,延迟时间为 1 天。
某天在巡检实例时,发现这个延迟从库延迟时间已经超过 1 天,且延迟不停的在增加,在监控上查看数据库状态是正常的,其他两台实例也没有出现问题。
登录数据库 show slave status 查看状态发现 IO 线程和 SQL 线程的状态都是 YES,但实际上 SQL 线程已经出现了报错,信息如下(部分信息已省略):
Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Errno: 1205 Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 11 failed executing transaction 'ANONYMOUS' at master log binlog.000338, end_log_pos 40433204. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any. Seconds_Behind_Master: 451836 Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 1205 Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 11 failed executing transaction 'ANONYMOUS' at master log binlog.000338, end_log_pos 40433204. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any. SQL_Delay: 86400 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Waiting for workers to exit
这时候如果执行 stop slave,这个会话会卡住,kill 对应的线程也无法停止,只能通过 kill -9 MySQL server 来停止.
实例相关参数配置信息如下,数据库版本为 5.7.31。
slave_parallel_type = LOGICAL_CLOCK slave_parallel_workers = 16 slave_preserve_commit_order = on binlog_transaction_dependency_tracking = WRITESET transaction_isolation = READ-COMMITTED
show full processlist 的结果如下:
mysql> show full processlist; +----------+-------------+-----------------+------+---------+--------+---------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----------+-------------+-----------------+------+---------+--------+---------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 75846950 | system user | | NULL | Connect | 3193 | Waiting for workers to exit | NULL | | 75846951 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75846952 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75846953 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75846954 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75846955 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75846956 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75846957 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75846962 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75846963 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75846964 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75846965 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75846966 | system user | | NULL | Connect | 29352 | Waiting for preceding transaction to commit | NULL | | 75847163 | system user | | NULL | Connect | 337164 | Waiting for master to send event | NULL | | 75859848 | admin | localhost | NULL | Query | 0 | starting | show full processlist | +----------+-------------+-----------------+------+---------+--------+---------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 27 rows in set (0.00 sec)
根据报错的信息查看 replication_applier_status_by_worker 表的内容(部分信息已省略):
mysql> select * from performance_schema.replication_applier_status_by_worker\G *************************** 8. row *************************** CHANNEL_NAME: WORKER_ID: 8 THREAD_ID: NULL SERVICE_STATE: OFF LAST_SEEN_TRANSACTION: c50d7f49-ef3f-11ea-949c-287b09c1cb73:17860074 LAST_ERROR_NUMBER: 1205 LAST_ERROR_MESSAGE: Worker 8 failed executing transaction 'c50d7f49-ef3f-11ea-949c-287b09c1cb73:17860074' at master log mysql-bin.000379, end_log_pos 259024396; Lock wait timeout exceeded; try restarting transaction LAST_ERROR_TIMESTAMP: 2020-11-27 14:02:46 *************************** 11. row *************************** CHANNEL_NAME: WORKER_ID: 11 THREAD_ID: NULL SERVICE_STATE: OFF LAST_SEEN_TRANSACTION: c50d7f49-ef3f-11ea-949c-287b09c1cb73:17860077 LAST_ERROR_NUMBER: 1205 LAST_ERROR_MESSAGE: Worker 11 failed executing transaction 'c50d7f49-ef3f-11ea-949c-287b09c1cb73:17860077' at master log mysql-bin.000379, end_log_pos 259028754; Lock wait timeout exceeded; try restarting transaction LAST_ERROR_TIMESTAMP: 2020-11-27 14:02:46 *************************** 12. row *************************** CHANNEL_NAME: WORKER_ID: 12 THREAD_ID: NULL SERVICE_STATE: OFF LAST_SEEN_TRANSACTION: c50d7f49-ef3f-11ea-949c-287b09c1cb73:17860078 LAST_ERROR_NUMBER: 1205 LAST_ERROR_MESSAGE: Worker 12 failed executing transaction 'c50d7f49-ef3f-11ea-949c-287b09c1cb73:17860078' at master log mysql-bin.000379, end_log_pos 259030326; Lock wait timeout exceeded; try restarting transaction LAST_ERROR_TIMESTAMP: 2020-11-27 14:01:55
查看报错的 gtid 对应的语句,三条均为 REPLACE INTO tbl_name(col_name, ...) values(...) 这样的语句,其中有两条数据比较接近,但并不是同一行。
在 Google 上查询相关的情况,发现这个问题很大可能跟从库并行回放发生死锁有关,而且还有可能涉及到 MySQL 的 BUG。
在测试环境创建与生产上相似的表,插入一些数据用作测试,来观察下报错语句的加锁情况,表结构如下:
mysql> show create table test; +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | test | CREATE TABLE `test` ( `id` int(11) NOT NULL, `a` varchar(10) NOT NULL, `b` varchar(10) NOT NULL, `c` varchar(10) DEFAULT NULL, `d` varchar(10) DEFAULT NULL, PRIMARY KEY (`id`,`a`,`b`), UNIQUE KEY `u_k` (`a`,`b`,`id`), KEY `i_c` (`c`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 | +-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) mysql> select * from test; +----+-----+-----+------+------+ | id | a | b | c | d | +----+-----+-----+------+------+ | 10 | a10 | b10 | c10 | d10 | | 20 | a20 | b20 | c20 | d20 | | 30 | a30 | b30 | c30 | d30 | | 40 | a40 | b40 | c40 | d40 | | 50 | a50 | b50 | c50 | d50 | +----+-----+-----+------+------+ 5 rows in set (0.00 sec)
表上有一个联合主键,一个联合唯一索引和一个普通索引。
执行语句:
session 1 begin; replace into test values(30,'a30','b30','c30','d30');
由于这个语句触发了唯一键冲突,在这个情况下会被拆分成 delete + insert 语句:
delete from test where id=30 and a='a30' and b='b30' and c='c30' and d='d30'; insert into test values(30,'a30','b30','c30','d30');
show engine innodb status 查看目前事务的状态:
---TRANSACTION 5568158, ACTIVE 10 sec 4 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 2 MySQL thread id 91634, OS thread handle 139866244536064, query id 38506448 localhost root TABLE LOCK table `kk`.`test` trx id 5568158 lock mode IX RECORD LOCKS space id 33 page no 3 n bits 80 index PRIMARY of table `kk`.`test` trx id 5568158 lock_mode X locks rec but not gap Record lock, heap no 4 PHYSICAL RECORD: n_fields 7; compact format; info bits 0 0: len 4; hex 8000001e; asc ;; 1: len 3; hex 613330; asc a30;; 2: len 3; hex 623330; asc b30;; 3: len 6; hex 00000054f69e; asc T ;; 4: len 7; hex 5a000000510149; asc Z Q I;; 5: len 3; hex 633330; asc c30;; 6: len 3; hex 643330; asc d30;; RECORD LOCKS space id 33 page no 4 n bits 72 index u_k of table `kk`.`test` trx id 5568158 lock_mode X locks rec but not gap Record lock, heap no 4 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 3; hex 613330; asc a30;; 1: len 3; hex 623330; asc b30;; 2: len 4; hex 8000001e; asc ;; RECORD LOCKS space id 33 page no 4 n bits 72 index u_k of table `kk`.`test` trx id 5568158 lock_mode X Record lock, heap no 4 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 3; hex 613330; asc a30;; 1: len 3; hex 623330; asc b30;; 2: len 4; hex 8000001e; asc ;; Record lock, heap no 5 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 3; hex 613430; asc a40;; 1: len 3; hex 623430; asc b40;; 2: len 4; hex 80000028; asc (;;
以下为这个语句加的锁:
表上的 IX Lock
主键上数据 id=30,a='a30',b='b30' 的 X Lock
唯一索引上数据 a='a30',b='b30',id=30 的 Next-key Lock
唯一索引上数据 a='a40',b='b40',id=40 的 Next-key Lock
从上面的加锁情况来看,这个语句涉及了不少的锁,推测可能是由于这个加锁的范围间接导致了死锁的发生。
接下来继续测试在这一行附近执行 replace into 语句是否会被堵塞。
我们下面测试四种情况,分别是:
replace into test values(20,'a20','b20','c20','d20');
replace into test values(25,'a25','b25','c25','d25');
replace into test values(35,'a35','b35','c35','d35');
replace into test values(40,'a40','b40','c40','d40');
分别对应前一行、与前一行之间的间隙、与后一行之间的间隙和后一行。
replace into 在没有唯一键冲突的情况下等于 insert into 语句
情况一:
被 session 1 的语句堵塞,等待获取唯一索引上数据 a='a30',b='b30',id=30 的 Next-key Lock
---TRANSACTION 5568170, ACTIVE 3 sec inserting mysql tables in use 1, locked 1 LOCK WAIT 5 lock struct(s), heap size 1136, 4 row lock(s), undo log entries 2 MySQL thread id 91635, OS thread handle 139866245347072, query id 38667777 localhost root update replace into test values(20,'a20','b20','c20','d20') ------- TRX HAS BEEN WAITING 3 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 33 page no 4 n bits 72 index u_k of table `kk`.`test` trx id 5568170 lock_mode X waiting Record lock, heap no 4 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 3; hex 613330; asc a30;; 1: len 3; hex 623330; asc b30;; 2: len 4; hex 8000001e; asc ;;
情况二:
被 session 1 的语句堵塞,等待获取唯一索引上数据 a='a30',b='b30',id=30 前面间隙的 Insert Intention Lock
---TRANSACTION 5568177, ACTIVE 3 sec inserting mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1 MySQL thread id 91635, OS thread handle 139866245347072, query id 38672698 localhost root update replace into test values(25,'a25','b25','c25','d25') ------- TRX HAS BEEN WAITING 3 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 33 page no 4 n bits 72 index u_k of table `kk`.`test` trx id 5568177 lock_mode X locks gap before rec insert intention waiting Record lock, heap no 4 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 3; hex 613330; asc a30;; 1: len 3; hex 623330; asc b30;; 2: len 4; hex 8000001e; asc ;;
情况三:
被 session 1 的语句堵塞,等待获取唯一索引上数据 a='a40',b='b40',id=40 前面间隙的 Insert Intention Lock
---TRANSACTION 5568178, ACTIVE 2 sec inserting mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 1 MySQL thread id 91635, OS thread handle 139866245347072, query id 38680995 localhost root update replace into test values(35,'a35','b35','c35','d35') ------- TRX HAS BEEN WAITING 2 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 33 page no 4 n bits 72 index u_k of table `kk`.`test` trx id 5568178 lock_mode X locks gap before rec insert intention waiting Record lock, heap no 5 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 3; hex 613430; asc a40;; 1: len 3; hex 623430; asc b40;; 2: len 4; hex 80000028; asc (;;
情况四:
被 session 1 的语句堵塞,等待获取唯一索引上数据 a='a40',b='b40',id=40 的 X Lock
---TRANSACTION 5568180, ACTIVE 2 sec updating or deleting mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1 MySQL thread id 91635, OS thread handle 139866245347072, query id 38682146 localhost root update replace into test values(40,'a40','b40','c40','d40') ------- TRX HAS BEEN WAITING 2 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 33 page no 4 n bits 72 index u_k of table `kk`.`test` trx id 5568180 lock_mode X locks rec but not gap waiting Record lock, heap no 5 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 3; hex 613430; asc a40;; 1: len 3; hex 623430; asc b40;; 2: len 4; hex 80000028; asc (;;
从上面的结果来看,replace into 在遇到唯一键冲突的情况下确实会堵塞相邻行的 replace into 操作,那么这个问题是如何导致从库死锁的呢?
这时候就可以结合从库的并行回放机制来讨论,当前主库的参数设置是 binlog_transaction_dependency_tracking = WRITESET,在这个设置下,当事务更新的行没有冲突的情况下,是可以并行回放的,即上面的语句都可以并行回放。
为了保证从库上执行事务的顺序与主库一致,从库设置了 slave_preserve_commit_order = on,在这种情况下,并行回放上面的语句就有可能会出现死锁。
假设主库并发执行了下面的语句:
session 1 replace into test values(30,'a30','b30','c30','d30'); # last_committed=1 sequence_number=2 --- session 2 replace into test values(40,'a40','b40','c40','d40'); # last_committed=1 sequence_number=3 --- session 3 repalce into ......
从库回放时先回放了 session 2 的事务,由于这个事务的 sequence_number 比 session 1 大,为了保证提交的顺序,需要等 session 1 的事务先commit ,但经过上面的测试可以知道,session 2 的语句会持有 id = 40 这一行的 Next-key Lock;而 session 1 的语句也需要获取这个锁,所以在这个情况下就导致了死锁。
那么对于 MySQL 来说,发生这种情况,不会自动触发 InnoDB 的死锁检测来回滚事务吗?
实际上确实会回滚事务,但是因为 MySQL 在这个地方有 BUG,在这个调度上可能会导致 woker 线程丢失信号,导致整个复制 hang 住。
BUG 链接:
https://bugs.mysql.com/bug.php?id=87796
https://bugs.mysql.com/bug.php?id=89247
https://bugs.mysql.com/bug.php?id=95249
https://bugs.mysql.com/bug.php?id=99440
1. 为什么在这个一主两从中,只有这个延迟从库发生了死锁的问题,另外一个正常复制的从库没有死锁呢?
这个问题可以从从库的回放并发程度来解答,在没有死锁的情况下,正常复制的从库的并发度只有 1-3 左右,而延迟复制从库的并发一直都能跑满 16 线程。
这个是因为对于正常复制的从库来说,每收到一条主库传过来的事务就执行一条,且都能在短时间内提交,没有事务的积压,所以正常情况下并发度并不高;而延迟复制从库,因为设置了 SQL_DELAY = 86400,所以会有事务的积压,等 SQL 线程检查 relaylog 里未执行的事务时间超过 86400 秒之后,就会开始回放,由于主库上同时执行的事务超过 16 个,所以延迟复制从库的并行回放能跑满 16 线程,这时候同时执行 replace into 的概率就大大增加了,从而增加了死锁的可能。
实际上,后面这个正常复制的从库也出现了死锁的情况。原因是主库 delete 大量数据,造成主从延迟,这样的话回放的事务就会堆积,后面从库在追赶主库的 binlog 时也会出现并行回放跑满 16 线程的情况,导致了并发执行 replace into。这个现象也证明上面的猜想是正确的。
2. 当前有什么办法可以规避、解决这个问题?
如果是类似上面集群的情况,可以尝试调低 slave_parallel_workers,前提是调低参数后不会导致主从复制延迟。
如果不要求从库执行事务顺序与主库一致,可以设置 slave_preserve_commit_order = off 来避免死锁的出现。
对于业务方来说,可以考虑将 replace into 语句改成 select + insert/update,判断数据是否存在后再考虑 insert 或者 update,这样就可以避免 Next-key Lock的出现。
官方已经回复这个 BUG 会修复在 8.0.23,但 5.7 的修复时间暂未给出。
参考:
https://dev.mysql.com/doc/refman/5.7/en/replace.html
https://zhuanlan.zhihu.com/p/196769001
http://mysql.taobao.org/monthly/2015/03/01/