MHA Online Master Switch Procedure

Posted: 2016-08-04 16:16:58

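Besides automated monitoring and failover, MHA also offers online master switching, which is typically used for planned work such as hardware upgrades or MySQL database migration. It switches quickly and blocks writes gracefully, without shutting down the original server. The whole switch takes roughly 0.5-2 s, which greatly reduces downtime. An online master switch starts only when all of the following conditions are met: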

 1. The IO threads on all slaves are running.
 2. The SQL threads on all slaves are running.
 3. Seconds_Behind_Master on every slave is less than or equal to --running_updates_limit seconds.
 4. On the master, no update query in the SHOW PROCESSLIST output has been running for more than --running_updates_limit seconds.

These restrictions exist for safety and so that the switch to the new master completes as quickly as possible.
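Condition 4 can be approximated by hand before running the switch. The sketch below assumes the tab-separated batch output of `mysql -B -e 'SHOW FULL PROCESSLIST'` (columns Id, User, Host, db, Command, Time, State, Info) is piped in on stdin. The function name `long_running_updates` is made up for illustration, and treating every non-SELECT/SHOW statement as an "update" is a simplification, not MHA's own logic.

```shell
# Reads tab-separated SHOW PROCESSLIST output on stdin and prints
# "Id <TAB> Time <TAB> Info" for every statement that has been running
# longer than the limit given as $1 (seconds).
# Assumption: anything that is not a SELECT or SHOW counts as an update.
long_running_updates() {
    awk -F '\t' -v limit="$1" '
        NR == 1 && $1 == "Id" { next }            # skip the header row
        $5 == "Query" && $6 + 0 > limit + 0 {
            stmt = toupper($8)
            if (stmt !~ /^[ \t]*(SELECT|SHOW)/) print $1 "\t" $6 "\t" $8
        }'
}
```

An empty result for the chosen limit suggests condition 4 holds at that moment; any printed row is a statement worth waiting for or killing before the switch.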

1. Check whether masterha_manager is currently running (stopping it is recommended)

[root@DBproxy app2]# masterha_check_status --conf=/data/masterha/app1/app1.cnf
app1 (pid:6769) is running(0:PING_OK), master:192.168.0.50
[root@DBproxy app2]#
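This check is easy to script. The sketch below only matches the "is running(" wording seen in the transcript above and treats any other output as "not running". The function name `manager_is_running` and the `CONF` variable are illustrative, not part of MHA.

```shell
# Reads masterha_check_status output on stdin.
# Exit status 0: the manager is reported as running; 1: anything else.
manager_is_running() {
    grep -q 'is running('
}

# Hypothetical usage on the manager host:
#   CONF=/data/masterha/app1/app1.cnf
#   if masterha_check_status --conf="$CONF" | manager_is_running; then
#       masterha_stop --conf="$CONF"
#   fi
```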

2. Check Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master on the slave

[mysql@MyDB02 masterha]$ mysql -uroot -p123456 -h192.168.0.60 -e 'show slave status \G'|grep -E "Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master"
Warning: Using a password on the command line interface can be insecure.
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: 0
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
[mysql@MyDB02 masterha]$
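The same three values can be turned into a pass/fail check. The sketch below assumes `SHOW SLAVE STATUS\G` output is piped in on stdin and takes the lag limit in seconds as its first argument. The function name `replication_ready` is made up for illustration. Field names are matched exactly, so `Slave_SQL_Running_State` (which the grep above also picks up) is ignored.

```shell
# Exit status 0 only if both replication threads report "Yes" and
# Seconds_Behind_Master is a number no larger than the limit in $1.
replication_ready() {
    awk -v limit="$1" '
        $1 == "Slave_IO_Running:"      { io  = $2 }
        $1 == "Slave_SQL_Running:"     { sql = $2 }
        $1 == "Seconds_Behind_Master:" { lag = $2 }
        END {
            if (io != "Yes" || sql != "Yes") exit 1
            # NULL (replication stopped) or a missing field is not numeric
            if (lag !~ /^[0-9]+$/) exit 1
            exit (lag + 0 <= limit + 0) ? 0 : 1
        }'
}

# Hypothetical usage, reusing the command from the transcript:
#   mysql -uroot -p -h192.168.0.60 -e 'show slave status \G' | replication_ready 10000
```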

3. Run the online switch

[root@DBproxy masterha]# masterha_master_switch --conf=/data/masterha/app1/app1.cnf --master_state=alive --new_master_host=192.168.0.60 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
Sat Jul 16 09:11:00 2016 - [info] MHA::MasterRotate version 0.56.
Sat Jul 16 09:11:00 2016 - [info] Starting online master switch..
Sat Jul 16 09:11:00 2016 - [info] 
Sat Jul 16 09:11:00 2016 - [info] * Phase 1: Configuration Check Phase..
Sat Jul 16 09:11:00 2016 - [info] 
Sat Jul 16 09:11:00 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 16 09:11:00 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 09:11:00 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 09:11:00 2016 - [info] GTID failover mode = 0
Sat Jul 16 09:11:00 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 09:11:00 2016 - [info] Alive Slaves:
Sat Jul 16 09:11:00 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Sat Jul 16 09:11:00 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 09:11:00 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul 16 09:11:00 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sat Jul 16 09:11:00 2016 - [info]  ok.
Sat Jul 16 09:11:00 2016 - [info] Checking MHA is not monitoring or doing failover..
Sat Jul 16 09:11:00 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln142] Getting advisory lock failed on the current master. MHA Monitor runs on the current master. Stop MHA Manager/Monitor and try again.
Sat Jul 16 09:11:00 2016 - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/bin/masterha_master_switch line 53
[root@DBproxy masterha]#

Stop MHA Manager, then try again:
[root@DBproxy masterha]# masterha_stop  --conf=/data/masterha/app1/app1.cnf
Stopped app1 successfully.
[2]-  Exit 1                  nohup masterha_manager --conf=/data/masterha/app1/app1.cnf 2>&1  (wd: /data/masterha/app2)
(wd now: /data/masterha)
[root@DBproxy masterha]#

4. Run the online switch again

[root@DBproxy masterha]# masterha_master_switch --conf=/data/masterha/app1/app1.cnf --master_state=alive --new_master_host=192.168.0.60 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
Sat Jul 16 09:15:03 2016 - [info] MHA::MasterRotate version 0.56.
Sat Jul 16 09:15:03 2016 - [info] Starting online master switch..
Sat Jul 16 09:15:03 2016 - [info] 
Sat Jul 16 09:15:03 2016 - [info] * Phase 1: Configuration Check Phase..
Sat Jul 16 09:15:03 2016 - [info] 
Sat Jul 16 09:15:03 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 16 09:15:03 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 09:15:03 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 09:15:03 2016 - [info] GTID failover mode = 0
Sat Jul 16 09:15:03 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 09:15:03 2016 - [info] Alive Slaves:
Sat Jul 16 09:15:03 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Sat Jul 16 09:15:03 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 09:15:03 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul 16 09:15:03 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sat Jul 16 09:15:03 2016 - [info]  ok.
Sat Jul 16 09:15:03 2016 - [info] Checking MHA is not monitoring or doing failover..
Sat Jul 16 09:15:03 2016 - [info] Checking replication health on 192.168.0.60..
Sat Jul 16 09:15:03 2016 - [info]  ok.
Sat Jul 16 09:15:03 2016 - [info] 192.168.0.60 can be new master.
Sat Jul 16 09:15:03 2016 - [info] 
From:
192.168.0.50(192.168.0.50:3306) (current master)
 +--192.168.0.60(192.168.0.60:3306)

To:
192.168.0.60(192.168.0.60:3306) (new master)
 +--192.168.0.50(192.168.0.50:3306)
Sat Jul 16 09:15:03 2016 - [info] Checking whether 192.168.0.60(192.168.0.60:3306) is ok for the new master..
Sat Jul 16 09:15:03 2016 - [info]  ok.
Sat Jul 16 09:15:03 2016 - [info] 192.168.0.50(192.168.0.50:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sat Jul 16 09:15:03 2016 - [info] 192.168.0.50(192.168.0.50:3306): Resetting slave pointing to the dummy host.
Sat Jul 16 09:15:03 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Jul 16 09:15:03 2016 - [info] 
Sat Jul 16 09:15:03 2016 - [info] * Phase 2: Rejecting updates Phase..
Sat Jul 16 09:15:03 2016 - [info] 
Sat Jul 16 09:15:03 2016 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
Sat Jul 16 09:15:03 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sat Jul 16 09:15:03 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sat Jul 16 09:15:03 2016 - [info]  ok.
Sat Jul 16 09:15:03 2016 - [info] Orig master binlog:pos is mysql-bin.000009:40355591.
Sat Jul 16 09:15:03 2016 - [info]  Waiting to execute all relay logs on 192.168.0.60(192.168.0.60:3306)..
Sat Jul 16 09:15:03 2016 - [info]  master_pos_wait(mysql-bin.000009:40355591) completed on 192.168.0.60(192.168.0.60:3306). Executed 0 events.
Sat Jul 16 09:15:03 2016 - [info]   done.
Sat Jul 16 09:15:03 2016 - [info] Getting new master's binlog name and position..
Sat Jul 16 09:15:03 2016 - [info]  mysql-bin.000006:120
Sat Jul 16 09:15:03 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=120, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sat Jul 16 09:15:03 2016 - [info] 
Sat Jul 16 09:15:03 2016 - [info] * Switching slaves in parallel..
Sat Jul 16 09:15:03 2016 - [info] 
Sat Jul 16 09:15:03 2016 - [info] Unlocking all tables on the orig master:
Sat Jul 16 09:15:03 2016 - [info] Executing UNLOCK TABLES..
Sat Jul 16 09:15:03 2016 - [info]  ok.
Sat Jul 16 09:15:03 2016 - [info] Starting orig master as a new slave..
Sat Jul 16 09:15:03 2016 - [info]  Resetting slave 192.168.0.50(192.168.0.50:3306) and starting replication from the new master 192.168.0.60(192.168.0.60:3306)..
Sat Jul 16 09:15:03 2016 - [info]  Executed CHANGE MASTER.
Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln784] Slave could not be started on 192.168.0.50(192.168.0.50:3306)! Check slave status.
Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/Server.pm, ln862] Starting slave IO/SQL thread on 192.168.0.50(192.168.0.50:3306) failed!
Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln573]  Failed!
Sat Jul 16 09:15:14 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterRotate.pm, ln602] Switching master to 192.168.0.60(192.168.0.60:3306) done, but switching slaves partially failed.
[root@DBproxy masterha]# 
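Because the master was switched while the slave side failed, it helps to pull the error lines out of the run so a partial failure is not missed. The sketch below filters on the "- [error]" marker seen in the log above. The function name `switch_errors` and the file name `switch.log` are illustrative only.

```shell
# Filters masterha_master_switch output (stdin) down to its [error] lines.
# Exit status follows grep: 0 if at least one error line was found, 1 if none.
switch_errors() {
    grep -F -e '- [error]'
}

# Hypothetical usage:
#   masterha_master_switch --conf=/data/masterha/app1/app1.cnf ... 2>&1 \
#       | tee switch.log | switch_errors && echo "switch reported errors"
```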

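Judging from the logs on the master and the slave themselves, the likely cause is that IP addresses and hostnames were not mapped on the two machines. Fix /etc/hosts.

/etc/hosts on the master (before the change):
127.0.0.1 MyDB01
/etc/hosts on the slave (before the change):
127.0.0.1 MyDB02

/etc/hosts on both machines after the change: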
[root@MyDB02 ~]# more /etc/hosts
192.168.0.60  MyDB02
192.168.0.50  MyDB01
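A quick way to spot the suspected misconfiguration on other hosts is to list any non-localhost name that a hosts file binds to a loopback address. This is a minimal sketch; the function name `loopback_hostnames` is made up for illustration, and the idea that such a mapping caused the failed slave start is the diagnosis suggested above, not something the sketch proves.

```shell
# Prints every hostname in the hosts file given as $1 that is mapped to a
# 127.x.x.x address, ignoring names that start with "localhost".
loopback_hostnames() {
    awk '
        /^[ \t]*#/ { next }                      # skip comment lines
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break             # stop at a trailing comment
                if ($i !~ /^localhost/) print $i
            }
        }' "$1"
}

# Hypothetical usage:
#   loopback_hostnames /etc/hosts
```

With the "before" file from the master this would print MyDB01; with the corrected file it prints nothing.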

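Because the previous operation did not fully succeed, the two machines ended up in a dual-master state. I switched back manually and restored the original one-master, one-slave topology. Run a replication check before the online switch: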

[root@DBproxy app1]# masterha_check_repl --conf=/data/masterha/app1/app1.cnf
Sat Jul 16 10:24:49 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 16 10:24:49 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 10:24:49 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 10:24:49 2016 - [info] MHA::MasterMonitor version 0.56.
Sat Jul 16 10:24:49 2016 - [info] GTID failover mode = 0
Sat Jul 16 10:24:49 2016 - [info] Dead Servers:
Sat Jul 16 10:24:49 2016 - [info] Alive Servers:
Sat Jul 16 10:24:49 2016 - [info]   192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:24:49 2016 - [info]   192.168.0.60(192.168.0.60:3306)
Sat Jul 16 10:24:49 2016 - [info] Alive Slaves:
Sat Jul 16 10:24:49 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Sat Jul 16 10:24:49 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:24:49 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul 16 10:24:49 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:24:49 2016 - [info] Checking slave configurations..
Sat Jul 16 10:24:49 2016 - [info]  read_only=1 is not set on slave 192.168.0.60(192.168.0.60:3306).
Sat Jul 16 10:24:49 2016 - [info] Checking replication filtering settings..
Sat Jul 16 10:24:49 2016 - [info]  binlog_do_db= , binlog_ignore_db= 
Sat Jul 16 10:24:49 2016 - [info]  Replication filtering check ok.
Sat Jul 16 10:24:49 2016 - [info] GTID (with auto-pos) is not supported
Sat Jul 16 10:24:49 2016 - [info] Starting SSH connection tests..
Sat Jul 16 10:24:50 2016 - [info] All SSH connection tests passed successfully.
Sat Jul 16 10:24:50 2016 - [info] Checking MHA Node version..
Sat Jul 16 10:24:51 2016 - [info]  Version check ok.
Sat Jul 16 10:24:51 2016 - [info] Checking SSH publickey authentication settings on the current master..
Sat Jul 16 10:24:51 2016 - [info] HealthCheck: SSH to 192.168.0.50 is reachable.
Sat Jul 16 10:24:51 2016 - [info] Master MHA Node version is 0.56.
Sat Jul 16 10:24:51 2016 - [info] Checking recovery script configurations on 192.168.0.50(192.168.0.50:3306)..
Sat Jul 16 10:24:51 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/3306/binlog --output_file=/data/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000010 
Sat Jul 16 10:24:51 2016 - [info]   Connecting to root@192.168.0.50(192.168.0.50:22).. 
  Creating /data/masterha/app1 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /data/mysql/3306/binlog, up to mysql-bin.000010
Sat Jul 16 10:24:52 2016 - [info] Binlog setting check done.
Sat Jul 16 10:24:52 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Jul 16 10:24:52 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.0.60 --slave_ip=192.168.0.60 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/data/mysql/3306/data/relay-log.info  --relay_dir=/data/mysql/3306/data/  --slave_pass=xxx
Sat Jul 16 10:24:52 2016 - [info]   Connecting to root@192.168.0.60(192.168.0.60:22).. 
  Checking slave recovery environment settings..
    Opening /data/mysql/3306/data/relay-log.info ... ok.
    Relay log found at /data/mysql/3306/binlog, up to relay-bin.000002
    Temporary relay log file is /data/mysql/3306/binlog/relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Jul 16 10:24:53 2016 - [info] Slaves settings check done.
Sat Jul 16 10:24:53 2016 - [info] 
192.168.0.50(192.168.0.50:3306) (current master)
 +--192.168.0.60(192.168.0.60:3306)

Sat Jul 16 10:24:53 2016 - [info] Checking replication health on 192.168.0.60..
Sat Jul 16 10:24:53 2016 - [info]  ok.
Sat Jul 16 10:24:53 2016 - [warning] master_ip_failover_script is not defined.
Sat Jul 16 10:24:53 2016 - [warning] shutdown_script is not defined.
Sat Jul 16 10:24:53 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

 

5. Run the switch

[root@DBproxy app1]# masterha_master_switch --conf=/data/masterha/app1/app1.cnf --master_state=alive --new_master_host=192.168.0.60 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0
Sat Jul 16 10:26:59 2016 - [info] MHA::MasterRotate version 0.56.
Sat Jul 16 10:26:59 2016 - [info] Starting online master switch..
Sat Jul 16 10:26:59 2016 - [info] 
Sat Jul 16 10:26:59 2016 - [info] * Phase 1: Configuration Check Phase..
Sat Jul 16 10:26:59 2016 - [info] 
Sat Jul 16 10:26:59 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Jul 16 10:26:59 2016 - [info] Reading application default configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 10:26:59 2016 - [info] Reading server configuration from /data/masterha/app1/app1.cnf..
Sat Jul 16 10:26:59 2016 - [info] GTID failover mode = 0
Sat Jul 16 10:26:59 2016 - [info] Current Alive Master: 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:26:59 2016 - [info] Alive Slaves:
Sat Jul 16 10:26:59 2016 - [info]   192.168.0.60(192.168.0.60:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
Sat Jul 16 10:26:59 2016 - [info]     Replicating from 192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:26:59 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Jul 16 10:26:59 2016 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sat Jul 16 10:26:59 2016 - [info]  ok.
Sat Jul 16 10:26:59 2016 - [info] Checking MHA is not monitoring or doing failover..
Sat Jul 16 10:26:59 2016 - [info] Checking replication health on 192.168.0.60..
Sat Jul 16 10:26:59 2016 - [info]  ok.
Sat Jul 16 10:26:59 2016 - [info] 192.168.0.60 can be new master.
Sat Jul 16 10:26:59 2016 - [info] 
From:
192.168.0.50(192.168.0.50:3306) (current master)
 +--192.168.0.60(192.168.0.60:3306)

To:
192.168.0.60(192.168.0.60:3306) (new master)
 +--192.168.0.50(192.168.0.50:3306)
Sat Jul 16 10:26:59 2016 - [info] Checking whether 192.168.0.60(192.168.0.60:3306) is ok for the new master..
Sat Jul 16 10:26:59 2016 - [info]  ok.
Sat Jul 16 10:26:59 2016 - [info] 192.168.0.50(192.168.0.50:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sat Jul 16 10:26:59 2016 - [info] 192.168.0.50(192.168.0.50:3306): Resetting slave pointing to the dummy host.
Sat Jul 16 10:26:59 2016 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Jul 16 10:26:59 2016 - [info] 
Sat Jul 16 10:26:59 2016 - [info] * Phase 2: Rejecting updates Phase..
Sat Jul 16 10:26:59 2016 - [info] 
Sat Jul 16 10:26:59 2016 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master.
Sat Jul 16 10:26:59 2016 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sat Jul 16 10:26:59 2016 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sat Jul 16 10:26:59 2016 - [info]  ok.
Sat Jul 16 10:26:59 2016 - [info] Orig master binlog:pos is mysql-bin.000010:120.
Sat Jul 16 10:26:59 2016 - [info]  Waiting to execute all relay logs on 192.168.0.60(192.168.0.60:3306)..
Sat Jul 16 10:27:00 2016 - [info]  master_pos_wait(mysql-bin.000010:120) completed on 192.168.0.60(192.168.0.60:3306). Executed 0 events.
Sat Jul 16 10:27:00 2016 - [info]   done.
Sat Jul 16 10:27:00 2016 - [info] Getting new master's binlog name and position..
Sat Jul 16 10:27:00 2016 - [info]  mysql-bin.000008:239
Sat Jul 16 10:27:00 2016 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=239, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sat Jul 16 10:27:00 2016 - [info] 
Sat Jul 16 10:27:00 2016 - [info] * Switching slaves in parallel..
Sat Jul 16 10:27:00 2016 - [info] 
Sat Jul 16 10:27:00 2016 - [info] Unlocking all tables on the orig master:
Sat Jul 16 10:27:00 2016 - [info] Executing UNLOCK TABLES..
Sat Jul 16 10:27:00 2016 - [info]  ok.
Sat Jul 16 10:27:00 2016 - [info] Starting orig master as a new slave..
Sat Jul 16 10:27:00 2016 - [info]  Resetting slave 192.168.0.50(192.168.0.50:3306) and starting replication from the new master 192.168.0.60(192.168.0.60:3306)..
Sat Jul 16 10:27:00 2016 - [info]  Executed CHANGE MASTER.
Sat Jul 16 10:27:00 2016 - [info]  Slave started.
Sat Jul 16 10:27:00 2016 - [info] All new slave servers switched successfully.
Sat Jul 16 10:27:00 2016 - [info] 
Sat Jul 16 10:27:00 2016 - [info] * Phase 5: New master cleanup phase..
Sat Jul 16 10:27:00 2016 - [info] 
Sat Jul 16 10:27:00 2016 - [info]  192.168.0.60: Resetting slave info succeeded.
Sat Jul 16 10:27:00 2016 - [info] Switching master to 192.168.0.60(192.168.0.60:3306) completed successfully.
[root@DBproxy app1]# 

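After the switch, remember to adjust the order of the master and slave server entries in the configuration file (--conf=/data/masterha/app1/app1.cnf); MHA may treat the first server as the master by default.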

 

Original article: http://www.cnblogs.com/polestar/p/5737121.html
