Tags: manager keepalived node mha mariadb
I. MHA
MHA is an open-source high-availability program for MySQL. It adds automatic master failover to a MySQL master/slave replication architecture: it monitors the master node and, when the master fails, promotes the slave holding the most recent data to be the new master. It also supports planned online master switchover, so master/slave roles can be swapped on demand.
The MHA service has two roles, MHA Manager and MHA Node:
MHA Manager (management node): deployed on a separate machine, it can manage several master/slave replication clusters at once; each master/slave cluster is called an application.
MHA Node (data node): runs on every MySQL server and uses scripts to speed up failover.
II. MHA Components
Manager node:
  -- masterha_check_ssh : check the MHA SSH configuration.
Node:
  -- save_binary_logs : save and copy the master's binary logs.
(The full tool lists are shown in the rpm -ql output in section III.)
III. How MHA performs automatic master failover:
1. Preparation
① Test environment: CentOS 6.6
Hostname | MHA role | MySQL replication role | IP address
manager  | Manager  | -                      | 172.16.16.8
node1    | Node     | master                 | 172.16.16.2
node2    | Node     | slave1                 | 172.16.16.3
node3    | Node     | slave2                 | 172.16.16.4
② Since many of the steps are nearly identical on every node, ansible is used to run them. Configure the epel repository and install ansible:
# yum install -y python-jinja2-2.2.1-2.el6_5.x86_64.rpm PyYAML-3.10-3.1.el6.x86_64.rpm
# yum install -y ansible
# vim /etc/ansible/hosts    // define the nodes ansible will manage
[mha]
172.16.16.8    // MHA manager node
[db]
172.16.16.2    // MySQL master
172.16.16.3    // MySQL slave1
172.16.16.4    // MySQL slave2
③ Time synchronization, SSH mutual trust between hosts, and related settings; everything below is done on the manager node:
# vim /etc/hosts           // hostname resolution
172.16.16.8 manager
172.16.16.2 node1
172.16.16.3 node2
172.16.16.4 node3
# vim /etc/resolv.conf     // name server
nameserver 172.16.0.1
# vim /etc/selinux/config  // selinux
SELINUX=permissive
# ssh-keygen -t rsa -P ''  // generate the key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
f1:e9:55:b5:98:94:40:1b:1e:b5:ef:d3:24:da:3b:cc root@manager
[root@manager ~]# cat .ssh/id_rsa.pub >> .ssh/authorized_keys    // all 4 nodes share the same key pair
[root@manager ~]# cd .ssh/
[root@manager .ssh]# chmod 600 ./*    // fix the permissions
[root@manager ~]# ansible all -m ping // verify ansible can manage the defined nodes
172.16.16.8 | success >> {
    "changed": false,
    "ping": "pong"
}
172.16.16.2 | success >> {
    "changed": false,
    "ping": "pong"
}
172.16.16.3 | success >> {
    "changed": false,
    "ping": "pong"
}
172.16.16.4 | success >> {
    "changed": false,
    "ping": "pong"
}
// push the configuration to the three db nodes
# ansible all -m shell -a 'yum install -y libselinux-python'
# ansible db -m copy -a "src=/etc/hosts dest=/etc/hosts"
# ansible db -m copy -a "src=/etc/yum.repos.d/epel.repo dest=/etc/yum.repos.d/epel.repo"
# ansible db -m copy -a "src=/etc/resolv.conf dest=/etc/resolv.conf"
# ansible db -m copy -a "src=/etc/selinux/config dest=/etc/selinux/config"
# ansible db -m yum -a 'name=mysql-server state=present'    // install mysql on the db nodes
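Since all four nodes share one key pair, the manager's key files can be pushed out in a single loop. A minimal sketch (host list taken from the table above); the scp commands are only printed here so they can be reviewed before being run:

```shell
# Sketch: print the scp commands that would copy the manager's SSH key
# files to each db node (all 4 nodes share the same key pair).
push_keys() {
  for h in 172.16.16.2 172.16.16.3 172.16.16.4; do
    echo "scp -p /root/.ssh/id_rsa /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys root@$h:/root/.ssh/"
  done
}
push_keys
```

Once the output looks right, it can be piped to sh (each first connection will still prompt to accept the host key).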
2. Configure MySQL master/slave replication
On the master:
# vim /etc/my.cnf
innodb_file_per_table=1
# skip_name_resolve=1    // the stock mysql-server on CentOS 6.6 does not seem to support it
log_bin=master-bin
relay_log=relay-log
server_id=1

// starting mysqld failed with:
[ERROR] Fatal error: Can't open and lock privilege tables: Table 'mysql.host' doesn't exist
// fix: initialize the database first
# mysql_install_db
[root@node1 ~]# service mysqld start
Starting mysqld:                     [  OK  ]

mysql> show master status;    // current binary log file and event position
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000003 |      106 |              |                  |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> grant replication slave, replication client on *.* to 'repluser'@'172.16.16.%' identified by 'replpass';    // replication account
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on *.* to 'mha'@'172.16.16.%' identified by 'mhapass';    // account MHA uses to log in remotely
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
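The file and position printed by SHOW MASTER STATUS are exactly what the slaves need for CHANGE MASTER TO. A small helper sketch that pulls both fields out of the bordered table output shown above (it assumes the interactive mysql client's table format; with `mysql -e` the output is tab-separated instead):

```shell
# Extract the binlog file and position from the bordered output of
# SHOW MASTER STATUS, as printed by the interactive mysql client.
master_coords() {
  # $1: the captured SHOW MASTER STATUS output
  printf '%s\n' "$1" | awk '/master-bin/ {print $2, $4}'
}

status='| master-bin.000003 |      106 |              |                  |'
master_coords "$status"    # prints: master-bin.000003 106
```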
On slave1 and slave2:
# vim /etc/my.cnf
innodb_file_per_table=1    // one tablespace file per table
skip_name_resolve          // skip reverse name resolution
log_bin=master-bin         // binary log
relay_log=relay-log        // relay log
server_id=2                // globally unique id; on slave2 set server_id=3
read_only=1
relay_log_purge=0

mysql> change master to master_host='172.16.16.2',master_user='repluser',master_password='replpass',master_log_file='master-bin.000003',master_log_pos=106;
Query OK, 0 rows affected (0.08 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.16.2
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000003
          Read_Master_Log_Pos: 572
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 718
        Relay_Master_Log_File: master-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 572
              Relay_Log_Space: 867
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
1 row in set (0.00 sec)
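Replication is healthy once each slave reports Slave_IO_Running and Slave_SQL_Running as Yes. A sketch of a check that parses that \G output; here it is exercised on captured text, but the same function works on the output of `mysql -e 'SHOW SLAVE STATUS\G'` (credentials assumed to be configured):

```shell
# Verify that both replication threads of a slave are running by
# parsing the text of SHOW SLAVE STATUS\G.
check_slave_health() {
  # $1: the captured SHOW SLAVE STATUS\G output
  local status="$1" io sql
  io=$(printf '%s\n' "$status" | awk -F': ' '/Slave_IO_Running:/ {print $2}')
  sql=$(printf '%s\n' "$status" | awk -F': ' '/Slave_SQL_Running:/ {print $2}')
  if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
    echo "replication OK"
  else
    echo "replication BROKEN (IO=$io SQL=$sql)"
    return 1
  fi
}

sample="Slave_IO_Running: Yes
Slave_SQL_Running: Yes"
check_slave_health "$sample"    # prints: replication OK
```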
That completes the master/slave replication setup. Next, configure MHA.
3. Configure the MHA Manager and MHA Node nodes
// packages required on the MHA Manager (plus their dependencies):
mha4mysql-manager-0.56-0.el6.noarch.rpm
mha4mysql-node-0.56-0.el6.noarch.rpm
perl-Config-Tiny-2.12-7.1.el6.noarch.rpm
perl-Email-Date-Format-1.002-5.el6.noarch.rpm
perl-MIME-Lite-3.027-2.el6.noarch.rpm
perl-MIME-Types-1.28-2.el6.noarch.rpm
// each MHA Node only needs:
mha4mysql-node-0.56-0.el6.noarch.rpm

// files installed by the packages:
[root@manager ~]# rpm -ql mha4mysql-manager
/usr/bin/masterha_check_repl
/usr/bin/masterha_check_ssh
/usr/bin/masterha_check_status
/usr/bin/masterha_conf_host
/usr/bin/masterha_manager
/usr/bin/masterha_master_monitor
/usr/bin/masterha_master_switch
/usr/bin/masterha_secondary_check
/usr/bin/masterha_stop
[root@manager ~]# rpm -ql mha4mysql-node
/usr/bin/apply_diff_relay_logs
/usr/bin/filter_mysqlbinlog
/usr/bin/purge_relay_logs
/usr/bin/save_binary_logs

// the MHA Manager configuration file (/etc/mha/app1.cnf, used by all the commands below):
[server default]
user=mha                                 // MySQL account MHA logs in with
password=mhapass                         // its password
manager_workdir=/data/mha/app1           // working directory on the manager node (created automatically)
manager_log=/data/mha/app1/manager.log   // log file
remote_workdir=/data/mha/app1            // working directory on the remote nodes
ssh_user=root                            // user for ssh connections
repl_user=repluser                       // replication account of the MySQL cluster
repl_password=replpass                   // its password
ping_interval=1                          // ping the master every 1 second
[server1]                                // the MySQL cluster nodes
hostname=172.16.16.2
candidate_master=1                       // may be promoted to master
[server2]
hostname=172.16.16.3
candidate_master=1
[server3]
hostname=172.16.16.4
candidate_master=1

// check that the SSH trust between all nodes works:
[root@manager ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sat Oct 24 02:14:49 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Oct 24 02:14:49 2015 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Oct 24 02:14:49 2015 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Oct 24 02:14:49 2015 - [info] Starting SSH connection tests..
Sat Oct 24 02:14:52 2015 - [debug]
Sat Oct 24 02:14:49 2015 - [debug]  Connecting via SSH from root@172.16.16.2(172.16.16.2:22) to root@172.16.16.3(172.16.16.3:22)..
Sat Oct 24 02:14:50 2015 - [debug]   ok.
Sat Oct 24 02:14:50 2015 - [debug]  Connecting via SSH from root@172.16.16.2(172.16.16.2:22) to root@172.16.16.4(172.16.16.4:22)..
Sat Oct 24 02:14:52 2015 - [debug]   ok.
Sat Oct 24 02:14:52 2015 - [debug]
Sat Oct 24 02:14:50 2015 - [debug]  Connecting via SSH from root@172.16.16.3(172.16.16.3:22) to root@172.16.16.2(172.16.16.2:22)..
Sat Oct 24 02:14:51 2015 - [debug]   ok.
Sat Oct 24 02:14:51 2015 - [debug]  Connecting via SSH from root@172.16.16.3(172.16.16.3:22) to root@172.16.16.4(172.16.16.4:22)..
Sat Oct 24 02:14:52 2015 - [debug]   ok.
Sat Oct 24 02:14:53 2015 - [debug]
Sat Oct 24 02:14:50 2015 - [debug]  Connecting via SSH from root@172.16.16.4(172.16.16.4:22) to root@172.16.16.2(172.16.16.2:22)..
Sat Oct 24 02:14:52 2015 - [debug]   ok.
Sat Oct 24 02:14:52 2015 - [debug]  Connecting via SSH from root@172.16.16.4(172.16.16.4:22) to root@172.16.16.3(172.16.16.3:22)..
Sat Oct 24 02:14:53 2015 - [debug]   ok.
Sat Oct 24 02:14:53 2015 - [info] All SSH connection tests passed successfully.

// check whether the replication configuration of the managed MySQL cluster is OK:
[root@manager ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Sat Oct 24 02:14:55 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Oct 24 02:14:55 2015 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Oct 24 02:14:55 2015 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Oct 24 02:14:55 2015 - [info] MHA::MasterMonitor version 0.56.
Sat Oct 24 02:14:55 2015 - [info] GTID failover mode = 0
Sat Oct 24 02:14:55 2015 - [info] Dead Servers:
Sat Oct 24 02:14:55 2015 - [info] Alive Servers:
Sat Oct 24 02:14:55 2015 - [info]   172.16.16.2(172.16.16.2:3306)
Sat Oct 24 02:14:55 2015 - [info]   172.16.16.3(172.16.16.3:3306)
Sat Oct 24 02:14:55 2015 - [info]   172.16.16.4(172.16.16.4:3306)
Sat Oct 24 02:14:55 2015 - [info] Alive Slaves:
Sat Oct 24 02:14:55 2015 - [info]   172.16.16.3(172.16.16.3:3306)  Version=5.1.73-log (oldest major version between slaves) log-bin:enabled
Sat Oct 24 02:14:55 2015 - [info]     Replicating from 172.16.16.2(172.16.16.2:3306)
Sat Oct 24 02:14:55 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 24 02:14:55 2015 - [info]   172.16.16.4(172.16.16.4:3306)  Version=5.1.73-log (oldest major version between slaves) log-bin:enabled
Sat Oct 24 02:14:55 2015 - [info]     Replicating from 172.16.16.2(172.16.16.2:3306)
Sat Oct 24 02:14:55 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 24 02:14:55 2015 - [info] Current Alive Master: 172.16.16.2(172.16.16.2:3306)
Sat Oct 24 02:14:55 2015 - [info] Checking slave configurations..
Sat Oct 24 02:14:55 2015 - [info] Checking replication filtering settings..
Sat Oct 24 02:14:55 2015 - [info]  binlog_do_db= , binlog_ignore_db=
Sat Oct 24 02:14:55 2015 - [info]  Replication filtering check ok.
Sat Oct 24 02:14:55 2015 - [info] GTID (with auto-pos) is not supported
Sat Oct 24 02:14:55 2015 - [info] Starting SSH connection tests..
Sat Oct 24 02:14:59 2015 - [info] All SSH connection tests passed successfully.
Sat Oct 24 02:14:59 2015 - [info] Checking MHA Node version..
Sat Oct 24 02:15:00 2015 - [info]  Version check ok.
Sat Oct 24 02:15:00 2015 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 24 02:15:01 2015 - [info] HealthCheck: SSH to 172.16.16.2 is reachable.
Sat Oct 24 02:15:01 2015 - [info] Master MHA Node version is 0.56.
Sat Oct 24 02:15:01 2015 - [info] Checking recovery script configurations on 172.16.16.2(172.16.16.2:3306)..
Sat Oct 24 02:15:01 2015 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/mha/app1/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000004
Sat Oct 24 02:15:01 2015 - [info]   Connecting to root@172.16.16.2(172.16.16.2:22)..
  Creating /data/mha/app1 if not exists.. ok.
  Checking output directory is accessible or not.. ok.
  Binlog found at /var/lib/mysql, up to master-bin.000004
Sat Oct 24 02:15:02 2015 - [info] Binlog setting check done.
Sat Oct 24 02:15:02 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Oct 24 02:15:02 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.16.16.3 --slave_ip=172.16.16.3 --slave_port=3306 --workdir=/data/mha/app1 --target_version=5.1.73-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Oct 24 02:15:02 2015 - [info]   Connecting to root@172.16.16.3(172.16.16.3:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-log.000005
    Temporary relay log file is /var/lib/mysql/relay-log.000005
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Oct 24 02:15:03 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.16.16.4 --slave_ip=172.16.16.4 --slave_port=3306 --workdir=/data/mha/app1 --target_version=5.1.73-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Oct 24 02:15:03 2015 - [info]   Connecting to root@172.16.16.4(172.16.16.4:22)..
  Creating directory /data/mha/app1.. done.
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-log.000006
    Temporary relay log file is /var/lib/mysql/relay-log.000006
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Oct 24 02:15:03 2015 - [info] Slaves settings check done.
Sat Oct 24 02:15:03 2015 - [info]
172.16.16.2(172.16.16.2:3306) (current master)    // the replication topology MHA detected
 +--172.16.16.3(172.16.16.3:3306)
 +--172.16.16.4(172.16.16.4:3306)
Sat Oct 24 02:15:03 2015 - [info] Checking replication health on 172.16.16.3..
Sat Oct 24 02:15:03 2015 - [info]  ok.
Sat Oct 24 02:15:03 2015 - [info] Checking replication health on 172.16.16.4..
Sat Oct 24 02:15:03 2015 - [info]  ok.
Sat Oct 24 02:15:03 2015 - [warning] master_ip_failover_script is not defined.    // no VIP takeover script
Sat Oct 24 02:15:03 2015 - [warning] shutdown_script is not defined.              // no fencing script
Sat Oct 24 02:15:03 2015 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.    // this line means everything is fine

// start MHA
[root@manager ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf > /data/mha/app1/manager.log 2>&1 &
[1] 4007
// check the monitoring status; the current master is 172.16.16.2
[root@manager ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4007) is running(0:PING_OK), master:172.16.16.2
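If other tooling needs to know which node is currently the master, it can scrape the masterha_check_status output. A small sketch that assumes the status line format shown above:

```shell
# Pull the current master's IP out of a masterha_check_status line.
current_master() {
  # $1: e.g. "app1 (pid:4007) is running(0:PING_OK), master:172.16.16.2"
  printf '%s\n' "$1" | sed -n 's/.*master:\([0-9.]*\).*/\1/p'
}

line='app1 (pid:4007) is running(0:PING_OK), master:172.16.16.2'
current_master "$line"    # prints: 172.16.16.2
```

In practice the input would come from `masterha_check_status --conf=/etc/mha/app1.cnf` run on the manager node.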
4. Test the failover
(1) Stop the mysqld service on the master node
[root@node1 ~]# killall -9 mysqld mysqld_safe
(2) Check the log on the manager node
----- Failover Report -----    // the failover report

app1: MySQL Master failover 172.16.16.2(172.16.16.2:3306) to 172.16.16.3(172.16.16.3:3306) succeeded

Master 172.16.16.2(172.16.16.2:3306) is down!

Check MHA Manager logs at manager:/data/mha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 172.16.16.3(172.16.16.3:3306) has all relay logs for recovery.
Selected 172.16.16.3(172.16.16.3:3306) as a new master.
172.16.16.3(172.16.16.3:3306): OK: Applying all logs succeeded.
172.16.16.4(172.16.16.4:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.16.16.4(172.16.16.4:3306): OK: Applying all logs succeeded. Slave started, replicating from 172.16.16.3(172.16.16.3:3306)
172.16.16.3(172.16.16.3:3306): Resetting slave info succeeded.
Master failover to 172.16.16.3(172.16.16.3:3306) completed successfully.
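After this failover, 172.16.16.3 is the new master. A common follow-up (not covered in the post) is to repair the old master 172.16.16.2 and re-attach it as a slave of the new one. A sketch that only builds the SQL statement; the binlog file and position below are placeholders and must be taken from SHOW MASTER STATUS on the new master:

```shell
# Build the CHANGE MASTER TO statement for re-attaching a repaired node
# as a slave of the new master. File/position arguments are placeholders.
change_master_sql() {
  # $1: new master ip  $2: binlog file  $3: position
  printf "CHANGE MASTER TO MASTER_HOST='%s',MASTER_USER='repluser',MASTER_PASSWORD='replpass',MASTER_LOG_FILE='%s',MASTER_LOG_POS=%s; START SLAVE;\n" "$1" "$2" "$3"
}

change_master_sql 172.16.16.3 master-bin.000001 106
# the result can be fed to the mysql client on the repaired node
```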
5. Things still to improve
The following two scripts still need to be provided; they will be added in a follow-up:

master_ip_failover_script    // moves the master's IP to the new master, similar to a keepalived VIP
shutdown_script              // powers off the failed master to avoid resource contention, similar to STONITH in an HA cluster
This concludes the introduction to the MHA high-availability tool: automatic failover after the master node fails is now working. O(∩_∩)O~~
This article comes from the "bengbengtu" blog; please keep this attribution: http://bengbengtu.blog.51cto.com/9505633/1711643