Tags: manager keepalived node mha mariadb
I. MHA
MHA is an open-source high-availability program for MySQL. It adds automatic master failover to a MySQL master/slave replication architecture: it monitors the master node and, when the master fails, promotes the slave holding the most recent data to be the new master. It also supports planned online master switchover, so master/slave roles can be swapped on demand.
The MHA service has two roles, MHA Manager and MHA Node:
MHA Manager (management node): deployed on a separate machine, it can manage several master/slave replication clusters at once; each master/slave cluster is called an application.
MHA Node (data node): runs on every MySQL server and uses scripts to speed up failover.
II. MHA Components
Manager node:
  -- masterha_check_ssh : check the MHA SSH configuration.
Node:
  -- save_binary_logs : save and copy the master's binary logs.
(The full tool lists are shown in the rpm -ql output in section III.)
III. How MHA performs automatic master failover:
1. Preparation
① Test environment: CentOS 6.6
Hostname | MHA role | MySQL replication role | IP address
manager  | Manager  | -                      | 172.16.16.8
node1    | Node     | master                 | 172.16.16.2
node2    | Node     | slave1                 | 172.16.16.3
node3    | Node     | slave2                 | 172.16.16.4
② Since many of the steps are nearly identical on every node, ansible is used to run them. Configure the epel repository and install ansible:
# yum install -y python-jinja2-2.2.1-2.el6_5.x86_64.rpm PyYAML-3.10-3.1.el6.x86_64.rpm
# yum install -y ansible
# vim /etc/ansible/hosts    // define the nodes ansible will manage
[mha]
172.16.16.8    // MHA manager node
[db]
172.16.16.2    // MySQL master
172.16.16.3    // MySQL slave1
172.16.16.4    // MySQL slave2
③ Time synchronization, SSH mutual trust between hosts, and related settings; everything below is done on the manager node:
# vim /etc/hosts           // hostname resolution
172.16.16.8 manager
172.16.16.2 node1
172.16.16.3 node2
172.16.16.4 node3
# vim /etc/resolv.conf     // name server
nameserver 172.16.0.1
# vim /etc/selinux/config  // selinux
SELINUX=permissive
# ssh-keygen -t rsa -P ''  // generate the key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
f1:e9:55:b5:98:94:40:1b:1e:b5:ef:d3:24:da:3b:cc root@manager
[root@manager ~]# cat .ssh/id_rsa.pub >> .ssh/authorized_keys    // all 4 nodes share the same key pair
[root@manager ~]# cd .ssh/
[root@manager .ssh]# chmod 600 ./*    // fix the permissions
[root@manager ~]# ansible all -m ping // verify ansible can manage the defined nodes
172.16.16.8 | success >> {
    "changed": false,
    "ping": "pong"
}
172.16.16.2 | success >> {
    "changed": false,
    "ping": "pong"
}
172.16.16.3 | success >> {
    "changed": false,
    "ping": "pong"
}
172.16.16.4 | success >> {
    "changed": false,
    "ping": "pong"
}
// push the configuration to the three db nodes
# ansible all -m shell -a 'yum install -y libselinux-python'
# ansible db -m copy -a "src=/etc/hosts dest=/etc/hosts"
# ansible db -m copy -a "src=/etc/yum.repos.d/epel.repo dest=/etc/yum.repos.d/epel.repo"
# ansible db -m copy -a "src=/etc/resolv.conf dest=/etc/resolv.conf"
# ansible db -m copy -a "src=/etc/selinux/config dest=/etc/selinux/config"
# ansible db -m yum -a 'name=mysql-server state=present'    // install mysql on the db nodes
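Since all four nodes share one key pair, the manager's key files can be pushed out in a single loop. A minimal sketch (host list taken from the table above); the scp commands are only printed here so they can be reviewed before being run:

```shell
# Sketch: print the scp commands that would copy the manager's SSH key
# files to each db node (all 4 nodes share the same key pair).
push_keys() {
  for h in 172.16.16.2 172.16.16.3 172.16.16.4; do
    echo "scp -p /root/.ssh/id_rsa /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys root@$h:/root/.ssh/"
  done
}
push_keys
```

Once the output looks right, it can be piped to sh (each first connection will still prompt to accept the host key).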
2. Configure MySQL master/slave replication
On the master:
# vim /etc/my.cnf
innodb_file_per_table=1
# skip_name_resolve=1    // the stock mysql-server on CentOS 6.6 does not seem to support it
log_bin=master-bin
relay_log=relay-log
server_id=1

// starting mysqld failed with:
[ERROR] Fatal error: Can't open and lock privilege tables: Table 'mysql.host' doesn't exist
// fix: initialize the database first
# mysql_install_db
[root@node1 ~]# service mysqld start
Starting mysqld:                     [  OK  ]

mysql> show master status;    // current binary log file and event position
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000003 |      106 |              |                  |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> grant replication slave, replication client on *.* to 'repluser'@'172.16.16.%' identified by 'replpass';    // replication account
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql> grant all on *.* to 'mha'@'172.16.16.%' identified by 'mhapass';    // account MHA uses to log in remotely
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
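The file and position printed by SHOW MASTER STATUS are exactly what the slaves need for CHANGE MASTER TO. A small helper sketch that pulls both fields out of the bordered table output shown above (it assumes the interactive mysql client's table format; with `mysql -e` the output is tab-separated instead):

```shell
# Extract the binlog file and position from the bordered output of
# SHOW MASTER STATUS, as printed by the interactive mysql client.
master_coords() {
  # $1: the captured SHOW MASTER STATUS output
  printf '%s\n' "$1" | awk '/master-bin/ {print $2, $4}'
}

status='| master-bin.000003 |      106 |              |                  |'
master_coords "$status"    # prints: master-bin.000003 106
```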
On slave1 and slave2:
# vim /etc/my.cnf
innodb_file_per_table=1    // one tablespace file per table
skip_name_resolve          // skip reverse name resolution
log_bin=master-bin         // binary log
relay_log=relay-log        // relay log
server_id=2                // globally unique id; on slave2 set server_id=3
read_only=1
relay_log_purge=0

mysql> change master to master_host='172.16.16.2',master_user='repluser',master_password='replpass',master_log_file='master-bin.000003',master_log_pos=106;
Query OK, 0 rows affected (0.08 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.16.2
                  Master_User: repluser
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: master-bin.000003
          Read_Master_Log_Pos: 572
               Relay_Log_File: relay-log.000002
                Relay_Log_Pos: 718
        Relay_Master_Log_File: master-bin.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 572
              Relay_Log_Space: 867
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
1 row in set (0.00 sec)
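Replication is healthy once each slave reports Slave_IO_Running and Slave_SQL_Running as Yes. A sketch of a check that parses that \G output; here it is exercised on captured text, but the same function works on the output of `mysql -e 'SHOW SLAVE STATUS\G'` (credentials assumed to be configured):

```shell
# Verify that both replication threads of a slave are running by
# parsing the text of SHOW SLAVE STATUS\G.
check_slave_health() {
  # $1: the captured SHOW SLAVE STATUS\G output
  local status="$1" io sql
  io=$(printf '%s\n' "$status" | awk -F': ' '/Slave_IO_Running:/ {print $2}')
  sql=$(printf '%s\n' "$status" | awk -F': ' '/Slave_SQL_Running:/ {print $2}')
  if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
    echo "replication OK"
  else
    echo "replication BROKEN (IO=$io SQL=$sql)"
    return 1
  fi
}

sample="Slave_IO_Running: Yes
Slave_SQL_Running: Yes"
check_slave_health "$sample"    # prints: replication OK
```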
That completes the master/slave replication setup. Next, configure MHA.
3. Configure the MHA Manager and MHA Node nodes
// packages required on the MHA Manager (plus their dependencies):
mha4mysql-manager-0.56-0.el6.noarch.rpm
mha4mysql-node-0.56-0.el6.noarch.rpm
perl-Config-Tiny-2.12-7.1.el6.noarch.rpm
perl-Email-Date-Format-1.002-5.el6.noarch.rpm
perl-MIME-Lite-3.027-2.el6.noarch.rpm
perl-MIME-Types-1.28-2.el6.noarch.rpm
// each MHA Node only needs:
mha4mysql-node-0.56-0.el6.noarch.rpm

// files installed by the packages:
[root@manager ~]# rpm -ql mha4mysql-manager
/usr/bin/masterha_check_repl
/usr/bin/masterha_check_ssh
/usr/bin/masterha_check_status
/usr/bin/masterha_conf_host
/usr/bin/masterha_manager
/usr/bin/masterha_master_monitor
/usr/bin/masterha_master_switch
/usr/bin/masterha_secondary_check
/usr/bin/masterha_stop
[root@manager ~]# rpm -ql mha4mysql-node
/usr/bin/apply_diff_relay_logs
/usr/bin/filter_mysqlbinlog
/usr/bin/purge_relay_logs
/usr/bin/save_binary_logs

// the MHA Manager configuration file (/etc/mha/app1.cnf, used by all the commands below):
[server default]
user=mha                                 // MySQL account MHA logs in with
password=mhapass                         // its password
manager_workdir=/data/mha/app1           // working directory on the manager node (created automatically)
manager_log=/data/mha/app1/manager.log   // log file
remote_workdir=/data/mha/app1            // working directory on the remote nodes
ssh_user=root                            // user for ssh connections
repl_user=repluser                       // replication account of the MySQL cluster
repl_password=replpass                   // its password
ping_interval=1                          // ping the master every 1 second
[server1]                                // the MySQL cluster nodes
hostname=172.16.16.2
candidate_master=1                       // may be promoted to master
[server2]
hostname=172.16.16.3
candidate_master=1
[server3]
hostname=172.16.16.4
candidate_master=1

// check that the SSH trust between all nodes works:
[root@manager ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sat Oct 24 02:14:49 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Oct 24 02:14:49 2015 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Oct 24 02:14:49 2015 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Oct 24 02:14:49 2015 - [info] Starting SSH connection tests..
Sat Oct 24 02:14:52 2015 - [debug]
Sat Oct 24 02:14:49 2015 - [debug]  Connecting via SSH from root@172.16.16.2(172.16.16.2:22) to root@172.16.16.3(172.16.16.3:22)..
Sat Oct 24 02:14:50 2015 - [debug]   ok.
Sat Oct 24 02:14:50 2015 - [debug]  Connecting via SSH from root@172.16.16.2(172.16.16.2:22) to root@172.16.16.4(172.16.16.4:22)..
Sat Oct 24 02:14:52 2015 - [debug]   ok.
Sat Oct 24 02:14:52 2015 - [debug]
Sat Oct 24 02:14:50 2015 - [debug]  Connecting via SSH from root@172.16.16.3(172.16.16.3:22) to root@172.16.16.2(172.16.16.2:22)..
Sat Oct 24 02:14:51 2015 - [debug]   ok.
Sat Oct 24 02:14:51 2015 - [debug]  Connecting via SSH from root@172.16.16.3(172.16.16.3:22) to root@172.16.16.4(172.16.16.4:22)..
Sat Oct 24 02:14:52 2015 - [debug]   ok.
Sat Oct 24 02:14:53 2015 - [debug]
Sat Oct 24 02:14:50 2015 - [debug]  Connecting via SSH from root@172.16.16.4(172.16.16.4:22) to root@172.16.16.2(172.16.16.2:22)..
Sat Oct 24 02:14:52 2015 - [debug]   ok.
Sat Oct 24 02:14:52 2015 - [debug]  Connecting via SSH from root@172.16.16.4(172.16.16.4:22) to root@172.16.16.3(172.16.16.3:22)..
Sat Oct 24 02:14:53 2015 - [debug]   ok.
Sat Oct 24 02:14:53 2015 - [info] All SSH connection tests passed successfully.

// check whether the replication configuration of the managed MySQL cluster is OK:
[root@manager ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Sat Oct 24 02:14:55 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Oct 24 02:14:55 2015 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Oct 24 02:14:55 2015 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Oct 24 02:14:55 2015 - [info] MHA::MasterMonitor version 0.56.
Sat Oct 24 02:14:55 2015 - [info] GTID failover mode = 0
Sat Oct 24 02:14:55 2015 - [info] Dead Servers:
Sat Oct 24 02:14:55 2015 - [info] Alive Servers:
Sat Oct 24 02:14:55 2015 - [info]   172.16.16.2(172.16.16.2:3306)
Sat Oct 24 02:14:55 2015 - [info]   172.16.16.3(172.16.16.3:3306)
Sat Oct 24 02:14:55 2015 - [info]   172.16.16.4(172.16.16.4:3306)
Sat Oct 24 02:14:55 2015 - [info] Alive Slaves:
Sat Oct 24 02:14:55 2015 - [info]   172.16.16.3(172.16.16.3:3306)  Version=5.1.73-log (oldest major version between slaves) log-bin:enabled
Sat Oct 24 02:14:55 2015 - [info]     Replicating from 172.16.16.2(172.16.16.2:3306)
Sat Oct 24 02:14:55 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 24 02:14:55 2015 - [info]   172.16.16.4(172.16.16.4:3306)  Version=5.1.73-log (oldest major version between slaves) log-bin:enabled
Sat Oct 24 02:14:55 2015 - [info]     Replicating from 172.16.16.2(172.16.16.2:3306)
Sat Oct 24 02:14:55 2015 - [info]     Primary candidate for the new Master (candidate_master is set)
Sat Oct 24 02:14:55 2015 - [info] Current Alive Master: 172.16.16.2(172.16.16.2:3306)
Sat Oct 24 02:14:55 2015 - [info] Checking slave configurations..
Sat Oct 24 02:14:55 2015 - [info] Checking replication filtering settings..
Sat Oct 24 02:14:55 2015 - [info]  binlog_do_db= , binlog_ignore_db=
Sat Oct 24 02:14:55 2015 - [info]  Replication filtering check ok.
Sat Oct 24 02:14:55 2015 - [info] GTID (with auto-pos) is not supported
Sat Oct 24 02:14:55 2015 - [info] Starting SSH connection tests..
Sat Oct 24 02:14:59 2015 - [info] All SSH connection tests passed successfully.
Sat Oct 24 02:14:59 2015 - [info] Checking MHA Node version..
Sat Oct 24 02:15:00 2015 - [info]  Version check ok.
Sat Oct 24 02:15:00 2015 - [info] Checking SSH publickey authentication settings on the current master..
Sat Oct 24 02:15:01 2015 - [info] HealthCheck: SSH to 172.16.16.2 is reachable.
Sat Oct 24 02:15:01 2015 - [info] Master MHA Node version is 0.56.
Sat Oct 24 02:15:01 2015 - [info] Checking recovery script configurations on 172.16.16.2(172.16.16.2:3306)..
Sat Oct 24 02:15:01 2015 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/mha/app1/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000004
Sat Oct 24 02:15:01 2015 - [info]   Connecting to root@172.16.16.2(172.16.16.2:22)..
  Creating /data/mha/app1 if not exists.. ok.
  Checking output directory is accessible or not.. ok.
  Binlog found at /var/lib/mysql, up to master-bin.000004
Sat Oct 24 02:15:02 2015 - [info] Binlog setting check done.
Sat Oct 24 02:15:02 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Oct 24 02:15:02 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.16.16.3 --slave_ip=172.16.16.3 --slave_port=3306 --workdir=/data/mha/app1 --target_version=5.1.73-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Oct 24 02:15:02 2015 - [info]   Connecting to root@172.16.16.3(172.16.16.3:22)..
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-log.000005
    Temporary relay log file is /var/lib/mysql/relay-log.000005
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Oct 24 02:15:03 2015 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.16.16.4 --slave_ip=172.16.16.4 --slave_port=3306 --workdir=/data/mha/app1 --target_version=5.1.73-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Oct 24 02:15:03 2015 - [info]   Connecting to root@172.16.16.4(172.16.16.4:22)..
  Creating directory /data/mha/app1.. done.
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-log.000006
    Temporary relay log file is /var/lib/mysql/relay-log.000006
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sat Oct 24 02:15:03 2015 - [info] Slaves settings check done.
Sat Oct 24 02:15:03 2015 - [info]
172.16.16.2(172.16.16.2:3306) (current master)    // the replication topology MHA detected
 +--172.16.16.3(172.16.16.3:3306)
 +--172.16.16.4(172.16.16.4:3306)
Sat Oct 24 02:15:03 2015 - [info] Checking replication health on 172.16.16.3..
Sat Oct 24 02:15:03 2015 - [info]  ok.
Sat Oct 24 02:15:03 2015 - [info] Checking replication health on 172.16.16.4..
Sat Oct 24 02:15:03 2015 - [info]  ok.
Sat Oct 24 02:15:03 2015 - [warning] master_ip_failover_script is not defined.    // no VIP takeover script
Sat Oct 24 02:15:03 2015 - [warning] shutdown_script is not defined.              // no fencing script
Sat Oct 24 02:15:03 2015 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.    // this line means everything is fine

// start MHA
[root@manager ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf > /data/mha/app1/manager.log 2>&1 &
[1] 4007
// check the monitoring status; the current master is 172.16.16.2
[root@manager ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:4007) is running(0:PING_OK), master:172.16.16.2
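If other tooling needs to know which node is currently the master, it can scrape the masterha_check_status output. A small sketch that assumes the status line format shown above:

```shell
# Pull the current master's IP out of a masterha_check_status line.
current_master() {
  # $1: e.g. "app1 (pid:4007) is running(0:PING_OK), master:172.16.16.2"
  printf '%s\n' "$1" | sed -n 's/.*master:\([0-9.]*\).*/\1/p'
}

line='app1 (pid:4007) is running(0:PING_OK), master:172.16.16.2'
current_master "$line"    # prints: 172.16.16.2
```

In practice the input would come from `masterha_check_status --conf=/etc/mha/app1.cnf` run on the manager node.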
4. Test the failover
(1) Stop the mysqld service on the master node
[root@node1 ~]# killall -9 mysqld mysqld_safe
(2) Check the log on the manager node
----- Failover Report -----    // the failover report

app1: MySQL Master failover 172.16.16.2(172.16.16.2:3306) to 172.16.16.3(172.16.16.3:3306) succeeded

Master 172.16.16.2(172.16.16.2:3306) is down!

Check MHA Manager logs at manager:/data/mha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 172.16.16.3(172.16.16.3:3306) has all relay logs for recovery.
Selected 172.16.16.3(172.16.16.3:3306) as a new master.
172.16.16.3(172.16.16.3:3306): OK: Applying all logs succeeded.
172.16.16.4(172.16.16.4:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.16.16.4(172.16.16.4:3306): OK: Applying all logs succeeded. Slave started, replicating from 172.16.16.3(172.16.16.3:3306)
172.16.16.3(172.16.16.3:3306): Resetting slave info succeeded.
Master failover to 172.16.16.3(172.16.16.3:3306) completed successfully.
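After this failover, 172.16.16.3 is the new master. A common follow-up (not covered in the post) is to repair the old master 172.16.16.2 and re-attach it as a slave of the new one. A sketch that only builds the SQL statement; the binlog file and position below are placeholders and must be taken from SHOW MASTER STATUS on the new master:

```shell
# Build the CHANGE MASTER TO statement for re-attaching a repaired node
# as a slave of the new master. File/position arguments are placeholders.
change_master_sql() {
  # $1: new master ip  $2: binlog file  $3: position
  printf "CHANGE MASTER TO MASTER_HOST='%s',MASTER_USER='repluser',MASTER_PASSWORD='replpass',MASTER_LOG_FILE='%s',MASTER_LOG_POS=%s; START SLAVE;\n" "$1" "$2" "$3"
}

change_master_sql 172.16.16.3 master-bin.000001 106
# the result can be fed to the mysql client on the repaired node
```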
5. Things still to improve
The following two scripts still need to be provided; they will be added in a follow-up:

master_ip_failover_script    // moves the master's IP to the new master, similar to a keepalived VIP
shutdown_script              // powers off the failed master to avoid resource contention, similar to STONITH in an HA cluster
This concludes the introduction to the MHA high-availability tool: automatic failover after the master node fails is now working. O(∩_∩)O~~
This article comes from the "bengbengtu" blog; please keep this attribution: http://bengbengtu.blog.51cto.com/9505633/1711643