How MHA works
Related packages
Lab environment
First download the MHA packages from the official site. Note: this requires ×××
vim /etc/my.cnf
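Add the following settings to the [mysqld] block
[mysqld]
log-bin #enable the binary log
server_id=1 #an ID that is unique among all nodes
innodb_file_per_table #store each InnoDB table in its own tablespace file
skip_name_resolve=1 #skip DNS resolution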
yum install mha4mysql-node-0.54-1.el5.noarch
systemctl start mariadb
systemctl enable mariadb
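mysql_secure_installation
Prompt 1: enter the current root password. Just press Enter, because none is set yet.
Prompt 2: set a root password?
Prompt 3: remove anonymous users?
Prompt 4: disallow remote root login?
Prompt 5: remove the test database?
Prompt 6: reload the privilege tables now?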
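1. Create a user account with replication privileges
GRANT REPLICATION SLAVE ON *.* TO 'repluser'@'HOST' IDENTIFIED BY 'replpass';
Command breakdown:
'repluser'@'HOST': the user name and the host IP or subnet it may log in from; write a subnet with %, for example 10.0.0.%
IDENTIFIED BY: sets the password
*.*: all databases and all tables
GRANT REPLICATION SLAVE: lets this user request the master's binary log events, that is, replicate data
In short, the command authorizes repluser to replicate everything on the database server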
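2. Create the MHA management user
GRANT ALL ON *.* TO 'mha'@'HOST' IDENTIFIED BY 'replpass';
Command breakdown:
'mha'@'HOST': the user name and the host IP or subnet it may log in from; write a subnet with %, for example 10.0.0.%
IDENTIFIED BY: sets the password
*.*: all databases and all tables
GRANT ALL: gives the user every privilege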
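The two accounts described above can be prepared in one pass. The sketch below only prints the SQL so it can be reviewed before it is piped into the mysql client. The function name print_grants, the example subnet and the passwords are assumptions made for illustration, not values taken from this setup.

```shell
#!/usr/bin/env bash
# Print the GRANT statements for the replication user and the MHA admin user.
# Arguments (all illustrative): host pattern, replication password, MHA password.
print_grants() {
    local host="$1" repl_pass="$2" mha_pass="$3"
    printf "GRANT REPLICATION SLAVE ON *.* TO 'repluser'@'%s' IDENTIFIED BY '%s';\n" "$host" "$repl_pass"
    printf "GRANT ALL ON *.* TO 'mha'@'%s' IDENTIFIED BY '%s';\n" "$host" "$mha_pass"
}

# Example: review the output, then pipe it into `mysql -uroot -p` on the master.
print_grants '192.168.68.%' 'replpass' 'mhapass'
```

Because the statements are replicated like any other write, running them on the master before the slaves are attached should be enough for the accounts to exist on every node.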
vim /etc/my.cnf
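server_id=2
read_only #ordinary users can only read; users with the SUPER privilege are not restricted
log-bin
relay_log_purge=0 #do not purge relay logs automatically
skip_name_resolve=1
innodb_file_per_table
Note on log-bin: in ordinary MySQL master-slave replication a slave does not need the binary log.
It is enabled on the slaves here because they are managed by MHA. When the original master goes down,
MHA automatically promotes the slave whose data differs least from the master's to be the new master, and a master must have the binary log enabled.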
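Before starting the service it can help to confirm that none of the slave options listed above was forgotten. The following is a minimal sketch, assuming the options are written one per line in a my.cnf-style file; the function name check_slave_cnf is hypothetical, and it checks only that each option is present, not its value.

```shell
#!/usr/bin/env bash
# Report which MHA-relevant options are missing from a my.cnf-style file.
# Prints one "missing: <option>" line per absent option; returns 1 if any is missing.
# Only presence is checked, not the value assigned to the option.
check_slave_cnf() {
    local file="$1" missing=0 opt pattern
    for opt in server_id log-bin read_only relay_log_purge skip_name_resolve; do
        # Accept '-' and '_' interchangeably in option names, as the server does.
        pattern="$(printf '%s' "$opt" | sed 's/[-_]/[-_]/g')"
        # The option must start a line and be followed by '=', whitespace or end of line.
        if ! grep -Eq "^[[:space:]]*${pattern}([[:space:]]*=|[[:space:]]|\$)" "$file"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: run check_slave_cnf /etc/my.cnf on each slave; no output and exit status 0 mean every option was found.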
yum install mha4mysql-node-0.54-1.el5.noarch
systemctl start mariadb
systemctl enable mariadb
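mysql_secure_installation
Prompt 1: enter the current root password. Just press Enter, because none is set yet.
Prompt 2: set a root password?
Prompt 3: remove anonymous users?
Prompt 4: disallow remote root login?
Prompt 5: remove the test database?
Prompt 6: reload the privilege tables now?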
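1. Connect to the master with the replication account
CHANGE MASTER TO
MASTER_HOST='master_host', #IP of the master host
MASTER_USER='repluser', #user name authorized on the master
MASTER_PASSWORD='replpass', #password of that user
MASTER_LOG_FILE='mysql-bin.xxxxx', #binary log on the master to start replicating from
MASTER_LOG_POS=N; #position in that binary log (N is a placeholder for the number); run show master logs; on the master to find it
2. Start the replication threads IO_THREAD and SQL_THREAD
START SLAVE;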
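The log file and position for CHANGE MASTER TO can be taken from the master's log list instead of being typed by hand. The sketch below assumes the list arrives as tab-separated rows of log name and file size, the shape that `mysql -N -B -e 'SHOW MASTER LOGS'` produces, and that the master is idle while the slave is attached, so the size of the newest log is a valid starting position. The function name build_change_master is hypothetical.

```shell
#!/usr/bin/env bash
# Read "log_name<TAB>file_size" rows on stdin and print a CHANGE MASTER TO
# statement that starts at the end of the newest binary log.
# Arguments: master host, replication user, replication password.
build_change_master() {
    local host="$1" user="$2" pass="$3"
    awk -F'\t' -v host="$host" -v user="$user" -v pass="$pass" '
        NF >= 2 { file = $1; pos = $2 }   # remember the newest (last) row
        END {
            if (file == "") exit 1        # no rows read: print nothing
            # \047 is the octal escape for a single quote
            printf "CHANGE MASTER TO MASTER_HOST=\047%s\047, MASTER_USER=\047%s\047, ", host, user
            printf "MASTER_PASSWORD=\047%s\047, MASTER_LOG_FILE=\047%s\047, MASTER_LOG_POS=%s;\n", pass, file, pos
        }'
}
```

Usage, with placeholder credentials: `mysql -h master_host -N -B -e 'SHOW MASTER LOGS' | build_change_master master_host repluser replpass`, then review the printed statement and run it on the slave before START SLAVE.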
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.68.17
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000001
Read_Master_Log_Pos: 557
Relay_Log_File: mariadb-relay-bin.000002
Relay_Log_Pos: 843
Relay_Master_Log_File: mariadb-bin.000001
Slave_IO_Running: Yes "key field: No means the I/O thread is not running"
Slave_SQL_Running: Yes "key field: No means the SQL thread is not running"
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 557
Relay_Log_Space: 1139
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0 "replication lag in seconds; 0 means the slave is fully caught up"
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
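The fields highlighted in the output above can be checked mechanically. This is a minimal sketch that reads unannotated `SHOW SLAVE STATUS\G` output on stdin and treats replication as healthy only when both threads report Yes; the function name check_slave_status is hypothetical.

```shell
#!/usr/bin/env bash
# Read `SHOW SLAVE STATUS\G` output on stdin and decide whether replication is
# healthy: both Slave_IO_Running and Slave_SQL_Running must be Yes.
# Prints a one-line summary and returns 1 when replication is not healthy.
check_slave_status() {
    awk -F': *' '
        { name = $1; sub(/^[ \t]+/, "", name) }   # drop the left padding of the field name
        name == "Slave_IO_Running"      { io  = $2 }
        name == "Slave_SQL_Running"     { sql = $2 }
        name == "Seconds_Behind_Master" { lag = $2 }
        END {
            ok = (io == "Yes" && sql == "Yes")
            printf "io=%s sql=%s lag=%s -> %s\n", io, sql, lag, (ok ? "OK" : "BROKEN")
            exit (ok ? 0 : 1)
        }'
}
```

Usage on a slave: `mysql -e 'SHOW SLAVE STATUS\G' | check_slave_status`. The lag value is reported but not judged, since an acceptable delay depends on the workload.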
1. Create a database on M1
MariaDB [(none)]> create database a1;
Query OK, 1 row affected (0.00 sec)
M1 [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| a1 |
| mysql |
| performance_schema |
| test |
+--------------------+
5 rows in set (0.00 sec)
2. Check on S1 and S2 that the database has been replicated.
S1 [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| a1 |
| mysql |
| performance_schema |
| test |
+--------------------+
5 rows in set (0.01 sec)
S2 [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| a1 |
| mysql |
| performance_schema |
| test |
+--------------------+
5 rows in set (0.01 sec)
① Generate a public/private key pair on any one host, for example on M1
ssh-keygen
This creates a .ssh/ directory on the local host
ls .ssh/
id_rsa id_rsa.pub
② Then authorize the public key on the local host M1 itself
ssh-copy-id M1_IP
The local .ssh directory now contains two more files
ls .ssh/
authorized_keys id_rsa id_rsa.pub known_hosts
③ Copy the whole .ssh directory to every other host
scp -r ~/.ssh root@MHA_IP:/root/
scp -r ~/.ssh root@S1_IP:/root/
scp -r ~/.ssh root@S2_IP:/root/
④ Test the SSH connection; logging in without a password prompt means it works
[root@MHA ~]# ssh 192.168.68.7
Last login: Fri Mar 30 13:21:52 2018 from 192.168.68.1
[root@master ~]#
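How it works: every host uses the same key pair, so to SSH they all look like one and the same host. Never leak this .ssh directory: anyone who holds it can log in to every host without a password, which is why this shortcut should only be used on a LAN. For hosts on the public network, generate a key pair on each host and copy each public key to every other host, or copy them all to a single host and then distribute that host's .ssh/authorized_keys file to all the others.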
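The per-host alternative mentioned above needs one ssh-copy-id run for every ordered pair of hosts, which is easy to get wrong by hand. The sketch below prints that list without executing anything. It assumes each host has already run ssh-keygen and that root is the SSH user; the function name plan_key_exchange is hypothetical.

```shell
#!/usr/bin/env bash
# Print, without running anything, the ssh-copy-id commands needed so that
# every host trusts the key of every other host.
# Arguments: the host names or IP addresses of all nodes.
plan_key_exchange() {
    local src dst
    for src in "$@"; do
        for dst in "$@"; do
            if [ "$src" != "$dst" ]; then
                echo "on $src: ssh-copy-id root@$dst"
            fi
        done
    done
}

# Example with the four placeholder hosts used in this article (12 commands).
plan_key_exchange MHA_IP M1_IP S1_IP S2_IP
```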
Note: the EPEL repository must be configured; the Aliyun EPEL mirror works fine
yum install mha4mysql-manager-0.55-1.el5.noarch.rpm mha4mysql-node-0.54-1.el5.noarch.rpm
mkdir -pv /etc/mha/
vim /etc/mha/app1.cnf
#add the following entries
#default rules
[server default]
#MHA management user created in MySQL
user=mha
#its password
password=123123
#working directory on the manager
manager_workdir=/data/mha/test/
#manager log file
manager_log=/data/mha/test/manager.log
#working directory on each node
remote_workdir=/data/mha/test/
#SSH user
ssh_user=root
#replication user created in MySQL
repl_user=repluser
#its password
repl_password=123123
#heartbeat interval in seconds
ping_interval=1
#node name
[server1]
#node address
hostname=172.18.30.1
#this node may be promoted to master
candidate_master=1
#node name
[server2]
#node address
hostname=172.18.30.2
#this node may be promoted to master
candidate_master=1
#node name
[server3]
#node address
hostname=172.18.30.4
[root@test ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sat Mar 31 19:18:47 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 31 19:18:47 2018 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Mar 31 19:18:47 2018 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Mar 31 19:18:47 2018 - [info] Starting SSH connection tests..
Sat Mar 31 19:18:49 2018 - [debug]
Sat Mar 31 19:18:47 2018 - [debug] Connecting via SSH from root@172.18.30.107(172.18.30.107:22) to root@172.18.30.108(172.18.30.108:22)..
Sat Mar 31 19:18:48 2018 - [debug] ok.
Sat Mar 31 19:18:48 2018 - [debug] Connecting via SSH from root@172.18.30.107(172.18.30.107:22) to root@172.18.30.109(172.18.30.109:22)..
Sat Mar 31 19:18:49 2018 - [debug] ok.
Sat Mar 31 19:18:50 2018 - [debug]
Sat Mar 31 19:18:47 2018 - [debug] Connecting via SSH from root@172.18.30.108(172.18.30.108:22) to root@172.18.30.107(172.18.30.107:22)..
Sat Mar 31 19:18:49 2018 - [debug] ok.
Sat Mar 31 19:18:49 2018 - [debug] Connecting via SSH from root@172.18.30.108(172.18.30.108:22) to root@172.18.30.109(172.18.30.109:22)..
Sat Mar 31 19:18:49 2018 - [debug] ok.
Sat Mar 31 19:18:50 2018 - [debug]
Sat Mar 31 19:18:48 2018 - [debug] Connecting via SSH from root@172.18.30.109(172.18.30.109:22) to root@172.18.30.107(172.18.30.107:22)..
Sat Mar 31 19:18:49 2018 - [debug] ok.
Sat Mar 31 19:18:49 2018 - [debug] Connecting via SSH from root@172.18.30.109(172.18.30.109:22) to root@172.18.30.108(172.18.30.108:22)..
Sat Mar 31 19:18:50 2018 - [debug] ok.
Sat Mar 31 19:18:50 2018 - [info] All SSH connection tests passed successfully.
[root@test ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Sat Mar 31 19:19:26 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 31 19:19:26 2018 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Mar 31 19:19:26 2018 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Mar 31 19:19:26 2018 - [info] MHA::MasterMonitor version 0.56.
Sat Mar 31 19:19:28 2018 - [info] GTID failover mode = 0
Sat Mar 31 19:19:28 2018 - [info] Dead Servers:
Sat Mar 31 19:19:28 2018 - [info] Alive Servers:
Sat Mar 31 19:19:28 2018 - [info] 172.18.30.107(172.18.30.107:3306)
Sat Mar 31 19:19:28 2018 - [info] 172.18.30.108(172.18.30.108:3306)
Sat Mar 31 19:19:28 2018 - [info] 172.18.30.109(172.18.30.109:3306)
Sat Mar 31 19:19:28 2018 - [info] Alive Slaves:
Sat Mar 31 19:19:28 2018 - [info] 172.18.30.108(172.18.30.108:3306) Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled
Sat Mar 31 19:19:28 2018 - [info] Replicating from 172.18.30.107(172.18.30.107:3306)
Sat Mar 31 19:19:28 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Mar 31 19:19:28 2018 - [info] 172.18.30.109(172.18.30.109:3306) Version=5.5.56-MariaDB (oldest major version between slaves) log-bin:enabled
Sat Mar 31 19:19:28 2018 - [info] Replicating from 172.18.30.107(172.18.30.107:3306)
Sat Mar 31 19:19:28 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sat Mar 31 19:19:28 2018 - [info] Current Alive Master: 172.18.30.107(172.18.30.107:3306)
Sat Mar 31 19:19:28 2018 - [info] Checking slave configurations..
Sat Mar 31 19:19:28 2018 - [info] read_only=1 is not set on slave 172.18.30.108(172.18.30.108:3306).
Sat Mar 31 19:19:28 2018 - [warning] relay_log_purge=0 is not set on slave 172.18.30.108(172.18.30.108:3306).
Sat Mar 31 19:19:28 2018 - [info] read_only=1 is not set on slave 172.18.30.109(172.18.30.109:3306).
Sat Mar 31 19:19:28 2018 - [warning] relay_log_purge=0 is not set on slave 172.18.30.109(172.18.30.109:3306).
Sat Mar 31 19:19:28 2018 - [info] Checking replication filtering settings..
Sat Mar 31 19:19:28 2018 - [info] binlog_do_db= , binlog_ignore_db=
Sat Mar 31 19:19:28 2018 - [info] Replication filtering check ok.
Sat Mar 31 19:19:28 2018 - [info] GTID (with auto-pos) is not supported
Sat Mar 31 19:19:28 2018 - [info] Starting SSH connection tests..
Sat Mar 31 19:19:31 2018 - [info] All SSH connection tests passed successfully.
Sat Mar 31 19:19:31 2018 - [info] Checking MHA Node version..
Sat Mar 31 19:19:32 2018 - [info] Version check ok.
Sat Mar 31 19:19:32 2018 - [info] Checking SSH publickey authentication settings on the current master..
Sat Mar 31 19:19:33 2018 - [info] HealthCheck: SSH to 172.18.30.107 is reachable.
Sat Mar 31 19:19:34 2018 - [info] Master MHA Node version is 0.56.
Sat Mar 31 19:19:34 2018 - [info] Checking recovery script configurations on 172.18.30.107(172.18.30.107:3306)..
Sat Mar 31 19:19:34 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.56 --start_file=mariadb-bin.000001
Sat Mar 31 19:19:34 2018 - [info] Connecting to root@172.18.30.107(172.18.30.107:22)..
Creating /data/mastermha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mariadb-bin.000001
Sat Mar 31 19:19:34 2018 - [info] Binlog setting check done.
Sat Mar 31 19:19:34 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat Mar 31 19:19:34 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.18.30.108 --slave_ip=172.18.30.108 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Mar 31 19:19:34 2018 - [info] Connecting to root@172.18.30.108(172.18.30.108:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Mar 31 19:19:35 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=172.18.30.109 --slave_ip=172.18.30.109 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=5.5.56-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Sat Mar 31 19:19:35 2018 - [info] Connecting to root@172.18.30.109(172.18.30.109:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mariadb-relay-bin.000002
Temporary relay log file is /var/lib/mysql/mariadb-relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat Mar 31 19:19:36 2018 - [info] Slaves settings check done.
Sat Mar 31 19:19:36 2018 - [info]
172.18.30.107(172.18.30.107:3306) (current master)
+--172.18.30.108(172.18.30.108:3306)
+--172.18.30.109(172.18.30.109:3306)
Sat Mar 31 19:19:36 2018 - [info] Checking replication health on 172.18.30.108..
Sat Mar 31 19:19:36 2018 - [info] ok.
Sat Mar 31 19:19:36 2018 - [info] Checking replication health on 172.18.30.109..
Sat Mar 31 19:19:36 2018 - [info] ok.
Sat Mar 31 19:19:36 2018 - [warning] master_ip_failover_script is not defined.
Sat Mar 31 19:19:36 2018 - [warning] shutdown_script is not defined.
Sat Mar 31 19:19:36 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[root@test ~]# masterha_manager --conf=/etc/mha/app1.cnf
Sat Mar 31 19:23:26 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 31 19:23:26 2018 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Mar 31 19:23:26 2018 - [info] Reading server configuration from /etc/mha/app1.cnf..
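We now take M1 down by hand. In theory MHA should promote S1 to master and drop M1 from the topology, leaving S1 as the master and S2 as its slave.
1. Stop the mariadb service on M1
systemctl stop mariadb
2. Log in to S1 and check its state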
MariaDB [(none)]> show master logs;
+--------------------+-----------+
| Log_name | File_size |
+--------------------+-----------+
| mariadb-bin.000001 | 715 |
| mariadb-bin.000002 | 245 |
+--------------------+-----------+
2 rows in set (0.00 sec)
MariaDB [(none)]> show slave status;
Empty set (0.00 sec)
MariaDB [(none)]> show variables like '%read_only%';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| innodb_read_only | OFF |
| read_only | OFF |
| tx_read_only | OFF |
+------------------+-------+
3 rows in set (0.00 sec)
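As the output shows, MHA has completed the switch, and the read_only option that was set on S1 earlier has been turned off.
3. Log in to S2 and check its state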
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.18.30.108
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000002
Read_Master_Log_Pos: 245
Relay_Log_File: mariadb-relay-bin.000002
Relay_Log_Pos: 531
Relay_Master_Log_File: mariadb-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 245
Relay_Log_Space: 827
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
1 row in set (0.00 sec)
As the output shows, S2 now replicates from S1.
Test complete.
Error 1
[root@test ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Sat Mar 31 20:14:26 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Mar 31 20:14:26 2018 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Mar 31 20:14:26 2018 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Mar 31 20:14:26 2018 - [error][/usr/share/perl5/vendor_perl/MHA/Config.pm, ln383] Block name "server" is invalid. Block name must be "server default" or start from "server"(+ non-whitespace characters).
Block name "server" is invalid. Block name must be "server default" or start from "server"(+ non-whitespace characters). at /usr/share/perl5/vendor_perl/MHA/SSHCheck.pm line 148.
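Note: this error occurs because the managed-node blocks in the hand-written MHA configuration file are not numbered; each one is named just [server]. The faulty file: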
[server default]
user=mha
password=centos
manager_workdir=/data/mastermha/app1/
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=repluser
repl_password=centos
ping_interval=1
"[server]" <-
hostname=172.18.30.107
candidate_master=1
"[server]" <-
hostname=172.18.30.108
candidate_master=1
"[server]" <-
hostname=172.18.30.109
candidate_master=1
The correct configuration. Look closely at how the block names differ:
[server default]
user=mha
password=centos
manager_workdir=/data/mastermha/app1/
manager_log=/data/mastermha/app1/manager.log
remote_workdir=/data/mastermha/app1/
ssh_user=root
repl_user=repluser
repl_password=centos
ping_interval=1
[server1]
hostname=172.18.30.107
candidate_master=1
[server2]
hostname=172.18.30.108
candidate_master=1
[server3]
hostname=172.18.30.109
candidate_master=1
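The naming rule quoted in the error message can be checked before masterha_check_ssh is run. The sketch below lists every block header that is neither "server default" nor "server" followed by at least one non-whitespace character. The function name check_block_names is hypothetical, and the check mirrors only the wording of the error message, not MHA's full parser.

```shell
#!/usr/bin/env bash
# List block headers in an MHA configuration file that break the naming rule:
# a block must be "server default" or "server" plus non-whitespace characters.
# Prints "<line number>: [<name>]" for each offender; returns 1 if any is found.
check_block_names() {
    local file="$1"
    awk '
        match($0, /^[ \t]*\[[^]]*\]/) {
            name = $0
            sub(/^[ \t]*\[/, "", name)    # strip everything up to the opening bracket
            sub(/\].*$/, "", name)        # strip the closing bracket and what follows
            if (name != "server default" && name !~ /^server[^ \t]+$/) {
                printf "%d: [%s]\n", NR, name
                bad = 1
            }
        }
        END { exit bad + 0 }' "$file"
}
```

Usage: check_block_names /etc/mha/app1.cnf. Run against the faulty file above (without the quote marks used for emphasis), it would report each of the three [server] lines.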
Error 2
If the following error appears, a Perl library cannot be found; create a symbolic link on every node
[root@centos-MHA ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Can't locate MHA/SSHCheck.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/bin/masterha_check_ssh line 25.
[root@centos-MHA ~]# ln -s /usr/lib/perl5/vendor_perl/MHA/ /usr/lib64/perl5/vendor_perl/
[root@centos-M1 ~]# ln -s /usr/lib/perl5/vendor_perl/MHA/ /usr/lib64/perl5/vendor_perl/
[root@centos-S1 ~]# ln -s /usr/lib/perl5/vendor_perl/MHA/ /usr/lib64/perl5/vendor_perl/
[root@centos-S2 ~]# ln -s /usr/lib/perl5/vendor_perl/MHA/ /usr/lib64/perl5/vendor_perl/
Original article: http://blog.51cto.com/13598893/2093432