Tags: deploying drbd on centos7.5, deploying heartbe on centos 7.5, mysql high-availability solution, heartbeat+DRBD+mysq
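A dual-node hot-standby setup needs Heartbeat and a storage device (if you have no storage device, DRBD can take its place, although a real storage device is preferable). DRBD (the storage substitute): Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing storage replication solution that mirrors the contents of block devices between servers. It keeps the data on two servers identical, and only one server can have the device mounted at any time. You can think of DRBD as a network RAID 1.
References on how DRBD works: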
https://www.cnblogs.com/guoting1202/p/3975685.html
https://blog.csdn.net/leshami/article/details/49509919
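I. Environment
OS version: CentOS 7.5 x64
DRBD version: DRBD-8.4.3 (the source build attempted in section III; the yum packages actually used report 8.4.11-1 in /proc/drbd)
node1 (primary node) IP: 192.168.1.54, hostname: drbd1.db.com
node2 (secondary node) IP: 192.168.1.52, hostname: drbd2.db.com
Virtual IP address (VIP): 192.168.1.55
(node1) marks a step performed on the primary node only
(node2) marks a step performed on the secondary node only
(node1,node2) marks a step performed on both nodes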
II. Preparation
1. Set the hostname and the hosts entries (node1,node2)
node1:
# cat /etc/hostname
drbd1.db.com
# cat /etc/hosts
127.0.0.1 localhost drbd1.db.com localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.54 drbd1.db.com
192.168.1.52 drbd2.db.com
node2:
# cat /etc/hostname
drbd2.db.com
# cat /etc/hosts
127.0.0.1 localhost drbd2.db.com localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.54 drbd1.db.com
192.168.1.52 drbd2.db.com
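Both nodes must resolve each other by name, so it can help to verify the hosts entries before going further. The following is a minimal sketch, not part of the original procedure; the function name has_host_entry is hypothetical.

```shell
# Hypothetical helper: succeed only if HOSTS_FILE maps IP to HOSTNAME.
# usage: has_host_entry IP HOSTNAME [HOSTS_FILE]
has_host_entry() {
    awk -v ip="$1" -v name="$2" '
        # On lines whose first field is the IP, look for the name
        # among the remaining fields.
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${3:-/etc/hosts}"
}
```

On each node one might run: has_host_entry 192.168.1.54 drbd1.db.com && has_host_entry 192.168.1.52 drbd2.db.com && echo "hosts OK"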
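2. Stop the firewall (firewalld) and disable SELinux to avoid errors during installation; both can be re-enabled once deployment is complete (node1,node2)
# systemctl stop firewalld
# systemctl disable firewalld
# setenforce 0
# vi /etc/selinux/config
---------------
SELINUX=disabled
---------------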
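If you prefer not to edit the file by hand in vi, the same change can be scripted. This is a sketch under the assumption that the file contains a line starting with SELINUX=; the function name set_selinux_disabled is hypothetical.

```shell
# Hypothetical helper: rewrite the SELINUX= line of the SELinux config file.
# The SELINUXTYPE= line is left alone because the pattern requires "=" right
# after "SELINUX".
# usage: set_selinux_disabled [CONFIG_FILE]
set_selinux_disabled() {
    cfg=${1:-/etc/selinux/config}
    tmp=$(mktemp) || return 1
    # Write to a temp file, then copy the content back so the original
    # file keeps its owner and permissions.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg" > "$tmp" &&
        cat "$tmp" > "$cfg"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

Run as root without arguments, it edits /etc/selinux/config; the change takes effect after the reboot in the next step.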
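3. Reboot the servers (node1,node2)
4. In this setup both servers have an LVM volume /dev/mapper/centos-home of the same size, mounted on /home. So unmount it, delete the LV, create a new LV, leave the new LV unformatted and unmounted, and create the /store directory (node1,node2)
# umount /home
# vi /etc/fstab    (remove the /home entry, since that volume is about to be deleted)
# lvremove /dev/mapper/centos-home
# lvcreate -n db -l +247071 centos
xfs signature detected on /dev/centos/db at offset 0. Wipe it?    (this prompt appears; I always answered n)
# mkdir /store
Note: whether you use a plain partition or an LV, DRBD needs a clean partition; do not format it.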
5. Time synchronization (node1,node2)
# yum install -y rdate
# rdate -s time-b.nist.gov
III. Installing and configuring DRBD
1. Install the dependencies: (node1,node2)
# yum install gcc gcc-c++ make glibc flex kernel-devel kernel-headers
---------------------------- Do not run this step: building from source fails on CentOS 7.5 ----------------------------
2. Build and install DRBD from source. The build failed on CentOS 7.5 (it works on 6.x), and I found no fix online: (node1,node2)
# wget http://www.drbd.org/download/drbd/8.4/archive/drbd-8.4.3.tar.gz
# tar zxvf drbd-8.4.3.tar.gz
# cd drbd-8.4.3
# ./configure --prefix=/usr/local/drbd --with-km
# make KDIR=/usr/src/kernels/3.10.0-862.2.3.el7.x86_64/    (replace with your own kernel version)
# make install
# mkdir -p /usr/local/drbd/var/run/drbd
# cp /usr/local/drbd/etc/rc.d/init.d/drbd /etc/rc.d/init.d
# chkconfig --add drbd
# chkconfig drbd on
---------------------------- Do not run this step: building from source fails on CentOS 7.5 ----------------------------
3. Because the source build failed, install with yum instead (node1,node2)
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
# yum install -y kmod-drbd84 drbd84-utils
# systemctl enable drbd
4. Configuration files
# /etc/drbd.conf    # main configuration file
# /etc/drbd.d/global_common.conf    # global configuration file
5. Load the DRBD module and check that it is loaded into the kernel: (node1,node2)
# modprobe drbd
# lsmod |grep drbd
drbd 397041 0
libcrc32c 12644 2 xfs,drbd
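Note that a plain grep for drbd also matches the libcrc32c line, because drbd appears there in the "used by" column. A stricter check compares only the module-name column. The sketch below is illustrative; module_listed is a hypothetical name.

```shell
# Hypothetical helper: read lsmod-style text on stdin and succeed only if
# the first column (the module name) equals MODULE exactly.
# usage: lsmod | module_listed drbd
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit (found ? 0 : 1) }'
}
```

For example: lsmod | module_listed drbd && echo "drbd module is loaded"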
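If loading the DRBD module fails with the following error:
# modprobe drbd
FATAL: Module drbd not found.
Note: the kernel packages were already installed along with the dependencies, so this error normally does not appear. If it does, try rebooting first; if it persists after the reboot, proceed as follows:
Cause: the running kernel does not provide this module, so the kernel needs to be updated.
To update the kernel: yum install kernel (note: do not update if there is no error)
After the update, be sure to reboot the operating system!
After the reboot, check the kernel version again; it should now show the new version:
# uname -r
Then try loading the drbd module again:
# modprobe drbd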
6. Configure the resource: (node1,node2)
# vi /etc/drbd.d/db.res
resource r0{
    protocol C;
    startup { wfc-timeout 0; degr-wfc-timeout 120;}
    disk { on-io-error detach;}
    net{
        timeout 60;
        connect-int 10;
        ping-int 10;
        max-buffers 2048;
        max-epoch-size 2048;
    }
    syncer { rate 200M;}
    on drbd1.db.com{
        device /dev/drbd0;
        disk /dev/centos/db;
        address 192.168.1.54:7788;
        meta-disk internal;
    }
    on drbd2.db.com{
        device /dev/drbd0;
        disk /dev/centos/db;
        address 192.168.1.52:7788;
        meta-disk internal;
    }
}
Note: change the hostnames, IP addresses, and disk in the configuration above to match your own environment.
Note: with the earlier source-built installation, I simply deleted the contents of /usr/local/drbd/etc/drbd.conf and put the configuration above directly into that file.
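DRBD matches each "on <hostname>" section against the local host name (the output of uname -n), so a mismatch there is a common reason for a resource failing to come up. The sketch below lists the host names in a resource file and checks one of them; it assumes one "on" statement per line, as in the layout above, and the function names are hypothetical.

```shell
# Hypothetical helper: print the host name of every "on <host> {" line.
# usage: resource_hosts RES_FILE
resource_hosts() {
    # "on-io-error" does not match because a space must follow "on".
    sed -n 's/^[[:space:]]*on[[:space:]]\{1,\}\([^[:space:]{]\{1,\}\).*/\1/p' "$1"
}

# Hypothetical helper: succeed if HOSTNAME (default: uname -n) has a section.
# usage: host_in_resource RES_FILE [HOSTNAME]
host_in_resource() {
    resource_hosts "$1" | grep -qxF "${2:-$(uname -n)}"
}
```

For example: host_in_resource /etc/drbd.d/db.res || echo "no 'on' section for $(uname -n)"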
7. Create the DRBD device and activate resource r0: (node1,node2)
# mknod /dev/drbd0 b 147 0
# drbdadm create-md r0
Wait a moment; "success" means the DRBD metadata block was created:
md_offset 1036290879488
al_offset 1036290846720
bm_offset 1036259221504
Found some data
==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
initializing activity log
initializing bitmap (30884 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
success
Note: if "success" does not appear after a long wait, press Enter and wait a little longer.
Run the command again:
# drbdadm create-md r0
r0 is activated successfully:
You want me to create a v08 style flexible-size internal meta data block.
There appears to be a v08 flexible-size internal meta data block
already in place on /dev/centos/db at byte offset 1036290879488
Do you really want to overwrite the existing meta-data?
[need to type 'yes' to confirm] yes
md_offset 1036290879488
al_offset 1036290846720
bm_offset 1036259221504
Found some data
==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
initializing activity log
initializing bitmap (30884 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
8. Start the DRBD service: (node1,node2)
# systemctl start drbd
# systemctl status drbd
Note: the service must be started on both nodes before it takes effect.
9. Check the status: (node1,node2)
# cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@, 2018-04-26 12:10:42
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1011971896
Here ro:Secondary/Secondary means both hosts are currently in the standby role. ds is the disk state; it shows "Inconsistent" because DRBD cannot yet tell which side is the primary and whose disk data should be treated as authoritative.
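The cs, ro and ds fields described above are easy to pull out of /proc/drbd in a script. This is a minimal sketch for the 8.4 /proc/drbd layout shown in this article; it reports the first device only, and the function name drbd_field is hypothetical.

```shell
# Hypothetical helper: read /proc/drbd-style text on stdin and print the
# value of one field (cs, ro or ds) from the first device line.
# usage: drbd_field NAME < /proc/drbd
drbd_field() {
    awk -v key="$1" '
        / cs:/ {
            for (i = 1; i <= NF; i++)
                if (index($i, key ":") == 1) {
                    # Strip the "key:" prefix and stop at the first device.
                    print substr($i, length(key) + 2)
                    exit
                }
        }'
}
```

For example, drbd_field ro < /proc/drbd prints the local/peer roles, with the local role before the slash.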
10. Make host drbd1.db.com the primary node: (node1 only; wait until the status shown below appears before moving on to the next step)
# drbdsetup /dev/drbd0 primary --force
Watch the synchronization:
# cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@, 2018-04-26 12:10:42
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:286720 nr:0 dw:0 dr:288816 al:8 bm:0 lo:0 pe:1 ua:0 ap:0 ep:1 wo:f oos:1011686200
    [>....................] sync'ed: 0.1% (987972/988252)M
    finish: 6:52:41 speed: 40,812 (40,812) K/sec
Status after synchronization completes:
(node1)
# cat /proc/drbd
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@drbd1.gxm.com, 2015-05-12 21:05:41
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C
(node2)
# cat /proc/drbd
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@drbd2.gxm.com, 2015-05-12 21:05:46
m:res cs ro ds p mounted fstype
0:r0 Connected Secondary/Primary UpToDate/UpToDate C
Note: ro shows Primary/Secondary on the primary server and Secondary/Primary on the secondary server.
ds showing UpToDate/UpToDate means the primary/secondary setup succeeded (initialization and synchronization take time; wait until the status looks like the above before continuing with the steps below).
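Since the initial synchronization can take hours, waiting for it can be scripted instead of re-running cat by hand. The following sketch treats the resource as synchronized only when every device line shows cs:Connected and ds:UpToDate/UpToDate; the function name drbd_in_sync and the 30-second polling interval are assumptions, not part of the original procedure.

```shell
# Hypothetical helper: read /proc/drbd-style text on stdin; succeed only if
# at least one device line exists and all of them are connected and up to date.
# usage: drbd_in_sync < /proc/drbd
drbd_in_sync() {
    awk '
        / cs:/ {
            seen = 1
            if ($0 !~ / cs:Connected / || $0 !~ / ds:UpToDate\/UpToDate( |$)/)
                bad = 1
        }
        END { exit ((seen && !bad) ? 0 : 1) }'
}
```

A waiting loop could then look like: until drbd_in_sync < /proc/drbd; do sleep 30; done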
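11. Mount the DRBD device: (node1 only)
The status above shows the mounted and fstype columns empty, so in this step we mount the DRBD device on the system directory /store.
# mkfs.ext4 /dev/drbd0
# mount /dev/drbd0 /store
# df -h
Note: no operation, including mounting, is allowed on the DRBD device on the Secondary node; all reads and writes must go through the Primary node. Only when the Primary node fails can the Secondary node be promoted to Primary, after which the DRBD device is mounted there automatically and service continues.
DRBD status after a successful mount: (node1 only)
# cat /proc/drbd
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@drbd1.gxm.com, 2015-05-12 21:05:41
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C /store ext4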
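IV. Install the application (for example MySQL) and put its data directory on the /store partition
You can either change the database configuration file so that the data directory points to /store, or use symbolic links. Symbolic links are what I use most often.
1. Stop the application service and disable it from starting at boot
# systemctl stop mysqld
# systemctl disable mysqld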
2. On node1, do the following (with /store mounted on node1 first)
Move the data into the /store directory
# mv /usr/local/kkmail/data/mysql/default/kkmail /store
# mv /usr/local/kkmail/data/mysql/default/ibdata1 /store
# mv /usr/local/kkmail/data/mysql/default/ib_logfile0 /store
# mv /usr/local/kkmail/data/mysql/default/ib_logfile1 /store
Create the symbolic links
# ln -s /store/kkmail /usr/local/kkmail/data/mysql/default/kkmail
# ln -s /store/ibdata1 /usr/local/kkmail/data/mysql/default/ibdata1
# ln -s /store/ib_logfile0 /usr/local/kkmail/data/mysql/default/ib_logfile0
# ln -s /store/ib_logfile1 /usr/local/kkmail/data/mysql/default/ib_logfile1
Fix the ownership
# chown -R kkmail_mysql.kkmail_mysql /usr/local/kkmail/data/mysql/default/kkmail
# chown -R kkmail_mysql.kkmail_mysql /usr/local/kkmail/data/mysql/default/ibdata1
# chown -R kkmail_mysql.kkmail_mysql /usr/local/kkmail/data/mysql/default/ib_logfile0
# chown -R kkmail_mysql.kkmail_mysql /usr/local/kkmail/data/mysql/default/ib_logfile1
3. On node2, do the following
Rename the existing data out of the way
# mv /usr/local/kkmail/data/mysql/default/kkmail{,_bak}
# mv /usr/local/kkmail/data/mysql/default/ibdata1{,_bak}
# mv /usr/local/kkmail/data/mysql/default/ib_logfile0{,_bak}
# mv /usr/local/kkmail/data/mysql/default/ib_logfile1{,_bak}
Create the symbolic links
# ln -s /store/kkmail /usr/local/kkmail/data/mysql/default/kkmail
# ln -s /store/ibdata1 /usr/local/kkmail/data/mysql/default/ibdata1
# ln -s /store/ib_logfile0 /usr/local/kkmail/data/mysql/default/ib_logfile0
# ln -s /store/ib_logfile1 /usr/local/kkmail/data/mysql/default/ib_logfile1
Notes:
ln -s <source path> <link path>
A symbolic link may point to a file name that does not exist (on node2, /store is empty until the DRBD device is mounted there)
A symbolic link may point to a directory
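The move-then-link sequence used on node1 repeats the same two commands for each path, so it can be wrapped in a small function. This is only a sketch of that pattern; relocate_and_link is a hypothetical name, and the guard against an existing symlink is an added safety check that the original steps do not include.

```shell
# Hypothetical helper: move SOURCE_PATH into TARGET_DIR and leave a symlink
# with the original name pointing at the new location.
# usage: relocate_and_link SOURCE_PATH TARGET_DIR
relocate_and_link() {
    src=$1
    dest_dir=$2
    name=$(basename "$src")
    # A symlink at the source means the path was already relocated.
    if [ -L "$src" ]; then
        echo "already a symlink: $src" >&2
        return 1
    fi
    mv "$src" "$dest_dir/" && ln -s "$dest_dir/$name" "$src"
}
```

For example, with /store mounted on node1: relocate_and_link /usr/local/kkmail/data/mysql/default/ibdata1 /store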
V. Heartbeat configuration
1. Build and install Heartbeat from source; there is no yum repository for Heartbeat on CentOS 7 (node1,node2)
Download page for the packages: http://www.linux-ha.org/wiki/Downloads
Install the build prerequisites
# yum install -y bzip2 autoconf automake libtool glib2-devel libxml2-devel bzip2-devel libtool-ltdl-devel asciidoc libuuid-devel psmisc
Install glue
# wget http://hg.linux-ha.org/glue/archive/0a7add1d9996.tar.bz2
# tar jxvf 0a7add1d9996.tar.bz2
# cd Reusable-Cluster-Components-glue--0a7add1d9996/
# groupadd haclient
# useradd -g haclient hacluster
# ./autogen.sh
# ./configure --prefix=/usr/local/heartbeat/
# make
# make install
Install Resource Agents
# wget https://github.com/ClusterLabs/resource-agents/archive/v3.9.6.tar.gz
# tar zxvf v3.9.6.tar.gz
# cd resource-agents-3.9.6/
# ./autogen.sh
# export CFLAGS="$CFLAGS -I/usr/local/heartbeat/include -L/usr/local/heartbeat/lib"
# ./configure --prefix=/usr/local/heartbeat/
# vi /etc/ld.so.conf.d/heartbeat.conf
/usr/local/heartbeat/lib
# ldconfig
# make
# make install
Install Heartbeat
# wget http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/958e11be8686.tar.bz2
# tar jxvf 958e11be8686.tar.bz2
# cd Heartbeat-3-0-958e11be8686
# ./bootstrap
# export CFLAGS="$CFLAGS -I/usr/local/heartbeat/include -L/usr/local/heartbeat/lib"
# ./configure --prefix=/usr/local/heartbeat/
# vi /usr/local/heartbeat/include/heartbeat/glue_config.h
/*define HA_HBCONF_DIR "/usr/local/heartbeat/etc/ha.d/"*/    (comment this line out with /* */)
# make
# make install
2. Copy the sample configuration files
# cp /usr/local/heartbeat/share/doc/heartbeat/ha.cf /usr/local/heartbeat/etc/ha.d
# cp /usr/local/heartbeat/share/doc/heartbeat/authkeys /usr/local/heartbeat/etc/ha.d
# cp /usr/local/heartbeat/share/doc/heartbeat/haresources /usr/local/heartbeat/etc/ha.d
3. Set up the ha.cf configuration file
(node1)
Edit ha.cf and add the following settings:
# vi /usr/local/heartbeat/etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 2
warntime 10
deadtime 30
initdead 60
udpport 1112
bcast ens192
ucast ens192 192.168.1.54
#baud 19200
auto_failback off
node drbd1.db.com
node drbd2.db.com
ping 192.168.1.3
respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail
(node2)
Edit ha.cf and add the following settings:
# vi /usr/local/heartbeat/etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 2
warntime 10
deadtime 30
initdead 60
udpport 1112
bcast ens192
ucast ens192 192.168.1.52
#baud 19200
auto_failback off
node drbd1.db.com
node drbd2.db.com
ping 192.168.1.3
respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail
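4. Edit authkeys, the file the two nodes use to authenticate each other, and add the following: (node1,node2)
# vi /usr/local/heartbeat/etc/ha.d/authkeys
auth 1
1 crc
Set the permissions of the authentication file to 600
# chmod 600 /usr/local/heartbeat/etc/ha.d/authkeys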
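The crc method only detects corrupted packets and provides no real authentication. Heartbeat's authkeys file also accepts md5 and sha1 with a shared secret, which may be preferable when the heartbeat traffic crosses a shared network, as it does here. The sketch below is an alternative to the crc file above, not the article's own configuration: it writes a sha1 authkeys file with a random key, and write_authkeys is a hypothetical name. The file must be identical on both nodes, so generate it once and copy it to the other node.

```shell
# Hypothetical helper: write a Heartbeat authkeys file that uses sha1 with a
# random 128-bit key (32 hex characters) and restrict it to mode 600.
# usage: write_authkeys FILE
write_authkeys() {
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # Create the file under a restrictive umask so the key is never
    # world-readable, even briefly.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$1" ) || return 1
    chmod 600 "$1"
}
```

For example: write_authkeys /usr/local/heartbeat/etc/ha.d/authkeys, then copy the resulting file to the same path on the other node.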
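5. Edit the cluster resource file haresources
# vi /usr/local/heartbeat/etc/ha.d/haresources
(node1)
drbd1.db.com IPaddr::192.168.1.55/24/ens192 drbddisk::r0 Filesystem::/dev/drbd0::/store mysqld
(node2)
drbd2.db.com IPaddr::192.168.1.55/24/ens192 drbddisk::r0 Filesystem::/dev/drbd0::/store mysqld
The hostname is the node's own; the IP address is the virtual IP of the hot-standby pair.
Note: the scripts named in this file (IPaddr, Filesystem, and so on) live under /etc/ha.d/resource.d/. You can also place service start scripts there (for example mysql, www) and add the script name to /etc/ha.d/haresources, so that the script is started along with heartbeat.
IPaddr::192.168.1.55/24/ens192: uses the IPaddr script to configure the floating virtual IP that serves clients
drbddisk::r0: uses the drbddisk script to promote and demote the DRBD resource on the primary and secondary nodes
Filesystem::/dev/drbd0::/store: uses the Filesystem script to mount and unmount the disk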
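Each haresources entry is a node name followed by resources of the form script::arg1::arg2, and heartbeat calls each script with those arguments plus start or stop (the failover log in section IX shows, for instance, "Filesystem /dev/drbd0 /store start"). The sketch below only illustrates that splitting; it is not how heartbeat itself parses the file, and explain_haresources_line is a hypothetical name.

```shell
# Hypothetical helper: split one haresources line into the node name and,
# for each resource, the script name and its "::"-separated arguments.
# usage: explain_haresources_line "LINE"
explain_haresources_line() {
    set -- $1                 # deliberate word splitting on whitespace
    echo "node=$1"
    shift
    for res in "$@"; do
        script=${res%%::*}    # text before the first "::"
        rest=${res#"$script"} # remaining "::arg1::arg2", possibly empty
        args=$(printf '%s' "$rest" | sed 's/^:://; s/::/ /g')
        echo "script=$script args=[$args]"
    done
}
```

For the node1 line above this prints node=drbd1.db.com followed by one line per resource, in the order heartbeat starts them.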
VI. Create the DRBD resource script drbddisk: (node1,node2)
Edit drbddisk and add the following script
# vi /usr/local/heartbeat/etc/ha.d/resource.d/drbddisk
#!/bin/bash
#
# This script is inteded to be used as resource script by heartbeat
#
# Copright 2003-2008 LINBIT Information Technologies
# Philipp Reisner, Lars Ellenberg
#
###

DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"

if [ -f $DEFAULTFILE ]; then
  . $DEFAULTFILE
fi

if [ "$#" -eq 2 ]; then
  RES="$1"
  CMD="$2"
else
  RES="all"
  CMD="$1"
fi

## EXIT CODES
# since this is a "legacy heartbeat R1 resource agent" script,
# exit codes actually do not matter that much as long as we conform to
# http://wiki.linux-ha.org/HeartbeatResourceAgent
# but it does not hurt to conform to lsb init-script exit codes,
# where we can.
# http://refspecs.linux-foundation.org/LSB_3.1.0/
#LSB-Core-generic/LSB-Core-generic/iniscrptact.html
####

drbd_set_role_from_proc_drbd()
{
  local out
  if ! test -e /proc/drbd; then
    ROLE="Unconfigured"
    return
  fi

  dev=$( $DRBDADM sh-dev $RES )
  minor=${dev#/dev/drbd}
  if [[ $minor = *[!0-9]* ]] ; then
    # sh-minor is only supported since drbd 8.3.1
    minor=$( $DRBDADM sh-minor $RES )
  fi
  if [[ -z $minor ]] || [[ $minor = *[!0-9]* ]] ; then
    ROLE=Unknown
    return
  fi

  if out=$(sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }" /proc/drbd); then
    set -- $out
    ROLE=${5%/**}
    : ${ROLE:=Unconfigured} # if it does not show up
  else
    ROLE=Unknown
  fi
}

case "$CMD" in
  start)
    # try several times, in case heartbeat deadtime
    # was smaller than drbd ping time
    try=6
    while true; do
      $DRBDADM primary $RES && break
      let "--try" || exit 1 # LSB generic error
      sleep 1
    done
    ;;
  stop)
    # heartbeat (haresources mode) will retry failed stop
    # for a number of times in addition to this internal retry.
    try=3
    while true; do
      $DRBDADM secondary $RES && break
      # We used to lie here, and pretend success for anything != 11,
      # to avoid the reboot on failed stop recovery for "simple
      # config errors" and such. But that is incorrect.
      # Don't lie to your cluster manager.
      # And don't do config errors...
      let --try || exit 1 # LSB generic error
      sleep 1
    done
    ;;
  status)
    if [ "$RES" = "all" ]; then
      echo "A resource name is required for status inquiries."
      exit 10
    fi
    ST=$( $DRBDADM role $RES )
    ROLE=${ST%/**}
    case $ROLE in
      Primary|Secondary|Unconfigured)
        # expected
        ;;
      *)
        # unexpected. whatever...
        # If we are unsure about the state of a resource, we need to
        # report it as possibly running, so heartbeat can, after failed
        # stop, do a recovery by reboot.
        # drbdsetup may fail for obscure reasons, e.g. if /var/lock/ is
        # suddenly readonly. So we retry by parsing /proc/drbd.
        drbd_set_role_from_proc_drbd
    esac
    case $ROLE in
      Primary)
        echo "running (Primary)"
        exit 0 # LSB status "service is OK"
        ;;
      Secondary|Unconfigured)
        echo "stopped ($ROLE)"
        exit 3 # LSB status "service is not running"
        ;;
      *)
        # NOTE the "running" in below message.
        # this is a "heartbeat" resource script,
        # the exit code is _ignored_.
        echo "cannot determine status, may be running ($ROLE)"
        exit 4 # LSB status "service status is unknown"
        ;;
    esac
    ;;
  *)
    echo "Usage: drbddisk [resource] {start|stop|status}"
    exit 1
    ;;
esac

exit 0
Make the script executable (mode 755):
# chmod 755 /usr/local/heartbeat/etc/ha.d/resource.d/drbddisk
VII. Start the Heartbeat service
Start the Heartbeat service on both nodes, node1 first and then node2: (node1,node2)
# systemctl start heartbeat
# systemctl enable heartbeat
# systemctl status heartbeat
If it fails to start, run these two commands and then start it again:
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/RAExec/* /usr/local/heartbeat/lib/heartbeat/plugins/RAExec/
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/* /usr/local/heartbeat/lib/heartbeat/plugins/
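VIII. Test the hot standby (high availability)
Before testing, it is advisable to turn the firewalld firewall back on and allow ports 7788 and 1112 (port 1112 is a UDP port).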
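With firewalld, the two ports can be opened using firewall-cmd. The sketch below assumes the default zone is the one attached to the cluster interface, that DRBD replicates over TCP on port 7788 (the address lines in db.res), and that Heartbeat uses UDP port 1112 (udpport in ha.cf). The function name open_cluster_ports is hypothetical; by default it only prints the commands so they can be reviewed first.

```shell
# Hypothetical helper: print (dry run) or execute the firewall-cmd calls
# that open the DRBD and Heartbeat ports.
# usage: open_cluster_ports [RUNNER]   RUNNER defaults to "echo" (dry run);
#        pass "env" to actually execute the commands.
open_cluster_ports() {
    run=${1:-echo}
    $run firewall-cmd --permanent --add-port=7788/tcp   # DRBD replication
    $run firewall-cmd --permanent --add-port=1112/udp   # Heartbeat
    $run firewall-cmd --reload                          # apply permanent rules
}
```

Run open_cluster_ports to review the commands, then open_cluster_ports env as root on both nodes after starting firewalld.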
Reboot node1, shut it down, or stop its heartbeat service (but do not restart both servers at the same time, or both will mount the storage at once); node2 takes over immediately and seamlessly.
Note: at this point the DRBD connection state on node2 may be WFConnection. Once node1 is back up it changes to Connected, and ro and ds show Primary/Secondary UpToDate/UpToDate.
# cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@, 2018-04-26 12:10:42
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:12 dw:12 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
# cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by mockbuild@, 2018-04-26 12:10:42
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:8 nr:12 dw:24 dr:6605 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
Note: clients must reach the service through the floating IP (VIP): 192.168.1.55
IX. Logs and common problems
1. Logs from rebooting or shutting down the mail1 and mail2 servers (a normal failover):
May 18 12:14:39 drbd1.db.com heartbeat: [1243]: info: Received shutdown notice from 'drbd2.db.com'.
May 18 12:14:39 drbd1.db.com heartbeat: [1243]: info: Resources being acquired from drbd2.db.com.
May 18 12:14:39 drbd1.db.com heartbeat: [1812]: info: acquire all HA resources (standby).
ResourceManager(default)[1838]: 2018/05/18_12:14:39 info: Acquiring resource group: drbd1.db.com IPaddr::192.168.1.55/24/ens192 drbddisk::r0 Filesystem::/dev/drbd0::/store
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.1.55)[1889]: 2018/05/18_12:14:39 INFO: Resource is stopped
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.1.55)[1890]: 2018/05/18_12:14:39 INFO: Resource is stopped
May 18 12:14:39 drbd1.db.com heartbeat: [1813]: info: Local Resource acquisition completed.
ResourceManager(default)[1838]: 2018/05/18_12:14:39 info: Running /usr/local/heartbeat/etc/ha.d/resource.d/IPaddr 192.168.1.55/24/ens192 start
IPaddr(IPaddr_192.168.1.55)[2029]: 2018/05/18_12:14:39 INFO: Using calculated netmask for 192.168.1.55: 255.255.255.0
IPaddr(IPaddr_192.168.1.55)[2029]: 2018/05/18_12:14:39 INFO: eval ifconfig ens192:0 192.168.1.55 netmask 255.255.255.0 broadcast 192.168.1.255
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.1.55)[2003]: 2018/05/18_12:14:39 INFO: Success
ResourceManager(default)[1838]: 2018/05/18_12:14:39 info: Running /usr/local/heartbeat/etc/ha.d/resource.d/drbddisk r0 start
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[2176]: 2018/05/18_12:14:39 INFO: Resource is stopped
ResourceManager(default)[1838]: 2018/05/18_12:14:39 info: Running /usr/local/heartbeat/etc/ha.d/resource.d/Filesystem /dev/drbd0 /store start
Filesystem(Filesystem_/dev/drbd0)[2260]: 2018/05/18_12:14:39 INFO: Running start for /dev/drbd0 on /store
Filesystem(Filesystem_/dev/drbd0)[2260]: 2018/05/18_12:14:39 INFO: Starting filesystem check on /dev/drbd0
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[2250]: 2018/05/18_12:14:40 INFO: Success
May 18 12:14:40 drbd1.db.com heartbeat: [1812]: info: all HA resource acquisition completed (standby).
May 18 12:14:40 drbd1.db.com heartbeat: [1243]: info: Standby resource acquisition done [all].
harc(default)[2338]: 2018/05/18_12:14:40 info: Running /usr/local/heartbeat/etc/ha.d/rc.d/status status
mach_down(default)[2355]: 2018/05/18_12:14:40 info: /usr/local/heartbeat/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[2355]: 2018/05/18_12:14:40 info: mach_down takeover complete for node drbd2.db.com.
May 18 12:14:40 drbd1.db.com heartbeat: [1243]: info: mach_down takeover complete.
harc(default)[2391]: 2018/05/18_12:14:40 info: Running /usr/local/heartbeat/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp(default)[2391]: 2018/05/18_12:14:40 received ip-request-resp IPaddr::192.168.1.55/24/ens192 OK yes
ResourceManager(default)[2414]: 2018/05/18_12:14:40 info: Acquiring resource group: drbd1.db.com IPaddr::192.168.1.55/24/ens192 drbddisk::r0 Filesystem::/dev/drbd0::/store
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.1.55)[2442]: 2018/05/18_12:14:40 INFO: Running OK
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[2519]: 2018/05/18_12:14:40 INFO: Running OK
May 18 12:15:02 drbd1.db.com heartbeat: [1243]: info: Heartbeat restart on node drbd2.db.com
May 18 12:15:02 drbd1.db.com heartbeat: [1243]: info: Status update for node drbd2.db.com: status init
May 18 12:15:02 drbd1.db.com heartbeat: [1243]: info: Status update for node drbd2.db.com: status up
May 18 12:15:02 drbd1.db.com ipfail: [1765]: info: Status update: Node drbd2.db.com now has status init
May 18 12:15:02 drbd1.db.com ipfail: [1765]: info: Status update: Node drbd2.db.com now has status up
harc(default)[2574]: 2018/05/18_12:15:02 info: Running /usr/local/heartbeat/etc/ha.d/rc.d/status status
harc(default)[2591]: 2018/05/18_12:15:02 info: Running /usr/local/heartbeat/etc/ha.d/rc.d/status status
May 18 12:15:04 drbd1.db.com heartbeat: [1243]: info: Status update for node drbd2.db.com: status active
May 18 12:15:04 drbd1.db.com ipfail: [1765]: info: Status update: Node drbd2.db.com now has status active
May 18 12:15:04 drbd1.db.com ipfail: [1765]: info: Asking other side for ping node count.
harc(default)[2612]: 2018/05/18_12:15:04 info: Running /usr/local/heartbeat/etc/ha.d/rc.d/status status
May 18 12:15:05 drbd1.db.com heartbeat: [1243]: info: remote resource transition completed.
May 18 12:15:08 drbd1.db.com ipfail: [1765]: info: No giveup timer to abort.
2. Common errors and fixes
This error is fixed by installing psmisc (yum install psmisc):
ERROR: Setup problem: couldn't find command: fuser
ERROR: Return code 5 from /usr/local/heartbeat/etc/ha.d/resource.d/Filesystem
ERROR: Program is not installed
This error is fixed by running these two commands:
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/RAExec/* /usr/local/heartbeat/lib/heartbeat/plugins/RAExec/
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/* /usr/local/heartbeat/lib/heartbeat/plugins/
ERROR: Illegal directive [bcast] in /usr/local/heartbeat/etc/ha.d/ha.cf
ERROR: Illegal directive [ucast] in /usr/local/heartbeat/etc/ha.d/ha.cf
ERROR: Illegal directive [ping] in /usr/local/heartbeat/etc/ha.d/ha.cf
ERROR: Heartbeat not started: configuration error.
ERROR: Configuration error, heartbeat not started.
This error means the file does not exist or its permissions are wrong:
# chmod 755 /usr/local/heartbeat/etc/ha.d/resource.d/drbddisk
ERROR: Cannot locate resource script drbddisk
ERROR: Cannot locate resource script drbddisk
ERROR: Cannot locate resource script drbddisk
info: Retrying failed stop operation [drbddisk::r0]
ERROR: Resource script for drbddisk::r0 probably not LSB-compliant.
An error while creating the DRBD metadata; it means DRBD needs a clean partition that has not been formatted:
[root@drbd1 ~]# drbdadm create-md r0
md_offset 1036286685184
al_offset 1036286652416
bm_offset 1036255027200
Found xfs filesystem
1011998720 kB data area apparently used
1011967800 kB left usable by current configuration
Device size would be truncated, which
would corrupt data and result in
'access beyond end of device' errors.
You need to either
* use external meta data (recommended)
* shrink that filesystem first
* zero out the device (destroy the filesystem)
Operation refused.
A problem hit while building DRBD from source; I could not resolve it and switched to installing DRBD with yum:
In file included from /root/drbd-8.4.3/drbd/drbd_proc.c:34:0:
/root/drbd-8.4.3/drbd/drbd_int.h:2515:0: warning: "idr_for_each_entry" redefined [enabled by default]
 #define idr_for_each_entry(idp, entry, id)
 ^
In file included from include/linux/kernfs.h:14:0,
 from include/linux/sysfs.h:15,
 from include/linux/kobject.h:21,
 from include/linux/module.h:16,
 from /root/drbd-8.4.3/drbd/drbd_proc.c:26:
include/linux/idr.h:132:0: note: this is the location of the previous definition
 #define idr_for_each_entry(idp, entry, id)
 ^
/root/drbd-8.4.3/drbd/drbd_proc.c: In function ‘drbd_proc_open’:
/root/drbd-8.4.3/drbd/drbd_proc.c:320:3: error: implicit declaration of function ‘PDE’ [-Werror=implicit-function-declaration]
 return single_open(file, drbd_seq_show, PDE(inode)->data);
 ^
/root/drbd-8.4.3/drbd/drbd_proc.c:320:53: error: invalid type argument of ‘->’ (have ‘int’)
 return single_open(file, drbd_seq_show, PDE(inode)->data);
 ^
cc1: some warnings being treated as errors
make[3]: *** [/root/drbd-8.4.3/drbd/drbd_proc.o] Error 1
make[2]: *** [_module_/root/drbd-8.4.3/drbd] Error 2
make[2]: Leaving directory `/usr/src/kernels/3.10.0-862.2.3.el7.x86_64'
make[1]: *** [kbuild] Error 2
make[1]: Leaving directory `/root/drbd-8.4.3/drbd'
make: *** [module] Error 2
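X. Routine DRBD maintenance
1. What to monitor
(1) Monitor the heartbeat service
(2) Monitor the drbd service and the synchronization state
(3) Monitor whether /store is mounted (only one side may have it mounted at any time)
(4) If the application code on one server is upgraded, upgrade the other server too (this applies to files that live outside the /store directory)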
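For item (3), a monitoring job can check the mount table directly rather than parsing df output. This is a minimal sketch that reads the standard /proc/mounts format; the function name is_mounted is hypothetical, and how the result is reported to your monitoring system is left open.

```shell
# Hypothetical helper: succeed if MOUNTPOINT appears as a mount target in a
# /proc/mounts-style file (second whitespace-separated column).
# usage: is_mounted MOUNTPOINT [MOUNTS_FILE]
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit (found ? 0 : 1) }' \
        "${2:-/proc/mounts}"
}
```

Run on both servers, exactly one of them should report success for is_mounted /store; success on both at once indicates split brain.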
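2. Server maintenance advice:
(1) Do not reboot both servers at the same time, or they may fight over the resources (known as split brain); leave about 5 minutes between them.
(2) Do not power on both servers at the same time, or they may fight over the resources (split brain); leave about 5 minutes between them.
(3) The heartbeat link currently runs over the 192.168.1.0 network. It is advisable to add a network card to each server later and connect the two servers directly with a cable (with IP addresses on a separate subnet). This avoids split brain caused by a failure on the 192.168.1.0 network, and it also gives higher transfer speed.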
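3. How to tell whether synchronization has a problem:
The most basic method is to run df -h on both servers and look at where the storage is mounted:
Normal: one server has it mounted and the other does not, drbd is running on both sides, and cat /proc/drbd shows a healthy state.
Abnormal case 1: both servers have it mounted. This is abnormal and means split brain has occurred; contact technical support in this case.
Abnormal case 2: one server has it mounted and the other does not, but the drbd service is stopped and cat /proc/drbd shows an unhealthy state.
In abnormal situations the DRBD state is usually one of:
(1). Both nodes have the connection state StandAlone
(2). One node has the connection state WFConnection and the other is StandAlone
To check the DRBD state on the primary and standby servers:
cat /proc/drbd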
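The connection states listed above can be turned into a simple verdict for a monitoring script. The mapping below is a sketch based only on the states mentioned in this article (Connected, SyncSource, SyncTarget, WFConnection, StandAlone); DRBD has further connection states that fall into the "unknown" branch here, and the function name drbd_cs_verdict is hypothetical.

```shell
# Hypothetical helper: map a DRBD connection state (the cs: value) to a
# one-line verdict.
# usage: drbd_cs_verdict CONNECTION_STATE
drbd_cs_verdict() {
    case "$1" in
        Connected)             echo "ok: connected to the peer" ;;
        SyncSource|SyncTarget) echo "ok: resynchronization in progress" ;;
        WFConnection)          echo "warning: waiting for the peer to connect" ;;
        StandAlone)            echo "error: not connected, possible split brain" ;;
        *)                     echo "unknown: $1" ;;
    esac
}
```

For example: drbd_cs_verdict "$(sed -n 's/.* cs:\([A-Za-z]*\).*/\1/p' /proc/drbd | head -n 1)"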
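4. Causes of DRBD synchronization problems:
(1). Automatic failover in the HA environment causes split brain;
(2). Human error or misconfiguration causes split brain;
(3). My experience is limited, I am embarrassed to say; these two are the only causes of split brain I have run into.
(4). The drbd service was stopped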
5. Problems you may run into and how to fix them:
A typical problem state looks like this:
Standby (hlt1):
[root@hlt1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt1.holitech.net, 2016-10-31 10:43:50
m:res cs ro ds p mounted fstype
0:r0 WFConnection Secondary/Unknown UpToDate/DUnknown C
[root@hlt1 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt1.holitech.net, 2016-10-31 10:43:50
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:383860
Primary (hlt2):
[root@hlt2 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt2.holitech.net, 2016-10-31 10:49:30
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
    ns:0 nr:0 dw:987208 dr:3426933 al:1388 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1380568204
[root@hlt2 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt2.holitech.net, 2016-10-31 10:49:30
m:res cs ro ds p mounted fstype
0:r0 StandAlone Primary/Unknown UpToDate/DUnknown r----- ext4
(1) On the standby server (r0 is the resource name; substitute your own). Note that --discard-my-data throws away the standby's local changes and resynchronizes it from the primary:
[root@hlt1 ~]# drbdadm secondary r0
[root@hlt1 ~]# drbdadm --discard-my-data connect r0    (if it returns an error, run it once more)
(2) On the primary server:
[root@hlt2 ~]# drbdadm connect r0
[root@hlt2 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by root@master.luodi.com, 2013-11-03 00:03:40
 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:6852 nr:0 dw:264460 dr:8393508 al:39 bm:512 lo:0 pe:2 ua:0 ap:0 ep:1 wo:d oos:257728
    [>....................] sync'ed: 4.7% (257728/264412)K
    finish: 0:03:47 speed: 1,112 (1,112) K/sec
(3) Check on both servers; DRBD is back to normal:
Standby server:
[root@hlt1 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt1.holitech.net, 2016-10-31 10:43:50
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:1455736720 dw:1455736720 dr:0 al:0 bm:140049 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
Primary server:
[root@hlt2 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@hlt2.holitech.net, 2016-10-31 10:49:30
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1455737960 nr:0 dw:85995012 dr:1403665281 al:113720 bm:139737 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
Day-to-day DRBD administration:
http://blog.163.com/qiushuhui1989@126/blog/static/27011089201561411536667/
http://blog.csdn.net/leshami/article/details/49777677
http://www.cnblogs.com/rainy-shurun/p/5335843.html
Deploying a heartbeat+DRBD+mysql high-availability solution on centos7.5
Original article: http://blog.51cto.com/net881004/2117869