Heartbeat High-Availability Deployment (Part 2)

Date: 2016-05-23 14:54:48

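3: Preparing for the Heartbeat high-availability deployment

3.1 Setting up virtual machines to simulate a real environment

Configure the hosts according to the host plan laid out earlier.

First, prepare two machines.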

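Configure the IP address, hostname, and hosts file on each virtual machine

Assign IP addresses to the servers according to the host plan. On a dual-NIC machine, remember to add the second network device, preferably while the machine is powered off. After booting and logging in, run /etc/init.d/kudzu start to detect the new hardware (CentOS 6 no longer ships kudzu; start_udev can be used instead).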

When that is done, reboot both hosts, then configure the network with setup.

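Note: no gateway or DNS needs to be set here. Restart the network service with service network restart, then repeat the same steps on the other host.

The hostnames and hosts entries are already configured on my machines, so I will not go over them again; verify them with ping.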

Bash

[root@node01 ~]# ping node02.cn
PING node02.cn (172.10.25.27) 56(84) bytes of data.
64 bytes from node02.cn (172.10.25.27): icmp_seq=1 ttl=64 time=0.543 ms
64 bytes from node02.cn (172.10.25.27): icmp_seq=2 ttl=64 time=0.519 ms
64 bytes from node02.cn (172.10.25.27): icmp_seq=3 ttl=64 time=0.515 ms

Bash

[root@node02 ~]# ping node01.cn
PING node01.cn (172.10.25.26) 56(84) bytes of data.
64 bytes from node01.cn (172.10.25.26): icmp_seq=1 ttl=64 time=2.10 ms
64 bytes from node01.cn (172.10.25.26): icmp_seq=2 ttl=64 time=0.646 ms
64 bytes from node01.cn (172.10.25.26): icmp_seq=3 ttl=64 time=0.465 ms

3.2 Configuring the heartbeat link between the servers

Add a host route on each machine so that each node reaches its peer over the dedicated heartbeat link when checking it.
Add the route on node01:

Bash

    /sbin/route add -host 10.25.25.17 dev eth1
    echo '/sbin/route add -host 10.25.25.17 dev eth1' >>/etc/rc.local

10.25.25.17 is the IP address of eth1 on node02.
Add the route on node02:

Bash

    /sbin/route add -host 10.25.25.16 dev eth1
    echo '/sbin/route add -host 10.25.25.16 dev eth1' >>/etc/rc.local

10.25.25.16 is the IP address of eth1 on node01.
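
Because the echo step appends to /etc/rc.local every time it is run, repeating it leaves duplicate route commands in the file. A minimal sketch of an idempotent variant follows; the helper name append_once is hypothetical, and the demo writes to a temporary file rather than the real /etc/rc.local.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line is already there.
# (append_once is a hypothetical helper name, not part of Heartbeat or CentOS.)
append_once() {
    ao_file=$1
    ao_line=$2
    # -x: match the whole line, -F: treat LINE as a fixed string, -q: quiet
    if ! grep -qxF -- "$ao_line" "$ao_file" 2>/dev/null; then
        printf '%s\n' "$ao_line" >>"$ao_file"
    fi
}

# Demo against a temporary file standing in for /etc/rc.local.
rc_demo=$(mktemp)
append_once "$rc_demo" '/sbin/route add -host 10.25.25.17 dev eth1'
append_once "$rc_demo" '/sbin/route add -host 10.25.25.17 dev eth1'
cat "$rc_demo"    # the route command appears only once
```

On a real node the first argument would be /etc/rc.local, with the peer address adjusted per host.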

Check the routing table on node01:

Bash

[root@node01 ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.25.25.17     0.0.0.0         255.255.255.255 UH    0      0        0 eth1
172.10.25.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.10.25.0      0.0.0.0         255.255.255.0   U     0      0        0 eth1
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 eth1
0.0.0.0         172.10.25.2     0.0.0.0         UG    0      0        0 eth0

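4: Deploying Heartbeat high availability

4.1 Installing Heartbeat 2 on CentOS 5.x

Run yum install heartbeat -y; the command has to be run twice.
Note: Heartbeat does not serve external traffic directly and has no special performance requirements, so software of this kind is usually best installed with yum: deployment is simple and fast, and maintenance is easy.

4.2 Installing Heartbeat 3 on CentOS 6.x

Configure the EPEL repository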

Bash

    [root@node01 ~]# rpm -Uvh https://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
    [root@node02 ~]# rpm -Uvh https://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

Install Heartbeat

Bash

    [root@node01 ~]# yum install 'heartbeat*' -y
    [root@node02 ~]# yum install 'heartbeat*' -y

   To keep the installed packages in the yum cache, do the following:
   [root@node01 ~]# sed -i 's#keepcache=0#keepcache=1#g' /etc/yum.conf
   [root@node01 ~]# grep keepcache /etc/yum.conf
   keepcache=1    # cache retention is now enabled

Configure the ha.cf file

Bash

    [root@node01 ~]# ls /usr/share/doc/heartbeat-3.0.4/   # directory holding the configuration templates
    apphbd.cf  AUTHORS    COPYING       ha.cf        README
    [root@node01 heartbeat-3.0.4]# cp ha.cf authkeys haresources /etc/ha.d/   # copy the templates

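ha.cf parameter reference (the text after each directive is an explanation, not part of the file)

[root@node01 ~]# vim /etc/ha.d/ha.cf
debugfile /var/log/ha-debug   Location of Heartbeat's debug log
logfile        /var/log/ha-log   Location of Heartbeat's log
logfacility     local0   syslog facility used for logging (local0 here)
keepalive 2   Heartbeat interval of 2 seconds (a heartbeat is sent on eth1 every 2 seconds)
deadtime 30   If the standby node receives no heartbeat from the primary within 30 seconds, it takes over the primary's resources immediately
warntime 10   Heartbeat delay threshold of 10 seconds: if the standby node receives no heartbeat from the primary within 10 seconds, a warning is written to the log, but no failover takes place
initdead 120   After Heartbeat first starts, wait 120 seconds before bringing up resources. This allows for the time the network may need to come up after a reboot; the value should be at least twice deadtime. The long value is also why binding the VIP is slow when a single node is started on its own, which is normal
#bcast   eth1    Send heartbeats as Ethernet broadcasts on eth1; to carry heartbeats over two physical networks, use bcast eth0 eth1. We use multicast here, so this stays disabled
mcast eth1 225.0.0.10 694 1 0   For multicast: the interface, the multicast group address (within 224.0.0.0 - 239.255.255.255), the port, the TTL (how many router hops a packet may cross), and whether loopback is enabled (whether the node also receives the packets it sends)
#ucast eth0 192.168.1.2   For unicast: the network interface and the peer's IP address
auto_failback on   Decides what happens when the node that owns a resource recovers: either the resource moves back to its owner (on), or it keeps running on the current node until that node fails (off)
#stonith baytech /etc/ha.d/conf/stonith.baytech   For clusters with shared resources: uses STONITH fencing to protect data consistency
#watchdog /dev/watchdog   Sets up the watchdog timer: if the node produces no heartbeat for one minute, it is rebooted
#node   ken3   Declares a cluster node. Note: the node name must match the output of uname -n; it may also be an IP address
node    node01.cn
node    node02.cn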

The parameters that actually need to be set:

Bash

debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
keepalive 2
deadtime 30
warntime 10
initdead 60
mcast eth1 225.0.0.10 694 1 0
auto_failback on
node    node01.cn
node    node02.cn
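
The timing values are related to one another: heartbeats should arrive more often than the warning threshold, the warning should fire before a node is declared dead, and initdead should be at least twice deadtime. A minimal sketch of a sanity check for those rules, assuming whole-second values and a hypothetical helper name check_hb_timing (Heartbeat validates ha.cf itself at startup; this is only a pre-flight aid):

```shell
# check_hb_timing KEEPALIVE WARNTIME DEADTIME INITDEAD (whole seconds)
# Prints "ok" and returns 0 when the values are consistent; otherwise prints
# the first problem found and returns 1. Name and messages are illustrative.
check_hb_timing() {
    hb_keepalive=$1 hb_warntime=$2 hb_deadtime=$3 hb_initdead=$4
    if [ "$hb_keepalive" -ge "$hb_warntime" ]; then
        echo "keepalive should be shorter than warntime"; return 1
    fi
    if [ "$hb_warntime" -ge "$hb_deadtime" ]; then
        echo "warntime should be shorter than deadtime"; return 1
    fi
    if [ "$hb_initdead" -lt $((hb_deadtime * 2)) ]; then
        echo "initdead should be at least twice deadtime"; return 1
    fi
    echo "ok"
}

check_hb_timing 2 10 30 60    # the values used in this article; prints "ok"
```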

Configure the authkeys file

Bash

[root@node01 ~]# echo 123456 | openssl sha1   # generate a hash value to use as the key
(stdin)= c4f9375f9834b4e7f0a528cc65c055702bf5f24a

Bash

[root@node01 ~]# vim /etc/ha.d/authkeys
#       Authentication file.  Must be mode 600   (the file's permissions must be 600)
#       Available methods: crc sha1, md5.  Crc doesn't need/want a key.     (authentication methods: crc provides no security; sha1 is the best choice)
#       Use sha1 authentication with key number 1
auth 1
1 sha1 c4f9375f9834b4e7f0a528cc65c055702bf5f24a
[root@node01 ~]# chmod 600 /etc/ha.d/authkeys

Tip: this must be configured on both servers.
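
A digest of a guessable string such as 123456 makes a weak shared secret. As a hedged alternative, the sketch below derives the key from /dev/urandom and creates the file with mode 600 from the start. The helper name write_authkeys is hypothetical, and the demo writes to a temporary path instead of /etc/ha.d/authkeys.

```shell
# write_authkeys FILE: write a Heartbeat authkeys file with a random sha1 key.
# (write_authkeys is a hypothetical helper, not a Heartbeat tool.)
write_authkeys() {
    ak_file=$1
    # 20 random bytes rendered as 40 hex characters
    ak_key=$(head -c 20 /dev/urandom | od -An -tx1 | tr -d ' \n')
    (
        umask 077    # files created in this subshell get mode 600
        printf 'auth 1\n1 sha1 %s\n' "$ak_key" >"$ak_file"
    )
    chmod 600 "$ak_file"    # also covers the case where FILE already existed
}

ak_demo=$(mktemp)
write_authkeys "$ak_demo"
ls -l "$ak_demo"    # permissions should read -rw-------
```

The generated file still has to be copied to the second node, since both sides need the identical key.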

Configure the haresources file

Edit the Heartbeat resource file /etc/ha.d/haresources

Bash

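    [root@node01 ~]# vim /etc/ha.d/haresources
    node01.cn IPaddr::172.10.25.18/24/eth0
    node02.cn IPaddr::172.10.25.10/24/eth0

Notes:
node01.cn   the hostname; in the initial state, IP 172.10.25.18 is bound on node01.cn
IPaddr      Heartbeat's default script for configuring an IP address; the IP and the fields after it are arguments to that script
172.10.25.18/24/eth0   the VIP through which the cluster serves clients, initially brought up on node01.cn; 24 is the netmask length, and eth0 is the physical NIC the IP is bound to, which gives Heartbeat its interface for external service traffic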
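
To make the field layout concrete, here is a small sketch that splits one IPaddr resource entry into its parts using plain shell parameter expansion. The function name parse_ipaddr_resource is hypothetical; Heartbeat does its own parsing, and this only illustrates the IPaddr::address/netmask/interface layout described in the notes.

```shell
# parse_ipaddr_resource 'IPaddr::ADDRESS/PREFIX/INTERFACE'
# Prints the three fields on one line, separated by spaces.
parse_ipaddr_resource() {
    res_spec=$1
    case $res_spec in
        IPaddr::*/*/*) ;;
        *) echo "not an IPaddr::address/prefix/interface entry" >&2; return 1 ;;
    esac
    res_args=${res_spec#IPaddr::}      # 172.10.25.18/24/eth0
    res_addr=${res_args%%/*}           # text before the first slash
    res_rest=${res_args#*/}            # 24/eth0
    res_prefix=${res_rest%%/*}         # 24
    res_iface=${res_rest#*/}           # eth0
    echo "$res_addr $res_prefix $res_iface"
}

parse_ipaddr_resource 'IPaddr::172.10.25.18/24/eth0'    # prints: 172.10.25.18 24 eth0
```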

Another configuration example: Heartbeat + DRBD + MySQL

Bash

node01.cn IPaddr::172.10.25.18/24/eth0 drbddisk::data Filesystem::/dev/drbd0::/data::ext3 mysqld

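drbddisk::data   starts the DRBD resource named data; this is equivalent to running /etc/ha.d/resource.d/drbddisk data start/stop
Filesystem::/dev/drbd0::/data::ext3   mounts the DRBD device /dev/drbd0 on /data as ext3; this is equivalent to running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /data ext3 start/stop
mysqld   runs the MySQL init script, which must be located under /etc/init.d

Copy the three configured files directly to the other server: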

Bash

[root@node01 ~]# cd /etc/ha.d/
[root@node01 ha.d]# scp ha.cf authkeys haresources 172.10.25.27:/etc/ha.d/

At this point the deployment is essentially complete.

5: Testing Heartbeat high availability

5.1 Starting the heartbeat service

Bash

[root@node01 ~]# /etc/init.d/heartbeat start   # start the service
[root@node01 ~]# ip add | grep '172\.10\.'
    inet 172.10.25.26/24 brd 172.10.25.255 scope global eth0
    inet 172.10.25.10/24 brd 172.10.25.255 scope global secondary eth0
    inet 172.10.25.18/24 brd 172.10.25.255 scope global secondary eth0   # the peer's service is not running yet, so this host has taken over its resources and holds both VIPs
[root@node01 ~]# ps -ef | grep heartbeat
root      14637      1  0 15:26 ?        00:00:01 heartbeat: master control process
root      14643  14637  0 15:26 ?        00:00:00 heartbeat: FIFO reader        
root      14644  14637  0 15:26 ?        00:00:00 heartbeat: write: mcast eth1  
root      14645  14637  0 15:26 ?        00:00:00 heartbeat: read: mcast eth1   
root      15476  13298  0 16:17 pts/3    00:00:00 grep heartbeat
[root@node01 ~]# tail -f /var/log/ha-debug   # view the debug log

Bash

[root@node02 ~]# /etc/init.d/heartbeat start
[root@node02 ~]# ip add | grep '172\.10\.'
    inet 172.10.25.27/24 brd 172.10.25.255 scope global eth0
    inet 172.10.25.18/24 brd 172.10.25.255 scope global secondary eth0
    inet 172.10.25.10/24 brd 172.10.25.255 scope global secondary eth0

Both hosts now hold both VIPs: this is split-brain.
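
Spotting this by eye works with two addresses but scales poorly. The following sketch compares saved ip addr output from both nodes and prints every non-loopback IPv4 address that is configured on both, which is the symptom described here. The function names and the two capture files are assumptions made for illustration.

```shell
# list_inet4: read `ip addr` output on stdin, print one IPv4 address per line
# (loopback addresses are skipped because every node has them).
list_inet4() {
    awk '$1 == "inet" && $2 !~ /^127\./ { split($2, a, "/"); print a[1] }' | sort -u
}

# shared_addrs FILE_A FILE_B: print addresses present in both captures.
shared_addrs() {
    sa_a=$(mktemp) sa_b=$(mktemp)
    list_inet4 <"$1" >"$sa_a"
    list_inet4 <"$2" >"$sa_b"
    comm -12 "$sa_a" "$sa_b"    # lines common to both sorted lists
    rm -f "$sa_a" "$sa_b"
}

# Demo with the addresses from this section (real captures: ip addr >file).
n1=$(mktemp); n2=$(mktemp)
printf '    inet %s scope global eth0\n' 172.10.25.26/24 172.10.25.10/24 172.10.25.18/24 >"$n1"
printf '    inet %s scope global eth0\n' 172.10.25.27/24 172.10.25.18/24 172.10.25.10/24 >"$n2"
shared_addrs "$n1" "$n2"    # prints 172.10.25.10 and 172.10.25.18, one per line
```

An empty result means no address is bound on both nodes at once.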

5.2 Troubleshooting the split-brain

Check whether the nodes can ping each other:

Bash

[root@node01 ~]# ping node02.cn
PING node02.cn (172.10.25.27) 56(84) bytes of data.
64 bytes from node02.cn (172.10.25.27): icmp_seq=1 ttl=64 time=0.302 ms
64 bytes from node02.cn (172.10.25.27): icmp_seq=2 ttl=64 time=0.554 ms

Bash

[root@node02 ~]# ping node02.cn
PING node02.cn (172.10.25.27) 56(84) bytes of data.
64 bytes from node02.cn (172.10.25.27): icmp_seq=1 ttl=64 time=0.141 ms
64 bytes from node02.cn (172.10.25.27): icmp_seq=2 ttl=64 time=0.057 ms

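The nodes can ping each other.

Next, check whether the iptables firewall is interfering. If it is, either stop the iptables service or allow traffic on port 694 through.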

Bash

[root@node01 ~]# service iptables stop
[root@node02 ~]# service iptables stop
[root@node01 ~]# service heartbeat stop
[root@node01 ~]# service heartbeat start
[root@node02 ~]# service heartbeat stop
[root@node02 ~]# service heartbeat start
[root@node01 ~]# ip add | grep '172\.10\.'
    inet 172.10.25.26/24 brd 172.10.25.255 scope global eth0
    inet 172.10.25.18/24 brd 172.10.25.255 scope global secondary eth0
[root@node02 ~]# ip add | grep '172\.10\.'
    inet 172.10.25.27/24 brd 172.10.25.255 scope global eth0
    inet 172.10.25.10/24 brd 172.10.25.255 scope global secondary eth0
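
Stopping iptables altogether is the blunt option. If the firewall should stay on, opening the heartbeat port on the heartbeat interface should be enough in principle. The sketch below only prints the commands so they can be reviewed before being run as root on both nodes; the function name is hypothetical, UDP is assumed because the link uses mcast, and the port defaults to 694 as in ha.cf.

```shell
# print_hb_fw_rules IFACE [PORT]: print iptables commands that accept
# Heartbeat traffic arriving on IFACE. Nothing is executed.
print_hb_fw_rules() {
    fw_iface=$1
    fw_port=${2:-694}
    echo "iptables -I INPUT -i $fw_iface -p udp --dport $fw_port -j ACCEPT"
    echo "service iptables save"
}

print_hb_fw_rules eth1
```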

View the heartbeat link information for the cluster

Bash

[root@node01 ~]# cl_status listhblinks node01.cn   # list the heartbeat links used by the node
        eth1
[root@node01 ~]# cl_status listhblinks node02.cn
        eth1
[root@node01 ~]# cl_status hblinkstatus node02.cn eth1   # show the status of the eth1 heartbeat link to node02.cn
up
[root@node02 ~]# cl_status hblinkstatus node01.cn eth1
up

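5.3 Manual resource switchover and failure recovery

To exercise a switchover we have to simulate a failure. Common triggers are a failed NIC or a network shutdown, a system crash, a stopped heartbeat service, or a manual switch with the hb_standby script.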

Bash

[root@node01 ~]# /usr/share/heartbeat/hb_standby   # release all resources
Going standby [all].
[root@node01 ~]# ip addr | grep '172\.10\.'
    inet 172.10.25.26/24 brd 172.10.25.255 scope global eth0
[root@node01 ~]# /usr/share/heartbeat/hb_takeover   # take over all resources
[root@node01 ~]# ip addr | grep '172\.10\.'
    inet 172.10.25.26/24 brd 172.10.25.255 scope global eth0
    inet 172.10.25.18/24 brd 172.10.25.255 scope global secondary eth0
    inet 172.10.25.10/24 brd 172.10.25.255 scope global secondary eth0

Here we simulate the heartbeat service going down:

Bash

[root@node01 ~]# /etc/init.d/heartbeat stop
[root@node02 ~]# ip add | grep '172\.10\.'   # node02 took over right away
    inet 172.10.25.27/24 brd 172.10.25.255 scope global eth0
    inet 172.10.25.10/24 brd 172.10.25.255 scope global secondary eth0
    inet 172.10.25.18/24 brd 172.10.25.255 scope global secondary eth0
[root@node01 ~]# /etc/init.d/heartbeat start   # bring node01 back
[root@node01 ~]# ip add | grep '172\.10\.'   # its resources have been handed back
    inet 172.10.25.26/24 brd 172.10.25.255 scope global eth0
    inet 172.10.25.18/24 brd 172.10.25.255 scope global secondary eth0
[root@node02 ~]# ip add | grep '172\.10\.'
    inet 172.10.25.27/24 brd 172.10.25.255 scope global eth0    
    inet 172.10.25.10/24 brd 172.10.25.255 scope global secondary eth0
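
To put a number on how long a takeover or failback takes, rather than re-running ip add by hand, a polling loop helps. A minimal sketch, assuming a hypothetical helper wait_until that retries a command once per second until it succeeds or a timeout expires; vip_is_local is likewise an illustrative name.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Runs COMMAND once per second until it succeeds. On success prints the number
# of seconds waited and returns 0; returns 1 once TIMEOUT_SECONDS have passed.
wait_until() {
    wu_timeout=$1
    shift
    wu_waited=0
    while ! "$@"; do
        if [ "$wu_waited" -ge "$wu_timeout" ]; then
            return 1
        fi
        sleep 1
        wu_waited=$((wu_waited + 1))
    done
    echo "$wu_waited"
}

# Hypothetical check for this article's setup: is the VIP bound locally?
vip_is_local() { ip addr | grep -q 'inet 172\.10\.25\.18/'; }

# On node02, right after stopping heartbeat on node01 (run on the real cluster):
#   wait_until 60 vip_is_local
wait_until 5 true    # trivial demo; prints 0
```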

5.4 Analyzing the resource takeover through the Heartbeat logs

First stop the service and clear the logs:

Bash

[root@node01 ~]# /etc/init.d/heartbeat stop
[root@node02 ~]# /etc/init.d/heartbeat stop
[root@node01 ~]# >/var/log/ha-log 
[root@node01 ~]# >/var/log/ha-debug 
[root@node02 ~]# >/var/log/ha-log 
[root@node02 ~]# >/var/log/ha-debug

Then open another terminal to follow the log:

Bash

[root@node01 ~]# tail -f /var/log/ha-debug
[root@node01 ~]# /etc/init.d/heartbeat start
[root@node01 ~]# tail -f /var/log/ha-debug
Feb 27 15:29:07 node01.cn heartbeat: [11460]: info: Pacemaker support: false
Feb 27 15:29:07 node01.cn heartbeat: [11460]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Feb 27 15:29:07 node01.cn heartbeat: [11460]: info: **************************
Feb 27 15:29:07 node01.cn heartbeat: [11460]: info: Configuration validated. Starting heartbeat 3.0.4
Feb 27 15:29:07 node01.cn heartbeat: [11461]: info: heartbeat: version 3.0.4
Feb 27 15:29:07 node01.cn heartbeat: [11461]: info: Heartbeat generation: 1456298790
Feb 27 15:29:07 node01.cn heartbeat: [11461]: info: glib: UDP multicast heartbeat started for group 225.0.0.10 port 694 interface eth1 (ttl=1 loop=0)
Feb 27 15:29:07 node01.cn heartbeat: [11461]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 27 15:29:07 node01.cn heartbeat: [11461]: info: G_main_add_TriggerHandler: Added signal manual handler
Feb 27 15:29:07 node01.cn heartbeat: [11461]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Feb 27 15:29:08 node01.cn heartbeat: [11461]: info: Local status now set to: 'up'   # startup takes a while
Feb 27 15:30:08 node01.cn heartbeat: [11461]: WARN: node node02.cn: is dead   # heartbeat is not running on node02.cn yet, so it is reported dead 60 seconds after startup, in line with initdead 60
After heartbeat is started on node02.cn:
Feb 27 15:33:19 node01.cn heartbeat: [11461]: info: Link node02.cn:eth1 up.
Feb 27 15:33:19 node01.cn heartbeat: [11461]: info: Status update for node node02.cn: status init
Feb 27 15:33:19 node01.cn heartbeat: [11461]: info: Status update for node node02.cn: status up
Feb 27 15:33:19 node01.cn heartbeat: [11461]: debug: StartNextRemoteRscReq(): child count 1
Feb 27 15:33:19 node01.cn heartbeat: [12151]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[12151]:   2016/02/27_15:33:19 info: Running /etc/ha.d//rc.d/status status
Feb 27 15:33:19 node01.cn heartbeat: [12169]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[12169]:   2016/02/27_15:33:19 info: Running /etc/ha.d//rc.d/status status
Feb 27 15:33:20 node01.cn heartbeat: [11461]: debug: get_delnodelist: delnodelist= 
Feb 27 15:33:20 node01.cn heartbeat: [11461]: info: Status update for node node02.cn: status active
Feb 27 15:33:20 node01.cn heartbeat: [12186]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default)[12186]:   2016/02/27_15:33:21 info: Running /etc/ha.d//rc.d/status status
Feb 27 15:33:21 node01.cn heartbeat: [11461]: info: remote resource transition completed.
Feb 27 15:33:21 node01.cn heartbeat: [11461]: info: node01.cn wants to go standby [foreign]
Feb 27 15:33:21 node01.cn heartbeat: [11461]: info: standby: node02.cn can take our foreign resources
Feb 27 15:33:21 node01.cn heartbeat: [12203]: info: give up foreign HA resources (standby).
ResourceManager(default)[12216]:        2016/02/27_15:33:22 info: Releasing resource group: node02.cn IPaddr::172.10.25.10/24/eth0
ResourceManager(default)[12216]:        2016/02/27_15:33:22 info: Running /etc/ha.d/resource.d/IPaddr 172.10.25.10/24/eth0 stop
IPaddr(IPaddr_172.10.25.10)[12279]:     2016/02/27_15:33:22 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.10.25.10)[12253]:  2016/02/27_15:33:22 INFO:  Success
INFO:  Success
Feb 27 15:33:22 node01.cn heartbeat: [12203]: info: foreign HA resource release completed (standby).
Feb 27 15:33:22 node01.cn heartbeat: [11461]: info: Local standby process completed [foreign].
Feb 27 15:33:22 node01.cn heartbeat: [11461]: WARN: 1 lost packet(s) for [node02.cn] [10:12]
Feb 27 15:33:22 node01.cn heartbeat: [11461]: info: remote resource transition completed.
Feb 27 15:33:22 node01.cn heartbeat: [11461]: info: No pkts missing from node02.cn!
Feb 27 15:33:22 node01.cn heartbeat: [11461]: info: Other node completed standby takeover of foreign resources
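
When the log gets long, it can help to pull out just the state changes of the peer. The sketch below does that with awk for the "Status update for node" lines seen in this log; the function name peer_status_timeline is hypothetical, and the pattern is based only on the log format shown in this article.

```shell
# peer_status_timeline NODE <LOGFILE
# Prints "HH:MM:SS status" for every "Status update for node NODE" line.
peer_status_timeline() {
    awk -v node="$1" '
        index($0, "Status update for node " node ": status ") {
            print $3, $NF    # field 3 is the time, the last field is the status
        }'
}

# Demo with three lines taken from the log in this section.
peer_status_timeline node02.cn <<'EOF'
Feb 27 15:33:19 node01.cn heartbeat: [11461]: info: Status update for node node02.cn: status init
Feb 27 15:33:19 node01.cn heartbeat: [11461]: info: Status update for node node02.cn: status up
Feb 27 15:33:20 node01.cn heartbeat: [11461]: info: Status update for node node02.cn: status active
EOF
# prints:
# 15:33:19 init
# 15:33:19 up
# 15:33:20 active
```

Against a live node the input would be the real log, for example peer_status_timeline node02.cn </var/log/ha-log.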

Original article: http://www.cnblogs.com/ylion/p/5519653.html
