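Tags: High-availability clustering with corosync + pacemaker: building an HA cluster with the crm command and an NFS server
Red Hat 5.0 uses OpenAIS as the underlying messaging API, with CMAN as the Messaging Layer on top of it and rgmanager as the CRM that manages resources.
Corosync has a better-designed messaging mechanism than heartbeat.
Red Hat 6.0 uses Corosync directly as the cluster's Messaging Layer.
Different vendors' APIs call different libraries and use different function types and return conventions, so a standard is needed to keep those APIs as compatible with each other as possible.
For example, a mouse from another vendor still works with an ASUS motherboard.
The Application Interface Specification (AIS) is a collection of open specifications that define application programming interfaces (APIs). As middleware, these interfaces give application services an open, highly portable programming interface; using the AIS APIs reduces application complexity and development time.
OpenAIS components: CLM, CKPT, EVT, LCK, MSG, ...
OpenAIS releases: Picacho, Whitetank, Wilson; Wilson is the latest.
Corosync is the open cluster engine project that was split out of OpenAIS when it reached the Wilson release.
From OpenAIS 0.9 onward the project is divided into Wilson and Corosync.
Corosync itself is only a cluster engine: it handles the cluster's transaction messaging, that is, it serves as the Messaging Layer. It has no resource-management capability, so the CRM role must be filled by pacemaker. Pacemaker is the resource manager that was split out of heartbeat into an independent project, and since the split its development has focused on Corosync rather than on heartbeat v3.
Cluster resources on a Corosync stack can be configured entirely from the command line, though a number of graphical tools exist as well.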
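corosync is the low-level messaging layer of an HA cluster. It talks to the layer above it and carries both the heartbeat and whatever transaction messages the upper layer needs to send. To guard against the problems caused by split brain, there is also the concept of quorum. The version installed here is 1.4, which only counts votes, one vote per node; from 2.x onward there is a voting feature that lets you set how many votes a given node holds. The vote count is handed to the CRM layer, which decides whether the cluster partition should keep running. I will leave the remaining concepts for readers to look up; I do not know this area in much depth myself, and I type slowly.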
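The one-vote-per-node rule can be sketched in a few lines of shell. This is only an illustration of majority arithmetic, not something corosync provides: the helper names `quorum_needed` and `has_quorum` are made up for this example, and it assumes a partition has quorum when it holds strictly more than half of the expected votes.

```shell
#!/usr/bin/env bash
# Sketch: majority quorum with one vote per node (the corosync 1.4 model).
# quorum_needed and has_quorum are illustrative helper names, not corosync commands.

# Smallest number of votes that is strictly more than half of the expected votes.
quorum_needed() {
    local expected_votes=$1
    echo $(( expected_votes / 2 + 1 ))
}

# Succeeds (exit status 0) when the live votes reach the majority threshold.
has_quorum() {
    local expected_votes=$1 live_votes=$2
    (( live_votes >= $(quorum_needed "$expected_votes") ))
}

if has_quorum 2 1; then
    echo "2-node cluster, 1 node alive: quorum"
else
    echo "2-node cluster, 1 node alive: NO quorum"
fi
```

With two expected votes the threshold is two, which is why a two-node cluster loses quorum as soon as one node leaves.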
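pacemaker is the CRM (Cluster Resource Manager), the resource-management layer of an HA cluster. It is a service and can be started on its own, but with corosync 1.4 as used here it can be configured so that corosync starts pacemaker.
pacemaker can be configured from any node by installing crmsh or pcs, or one of several GUI tools. As far as I know, crmsh is no longer shipped officially from RedHat 6.4 onward, where pcs is the official tool; crmsh appears to be developed by openSUSE.
Corosync website: www.corosync.org
OpenAIS website: www.openais.org
Pacemaker website: www.clusterlabs.org
So the possible combinations of Messaging Layer and CRM are:
1 haresources + heartbeat v1/v2
2 crm + heartbeat v2
3 pacemaker + corosync
4 pacemaker + heartbeat v3
5 cman + rgmanager
Today we use Pacemaker + Corosync to define and manage a cluster service.
The packages can be installed from rpm files, compiled from source, or installed directly with yum.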
________________________________________________________________________________________________________
192.168.139.2
[root@www ~]# ntpdate cn.ntp.org.cn //sync the clock over NTP; I used a public NTP server for the China zone
[root@www .ssh]# ssh-keygen -t rsa -P '' //set up mutual SSH trust between the two hosts
[root@www .ssh]# ssh-copy-id -i ./id_rsa.pub root@192.168.139.4
[root@www html]# uname -n //name of this node
[root@www mysql]# yum install corosync pacemaker //install directly with yum
________________________________________________________________________________________________________
192.168.139.4
[root@www ~]# ntpdate cn.ntp.org.cn
[root@www .ssh]# ssh-keygen -t rsa -P ''
[root@www .ssh]# ssh-copy-id -i ./id_rsa.pub root@192.168.139.2
[root@www html]# uname -n
www.rs2.com
[root@www mysql]# yum install corosync pacemaker
Installed:
corosync.x86_64 0:1.4.7-5.el6 pacemaker.x86_64 0:1.1.14-8.el6_8.1
Dependency Installed:
clusterlib.x86_64 0:3.0.12.1-78.el6 corosynclib.x86_64 0:1.4.7-5.el6 libibverbs.x86_64 0:1.1.8-4.el6 libqb.x86_64 0:0.17.1-2.el6 librdmacm.x86_64 0:1.0.21-0.el6 lm_sensors-libs.x86_64 0:3.1.1-17.el6
net-snmp-libs.x86_64 1:5.5-57.el6_8.1 pacemaker-cli.x86_64 0:1.1.14-8.el6_8.1
pacemaker-cluster-libs.x86_64 0:1.1.14-8.el6_8.1
pacemaker-libs.x86_64 0:1.1.14-8.el6_8.1 pciutils.x86_64 0:3.1.10-4.el6 rdma.noarch 0:6.8_4.1-1.el6
[root@www mysql]# rpm -ql corosync
/etc/corosync //directory holding Corosync's configuration files
/etc/corosync/corosync.conf.example //sample Corosync configuration file
/usr/sbin/corosync-keygen //command that generates the authentication key
[root@www mysql]# cd /etc/corosync
[root@www corosync]# ll
total 16
-rw-r--r--. 1 root root 2663 May 11 2016 corosync.conf.example
[root@www corosync]# cp corosync.conf.example corosync.conf
[root@www corosync]# vim corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
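version: 2 //configuration file format version
secauth: off //whether to enable authentication and encryption (off here); turning it on costs noticeable CPU, and the authkey generated below is only used when it is on
threads: 0 //number of threads, chosen from the CPU and core count; has no effect when secauth is off
interface {
ringnumber: 0 //ring number for redundant rings; with one NIC per node it need not be set, default 0
bindnetaddr: 192.168.139.0 //the network address of the NIC's subnet, not the host IP address
mcastaddr: 239.255.1.1 //multicast address used for heartbeat messages
mcastport: 5405 //multicast port
ttl: 1
}
}
logging {
fileline: off //whether to print the source file and line of each message
to_stderr: no //whether to send messages to standard error; best left off
to_logfile: yes //whether to write to a log file
logfile: /var/log/cluster/corosync.log //path of the dedicated log file; create the directory yourself
to_syslog: no //whether to log to syslog; enabling one of to_logfile and to_syslog is enough
debug: off //whether to enable debug output
timestamp: on //whether to print timestamps; helps locate errors, but each record needs a system call to get the time, which costs CPU
logger_subsys {
subsys: AMF //logging settings for the AMF subsystem; not needed when OpenAIS is not in use
debug: off
}
}
amf {
mode: disabled //programming-related; can be left unset
}
service {
ver: 0
name: pacemaker //have corosync start pacemaker
}
aisexec { //this section is optional
user: root
group: root
}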
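Since bindnetaddr takes the network address rather than the node's own IP, a small helper can derive it. The following is a minimal sketch in plain bash arithmetic; `network_address` is a name invented for this example, and it assumes a dotted-quad IPv4 address without leading zeros in any octet.

```shell
#!/usr/bin/env bash
# Sketch: derive the network address (the value bindnetaddr expects) from a
# node IP and a prefix length. network_address is an illustrative helper name.

network_address() {
    local ip=$1 prefix=$2
    local a b c d
    IFS=. read -r a b c d <<< "$ip"
    # Pack the four octets into one 32-bit integer.
    local ip_int=$(( (a << 24) | (b << 16) | (c << 8) | d ))
    # Build the netmask: 'prefix' leading one-bits, the rest zero.
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    local net=$(( ip_int & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

network_address 192.168.139.4 24   # prints 192.168.139.0
```

Both nodes in this walkthrough (192.168.139.2 and 192.168.139.4, netmask /24) therefore share the same bindnetaddr value.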
___________________________________________________________________________________________
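[root@www ~]# corosync-keygen //generate the communication key, saved to /etc/corosync/authkey
Writing corosync key to /etc/corosync/authkey
[root@www cluster]# corosync-keygen
corosync-keygen reads its random data from /dev/random, so on a freshly installed system with little activity there may not be enough entropy, and the prompt below appears. Type randomly on the local keyboard; typing over an ssh session did not seem to help.
Gathering 1024 bits for key from /dev/random.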
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 240).
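Before running corosync-keygen it can help to see how much entropy the kernel has available. The sketch below reads the standard Linux file /proc/sys/kernel/random/entropy_avail; the function name `entropy_available` and its optional file argument are assumptions made for this example (the argument only exists so the function can be pointed at another file).

```shell
#!/usr/bin/env bash
# Sketch: report how much entropy the kernel currently has available.
# entropy_available is an illustrative helper; the optional argument exists
# only so the function can be pointed at another file when testing.

entropy_available() {
    local source_file=${1:-/proc/sys/kernel/random/entropy_avail}
    if [ -r "$source_file" ]; then
        cat "$source_file"
    else
        echo "unknown"
    fi
}

echo "entropy available (bits): $(entropy_available)"
```

A value well below the 1024 bits that corosync-keygen gathers suggests the command will wait for keyboard input.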
[root@www ~]# cd /etc/corosync/
[root@www cluster]# scp /etc/corosync/corosync.conf 192.168.139.2:/etc/corosync/ //copy the file to the other node (copy /etc/corosync/authkey as well if secauth is on)
[root@www ~]# service corosync start //start corosync on this node
[root@www ~]# ssh 192.168.139.2 service corosync start //start corosync on the other node
__________________________________________________________________________________________
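//check for errors during startup; the messages below say that the Pacemaker plugin for Corosync is not supported in this environment. I could not find an explanation online, but the whole experiment still completed, so it does not seem to be a serious error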
[root@www cluster]# grep ERROR: /var/log/cluster/corosync.log
Nov 11 15:05:10 www corosync[3470]: [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Nov 11 15:05:10 www corosync[3470]: [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
__________________________________________________________________________________________
[root@www ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log //check that the corosync engine started correctly
Nov 11 16:34:19 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Nov 11 16:34:19 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Nov 11 16:34:19 [1908] www.rs2.com cib: info: retrieveCib:Reading cluster configuration file /var/lib/pacemaker/cib/cib.xml (digest: /var/lib/pacemaker/cib/cib.xml.sig)
Nov 11 16:34:19 [1908] www.rs2.com cib: info: cib_file_write_with_digest:Reading cluster configuration file /var/lib/pacemaker/cib/cib.DU5D4x (digest: /var/lib/pacemaker/cib/cib.zBJmL2)
__________________________________________________________________________________________
[root@www ~]# grep TOTEM /var/log/cluster/corosync.log //check that the initial membership notifications look right
Nov 11 16:34:07 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Nov 11 16:34:07 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 11 16:34:08 corosync [TOTEM ] The network interface [192.168.139.4] is now up.
Nov 11 16:34:08 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
__________________________________________________________________________________________
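[root@www ~]# grep error /var/log/cluster/corosync.log //check for errors during startup; these come from having no STONITH device configured and can be ignored here; running "property stonith-enabled=false" in crm configure mode later disables STONITH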
Nov 11 16:34:32 [2174] www.rs2.com pengine: error: unpack_resources:Resource start-up disabled since no STONITH resources have been defined
Nov 11 16:34:32 [2174] www.rs2.com pengine: error: unpack_resources:Either configure some or disable STONITH with the stonith-enabled option
Nov 11 16:34:32 [2174] www.rs2.com pengine: error: unpack_resources:NOTE: Clusters with shared data need STONITH to ensure data integrity
___________________________________________________________________________________________
[root@www ~]# grep pcmk_startup /var/log/cluster/corosync.log //check that pacemaker started correctly
Nov 11 16:34:08 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Nov 11 16:34:08 corosync [pcmk ] Logging: Initialized pcmk_startup
Nov 11 16:34:08 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Nov 11 16:34:08 corosync [pcmk ] info: pcmk_startup: Service: 9
Nov 11 16:34:08 corosync [pcmk ] info: pcmk_startup: Local hostname:www.rs2.com
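The separate grep checks can be bundled into one pass over the log. This is a minimal sketch, not part of corosync or pacemaker: `check_corosync_log` is a made-up helper, and it only looks for three of the patterns used in this section.

```shell
#!/usr/bin/env bash
# Sketch: run the startup log checks from this section in one pass.
# check_corosync_log is an illustrative helper name; the patterns are the
# ones grepped for in the walkthrough.

check_corosync_log() {
    local log_file=${1:-/var/log/cluster/corosync.log}
    local rc=0 pattern
    for pattern in "Corosync Cluster Engine" "TOTEM" "pcmk_startup"; do
        if grep -q -e "$pattern" "$log_file" 2>/dev/null; then
            echo "found:   $pattern"
        else
            echo "missing: $pattern"
            rc=1
        fi
    done
    return "$rc"
}

# Only check the real log when it exists on this machine.
if [ -r /var/log/cluster/corosync.log ]; then
    check_corosync_log || echo "some startup messages are missing"
fi
```

The function returns a non-zero status when any pattern is absent, so it can also be used in a script that starts corosync and then waits for a clean startup.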
___________________________________________________________________________________________
[root@www ~]# crm_mon //monitors the current state of the cluster
Last updated: Fri Nov 11 16:19:10 2016 Last change: Fri Nov 11 16:10:18 2016 by hacluster via crmd on www.rs2.com
Stack: classic openais (with plugin)
Current DC: www.rs2.com (version 1.1.14-8.el6_8.1-70404b0) - partition WITHOUT quorum
2 nodes and 0 resources configured, 2 expected votes
//two nodes and 0 resources, but for some reason rs1 shows as UNCLEAN (offline)
Node www.rs1.com: UNCLEAN (offline)
Online: [ www.rs2.com ]
//after stopping everything, regenerating the corosync configuration file and starting again, it came up fine
[root@www .ssh]# crm_mon
Last updated: Fri Oct 28 21:29:51 2016 Last change: Fri Nov 11 22:33:32 2016 by hacluster via crmd on www.rs1.com
Stack: classic openais (with plugin)
Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 0 resources configured, 2 expected votes
Online: [ www.rs1.com www.rs2.com ] //both nodes are healthy
__________________________________________________________________________________________
Configuring cluster resources with the crm command
[root@www ~]# crm
-bash: crm: command not found
[root@www ~]# rpm -qa pacemaker //pacemaker is version 1.1.14
pacemaker-1.1.14-8.el6_8.1.x86_64
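Starting with pacemaker 1.1.8, crm became a separate project called crmsh. In other words, installing pacemaker does not provide the crm command; to manage cluster resources with it, crmsh has to be installed separately. The crmsh rpm can be downloaded from: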
http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/x86_64/
crmsh depends on several packages such as pssh, so pssh.rpm also has to be downloaded from the address above. The same location also offers corosync and pacemaker, but I installed those directly with yum.
Alternatively, download the openSUSE HA clustering yum repo file and install from it:
[root@www tool]# wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/network:ha-clustering:Stable.repo
It contains a single yum repository:
[network_ha-clustering_Stable]
name=Stable High Availability/Clustering packages (CentOS_CentOS-6)
type=rpm-md
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
gpgcheck=1
gpgkey=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6//repodata/repomd.xml.key
enabled=1
[root@www tool]# mv network\:ha-clustering\:Stable.repo /etc/yum.repos.d/
[root@www yum.repos.d]# ll //all the yum repos on my host
total 52
-rw-r--r--. 1 root root CentOS-Base.repo
-rw-r--r--. 1 root root CentOS-Debuginfo.repo
-rw-r--r--. 1 root root 2015 CentOS-fasttrack.repo
-rw-r--r--. 1 root root 2015 CentOS-Media.repo
-rw-r--r--. 1 root root 2015 CentOS-Vault.repo
-rw-r--r--. 1 root root 2014 elrepo.repo
-rw-r--r--. 1 root root 2012 epel.repo
-rw-r--r--. 1 root root 2012 epel-testing.repo
-rw-r--r--. 1 root root network:ha-clustering:Stable.repo
-rw-r--r--. 1 root root openSUSE-13.2-NonFree-Update.repo.back
-rw-r--r--. 1 root root openSUSE-Leap-42.1-Update.repo.bak
-rw-r--r--. 1 root root zxl.repo
[root@www tool]# yum install crmsh //install directly with yum
http://www.111cn.net/sys/linux/73074.htm is a very detailed article on using the crm command that I found online
[root@www tool]# crm
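crm(live)# help //get help
cib //CIB management module
resource //resource management module
configure //cluster configuration: resource stickiness, resource types, constraints, and so on
node //node management subcommands
options //user preferences
history //cluster history
site //geo-cluster support
ra //resource agent information
status //show cluster status
help,? //show help
end,cd,up //go back up one level
quit,bye,exit //exit crm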
crm(live)# cd resource
crm(live)resource# help
.........................
.........................
crm(live)resource# cd
crm(live)# configure //enter configure mode
crm(live)configure# show //show the current cluster configuration
node www.rs1.com
node www.rs2.com
property cib-bootstrap-options: \
dc-version=1.1.14-8.el6_8.1-70404b0 \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2
crm(live)configure# verify //check the configuration; it reports errors because no STONITH device is configured
ERROR: error: unpack_resources:Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources:Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources:NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
crm(live)configure# property stonith-enabled=false //disable STONITH
crm(live)configure# show
node www.rs1.com
node www.rs2.com
property cib-bootstrap-options: \
dc-version=1.1.14-8.el6_8.1-70404b0 \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2 \
stonith-enabled=false
crm(live)configure# verify //check again; no more errors
crm(live)configure# commit //commit so the configuration takes effect
crm(live)configure# cd
crm(live)# ra
crm(live)ra# help
Resource Agents (RA) lists and documentation
Commands:
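classes //list RA classes and providers
info //show detailed information about an RA
list //list the RAs of a class (and, optionally, of one provider)
providers //show the providers of a given RA
validate
meta //show the metadata of an RA
cd //go back up one level
help
ls
quit
up //go back up one level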
How do you get detailed information about a command?
crm(live)ra# help list //show detailed usage of the list command
List RA for a class (and provider)
List available resource agents for the given class. If the class
is ocf, supply a provider to get agents which are available
only from that provider.
Usage:
list <class> [<provider>]
Example:
list ocf pacemaker
crm(live)ra# classes //list the RA classes
lsb //the lsb class
ocf / heartbeat pacemaker //ocf has two providers, heartbeat and pacemaker
service
stonith //the stonith class
crm(live)ra# list ocf pacemaker //list all RAs of class ocf provided by pacemaker
ClusterMon Dummy HealthCPU HealthSMART Stateful SysInfo SystemHealth controld ping pingd remote
crm(live)ra# list lsb //list all RAs of class lsb
auditd blk-availability corosync corosync-notifyd crond halt heartbeat htcacheclean
crm(live)ra# help meta //meta shows the metadata of an RA
Usage:
info [<class>:[<provider>:]]<type> //class:provider:resource agent (RA)
info <type> <class> [<provider>] (obsolete)
Example:
info apache
info ocf:pacemaker:Dummy //ocf is the class, pacemaker the provider, Dummy the resource agent
info stonith:ipmilan
info pengine
crm(live)ra# meta ocf:heartbeat:IPaddr //show the metadata of the IPaddr resource agent (class ocf, provider heartbeat)
Parameters (*: required, []: default): //parameters marked * are required, values in [ ] are defaults
ip* (string): IPv4 or IPv6 address //ip is required
The IPv4 (dotted quad notation) or IPv6 address (colon hexadecimal notation)
example IPv4 "192.168.1.1".
example IPv6 "2001:db8:DC28:0:0:FC57:D4C8:1FFF".
nic (string): Network interface
......................
........................
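Operations' defaults (advisory minimum) //recommended minimum defaults for operations on this resource
start timeout=20s //wait at most 20 seconds when starting the resource
stop timeout=20s //wait at most 20 seconds when stopping the resource
status timeout=20s interval=10s
monitor timeout=20s interval=10s //check every 10 seconds; a check that does not return within 20 seconds is treated as failed, which can lead to the resource being moved
How do you find out who provides an RA?
Use the providers command in ra mode, for example:
crm(live)ra# providers IPaddr //show the provider of the IPaddr RA; it is provided by heartbeat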
heartbeat
___________________________________________________________________________________________
Configuring resources
crm(live)ra# cd
crm(live)# configure
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.139.10 nic=eth0 cidr_netmask=24
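primitive defines a primitive resource; webip is the resource name; ocf is the resource class, heartbeat the provider, and IPaddr the RA.
params introduces the parameters: ip=192.168.139.10 (required), nic=eth0 (the interface that will carry the address), cidr_netmask=24 (a 24-bit netmask).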
crm(live)configure# show
node www.rs1.com
node www.rs2.com
primitive webip IPaddr \
params ip=192.168.139.10 nic=eth0 cidr_netmask=24
property cib-bootstrap-options: \
dc-version=1.1.14-8.el6_8.1-70404b0 \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2 \
stonith-enabled=false
crm(live)configure# verify //check for errors
crm(live)configure# commit //commit once there are no errors
crm(live)configure# show xml //the configuration can also be shown as XML, which is more detailed
<?xml version="1.0" ?>
<cib num_updates="2" dc-uuid="www.rs1.com" update-origin="www.rs2.com" crm_feature_set="3.0.10" validate-with="pacemaker-2.4" update-client="cibadmin" epoch="5" admin_epoch="0" update-user="root" cib-last-written="Fri Nov 11 22:42:08 2016" have-quorum="1">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.14-8.el6_8.1-70404b0"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="www.rs1.com" uname="www.rs1.com"/>
<node id="www.rs2.com" uname="www.rs2.com"/>
</nodes>
<resources>
<primitive id="webip" class="ocf" provider="heartbeat" type="IPaddr">
<instance_attributes id="webip-instance_attributes">
<nvpair name="ip" value="192.168.139.10" id="webip-instance_attributes-ip"/>
<nvpair name="nic" value="eth0" id="webip-instance_attributes-nic"/>
<nvpair name="cidr_netmask" value="24" id="webip-instance_attributes-cidr_netmask"/>
</instance_attributes>
</primitive>
</resources>
<constraints/>
</configuration>
</cib>
crm(live)configure# cd
crm(live)#
crm(live)# status //the resource is in fact already running; check where
Online: [ www.rs1.com www.rs2.com ]
Full list of resources:
webip(ocf::heartbeat:IPaddr):Started www.rs1.com //rs1 has been elected DC, and the webip resource is running on www.rs1.com
___________________________________________________________________________________________
192.168.139.2
[root@www corosync]# ip addr show //the VIP 192.168.139.10 shows up as a secondary address on eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
inet 192.168.139.2/24 brd 192.168.139.255 scope global eth0
inet 192.168.139.10/24 brd 192.168.139.255 scope global secondary eth0
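Checking by eye which node holds the VIP gets tedious once resources start moving around. The following is a minimal sketch that tests `ip addr show` output for an address; `holds_address` is a name invented for this example, and the interface name eth0 and the address are simply the ones used in this walkthrough.

```shell
#!/usr/bin/env bash
# Sketch: check whether "ip addr show" output lists a given IPv4 address.
# holds_address is an illustrative helper that reads the output on stdin.

holds_address() {
    local address=$1
    # -F: fixed-string match; the trailing "/" keeps 192.168.139.10 from
    # matching 192.168.139.100.
    grep -q -F "inet ${address}/"
}

# Run the real check only where the ip command and eth0 exist.
if command -v ip >/dev/null 2>&1 && ip addr show eth0 >/dev/null 2>&1; then
    if ip addr show eth0 | holds_address 192.168.139.10; then
        echo "VIP is on this node"
    else
        echo "VIP is not on this node"
    fi
fi
```

Running it on both nodes after a failover shows at a glance where webip ended up.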
[root@www .ssh]# crm
crm(live)# resource
crm(live)resource# stop webip //stop the webip resource
crm(live)resource# list
webip(ocf::heartbeat:IPaddr):(target-role:Stopped) Stopped
crm(live)resource# start webip
crm(live)resource# list
webip(ocf::heartbeat:IPaddr):Started
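crm(live)resource# migrate webip //risky experiment: migrating the resource reported an error, and after forcing it the webip resource would not start; only restarting corosync helped
ERROR: resource.move: No target node: Move requires either a target node or 'force'
status then showed the following error:
* webip_start_0 on www.rs2.com 'not configured' (6): call=12, status=complete, exitreason='none',
last-rc-change='Sat Oct 29 08:55:24 2016', queued=1ms, exec=250ms
It turned out that my rs2 host was a clone with no eth0 NIC, only eth1, while webip was defined on eth0 (^_^). I renamed the eth1 NIC to eth0 and rebooted the operating system, which fixed it. The following article describes how to rename a NIC:
http://www.linuxidc.com/Linux/2015-06/118969.htm
Next, define an httpd resource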
_____________________________________________________________
192.168.139.4
[root@www corosync]# rpm -qa httpd //httpd is not installed on this host
[root@www corosync]# yum install httpd //install directly with yum
[root@www html]# vim index.html
<h1>www.RS2.com</h1>
[root@www html]# service httpd stop
Stopping httpd: [ OK ]
[root@www html]# chkconfig httpd off //a cluster-managed service must never start on boot by itself
___________________________________________________________________________________________
192.168.139.2
[root@www corosync]# rpm -qa httpd //httpd is not installed on this host
[root@www corosync]# yum install httpd //install directly with yum
[root@www html]# vim index.html //edit the httpd home page so the two hosts can be told apart
<h1>www.RS1.com</h1>
[root@www html]# service httpd stop
Stopping httpd: [ OK ]
[root@www html]# chkconfig httpd off //a cluster-managed service must never start on boot by itself
___________________________________________________________________________________________
192.168.139.2
[root@www ~]# crm
crm(live)# cd resource
crm(live)resource# list
webip(ocf::heartbeat:IPaddr):Started
crm(live)resource# cd ..
crm(live)# cd ra
crm(live)ra# providers httpd //httpd has no provider
crm(live)ra# list lsb //the httpd RA belongs to the lsb class
auditd blk-availability corosync corosync-notifyd crond halt htcacheclean httpd
crm(live)ra# meta lsb:httpd //meta shows no parameters, only some operations
start and stop Apache HTTP Server (lsb:httpd)
server implementing the current HTTP standards.
Operations' defaults (advisory minimum):
start timeout=15
stop timeout=15
status timeout=15
restart timeout=15
force-reload timeout=15
monitor timeout=15 interval=15
crm(live)ra# cd
crm(live)# configure
crm(live)configure# primitive httpd lsb:httpd op start timeout=20 //define the httpd primitive resource
crm(live)configure# show
node www.rs1.com
node www.rs2.com
primitive httpd lsb:httpd \
op start timeout=20 interval=0
primitive webip IPaddr \
params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \
meta target-role=Started
property cib-bootstrap-options: \
dc-version=1.1.14-8.el6_8.1-70404b0 \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2 \
stonith-enabled=false
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Sat Oct 29 10:39:04 2016 Last change: Sat Oct 29 08:33:08 2016 by root via cibadmin on www.rs1.com
Stack: classic openais (with plugin)
Current DC: www.rs2.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 2 resources configured, 2 expected votes
Online: [ www.rs1.com www.rs2.com ]
Full list of resources: //webip runs on rs1 while httpd runs on rs2
webip(ocf::heartbeat:IPaddr):Started www.rs1.com
httpd(lsb:httpd):Started www.rs2.com
___________________________________________________________________________________________
192.168.139.4
[root@www ~]# netstat -tnlp |grep httpd
tcp 0 0 :::80 LISTEN 1718/httpd
Browse to 192.168.139.4
___________________________________________________________________________________________
192.168.139.2
Define the two resources as a group so that they run together on the same node
crm(live)configure# help group //when in doubt, use help
Define a group
Usage:
group <name> <rsc> [<rsc>...]
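//group <name> <rsc1> <rsc2>; a group can also carry a description, params and meta attributes; check the official documentation for which params a group accepts
[description=<description>] //description
[meta attr_list] //meta attributes
[params attr_list] //params of the group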
attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>
Example:
group internal_www disk0 fs0 internal_ip apache \
meta target_role=stopped
group vm-and-services vm vm-sshd meta container="vm" //vm-and-services is the group name, vm and vm-sshd are the resources, and meta container="vm" is a meta attribute
crm(live)configure# group webserver webip httpd //webserver is the group name; webip and httpd are its two resources
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node www.rs1.com
node www.rs2.com
primitive httpd lsb:httpd \
op start timeout=20 interval=0
primitive webip IPaddr \
params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \
meta target-role=Started
group webserver webip httpd
property cib-bootstrap-options: \
dc-version=1.1.14-8.el6_8.1-70404b0 \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2 \
stonith-enabled=false
crm(live)configure# cd
crm(live)# status
cibadmin on www.rs1.com
Stack: classic openais (with plugin)
Current DC: www.rs2.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 2 resources configured, 2 expected votes
Online: [ www.rs1.com www.rs2.com ]
Full list of resources:
Resource Group: webserver //once the webserver group is defined, both resources run on the same node
webip(ocf::heartbeat:IPaddr):Started www.rs1.com
httpd(lsb:httpd):Started www.rs1.com
Test in a browser: 192.168.139.10
crm(live)# node
crm(live)node# standby //put rs1 in standby; the resources move to rs2
crm(live)node# cd
crm(live)# status //the resources moved from rs1 to rs2 successfully
Last updated: Sat Oct 29 11:32:08 2016 Last change: Sat Oct 29 11:31:51 2016 by root via crm_attribute on www.rs1.com
Stack: classic openais (with plugin)
Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
{Why does this not say WITHOUT quorum? Apparently a standby node can still vote.}
2 nodes and 2 resources configured, 2 expected votes
Node www.rs1.com: standby
Online: [ www.rs2.com ]
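Full list of resources: //the resources keep running after rs1 goes standby; if rs1's vote had been lost, rs2 alone would hold
Resource Group: webserver //only one of the two votes, which is not a majority, and the resources would have been stopped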
webip(ocf::heartbeat:IPaddr):Started www.rs2.com
httpd(lsb:httpd):Started www.rs2.com
crm(live)# node
crm(live)node# online //bring the node back online
crm(live)node# cd
crm(live)# status
Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 2 resources configured, 2 expected votes
Online: [ www.rs1.com www.rs2.com ]
Full list of resources:
Resource Group: webserver //rs1 is back online; the resources keep running (still on rs2)
webip(ocf::heartbeat:IPaddr):Started www.rs2.com
httpd(lsb:httpd):Started www.rs2.com
This time stop corosync on rs2 outright
192.168.139.4
[root@www ~]# service corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
Waiting for corosync services to unload:. [ OK ]
192.168.139.2
crm(live)# status
Last updated: Sat Oct 29 11:53:25 2016 Last change: Sat Oct 29 11:52:39 2016 by root via crm_attribute on www.rs1.com
Stack: classic openais (with plugin)
Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition WITHOUT quorum
{This time it is WITHOUT quorum: the vote threshold is not met. So a node only loses its vote when the service is stopped; a standby node still votes.}
2 nodes and 2 resources configured, 2 expected votes
Online: [ www.rs1.com ]
OFFLINE: [ www.rs2.com ]
Full list of resources:
Resource Group: webserver //without quorum, the default policy is to stop resources
webip(ocf::heartbeat:IPaddr):Stopped
httpd(lsb:httpd):Stopped
192.168.139.4
[root@www ~]# service corosync start
192.168.139.2
crm(live)# status
Last updated: Sat Oct 29 11:59:36 2016 Last change: Sat Oct 29 11:52:39 2016 by root via crm_attribute on www.rs1.com
Stack: classic openais (with plugin)
Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
2 nodes and 2 resources configured, 2 expected votes
Online: [ www.rs1.com www.rs2.com ]
Full list of resources: //after rs2 starts, the resources start again
Resource Group: webserver
webip(ocf::heartbeat:IPaddr):Started www.rs1.com
httpd(lsb:httpd):Started www.rs1.com
Change the default action taken when quorum is lost to ignore
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# show
node www.rs1.com \
attributes standby=off
node www.rs2.com \
attributes standby=off
primitive httpd lsb:httpd \
op start timeout=20 interval=0
primitive webip IPaddr \
params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \
meta target-role=Started
group webserver webip httpd
property cib-bootstrap-options: \
dc-version=1.1.14-8.el6_8.1-70404b0 \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2 \
stonith-enabled=false \
no-quorum-policy=ignore
crm(live)configure# verify
crm(live)configure# commit
192.168.139.4
[root@www ~]# service corosync stop
192.168.139.2
crm(live)# status
Last updated: Sat Oct 29 12:03:53 2016 Last change: Sat Oct 29 12:03:25 2016 by root via cibadmin on www.rs1.com
Stack: classic openais (with plugin)
Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition WITHOUT quorum
{WITHOUT quorum: not enough votes}
2 nodes and 2 resources configured, 2 expected votes
Online: [ www.rs1.com ]
OFFLINE: [ www.rs2.com ]
Full list of resources: //but the service keeps running because of ignore
Resource Group: webserver
webip(ocf::heartbeat:IPaddr):Started www.rs1.com
httpd(lsb:httpd):Started www.rs1.com
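The two outcomes seen so far (resources stopped without quorum, then kept running after no-quorum-policy=ignore) can be summarized as a tiny decision function. This is only a sketch of the behaviour observed in this walkthrough: `resources_keep_running` is an invented name, and only the stop and ignore policies are modelled.

```shell
#!/usr/bin/env bash
# Sketch: does a partition keep running its resources?
# Only the two no-quorum-policy values used in this walkthrough are modelled
# (stop, the default, and ignore). resources_keep_running is an illustrative name.

resources_keep_running() {
    local have_quorum=$1 policy=$2
    case "$policy" in
        ignore) return 0 ;;                       # quorum is not consulted at all
        stop)   [ "$have_quorum" = "yes" ] ;;     # run only while the partition has quorum
        *)      echo "policy not modelled: $policy" >&2; return 2 ;;
    esac
}

for policy in stop ignore; do
    if resources_keep_running no "$policy"; then
        echo "no quorum, policy=$policy: resources keep running"
    else
        echo "no quorum, policy=$policy: resources are stopped"
    fi
done
```

In a two-node cluster ignore is what lets the surviving node carry on, at the cost of losing the protection quorum gives against split brain.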
192.168.139.4
[root@www ~]# service corosync start
[root@www ~]# crm
crm(live)# node
crm(live)node# standby
crm(live)node# cd
crm(live)# status
Last updated: Sat Oct 29 10:03:51 2016 Last change: Sat Oct 29 10:03:46 2016 by root via crm_attribute on www.rs1.com
Stack: classic openais (with plugin)
Current DC: www.rs1.com (version 1.1.14-8.el6_8.1-70404b0) - partition with quorum
{Still with quorum here, which confirms that a standby node can still vote}
2 nodes and 2 resources configured, 2 expected votes
Node www.rs2.com: standby
Online: [ www.rs1.com ]
Full list of resources:
Resource Group: webserver //with ignore set, resources run whether or not there is quorum
webip(ocf::heartbeat:IPaddr):Started www.rs1.com
httpd(lsb:httpd):Started www.rs1.com
crm(live)# node
crm(live)node# online
Instead of a group, use constraints directly to make the resources run together
crm(live)# resource
crm(live)resource# stop webserver
crm(live)resource# cleanup webserver
crm(live)resource# cd
crm(live)# configure
crm(live)configure# delete webserver
crm(live)configure# show
node www.rs1.com \
attributes standby=off
node www.rs2.com \
attributes standby=off
primitive httpd lsb:httpd \
op start timeout=20 interval=0
primitive webip IPaddr \
params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \
meta target-role=Started
property cib-bootstrap-options: \
dc-version=1.1.14-8.el6_8.1-70404b0 \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2 \
stonith-enabled=false \
no-quorum-policy=ignore \
last-lrm-refresh=1477714758
crm(live)configure# verify
crm(live)configure# commit
crm(live)# status
Online: [ www.rs1.com www.rs2.com ]
Full list of resources: //the two resources are running on different nodes again
webip(ocf::heartbeat:IPaddr):Started www.rs1.com
httpd(lsb:httpd):Started www.rs2.com
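Define a colocation constraint (whether resources may run on the same node; inf means infinity)
crm(live)# configure
crm(live)configure# colocation webip_with_httpd inf: webip httpd //define a colocation constraint tying the two resources together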
crm(live)configure# show
.........
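colocation webip_with_httpd inf: webip httpd //this looks reversed: it says webip goes wherever httpd is; it should say httpd goes wherever webip is; the resource listed last is the one that decides
crm(live)configure# edit //fix it directly with edit
colocation webip_with_httpd inf: webip httpd
change it to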
colocation webip_with_httpd inf: httpd webip
crm(live)configure# show xml
<rsc_colocation id="webip_with_httpd" score="INFINITY" rsc="httpd" with-rsc="webip"/>
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
.........
Online: [ www.rs1.com www.rs2.com ]
Full list of resources: //both resources are on the same node again
webip(ocf::heartbeat:IPaddr):Started www.rs1.com
httpd(lsb:httpd):Started www.rs1.com
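The colocation constraint now binds the two resources together. Resources also need to start in a particular sequence, which is what an order constraint defines.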
crm(live)# configure
crm(live)configure# help order
Usage:
order <id> [{kind|<score>}:] first then [symmetrical=<bool>]
order <id> [{kind|<score>}:] resource_sets [symmetrical=<bool>]
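kind :: Mandatory | Optional | Serialize //mandatory | optional | serialized
first :: <rsc>[:<action>] //an action can follow the resource name, stating which action on the first resource must complete before the action on the other begins; the actions are those found under resource, such as start, stop, promote, ...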
then :: <rsc>[:<action>]
resource_sets :: resource_set [resource_set ...]
crm(live)configure# order webip_before_httpd mandatory: webip httpd //webip_before_httpd is the id; mandatory is the kind (a score may be used instead); webip starts first, httpd second
crm(live)configure# commit
crm(live)configure# show xml
<rsc_colocation id="webip_with_httpd" score="INFINITY" rsc="httpd" with-rsc="webip"/>
<rsc_order id="webip_before_httpd" kind="Mandatory" first="webip" then="httpd"/>
first webip,then httpd
crm(live)configure# cd
crm(live)# status
Online: [ www.rs1.com www.rs2.com ]
Full list of resources: //currently running on rs1
webip(ocf::heartbeat:IPaddr):Started www.rs1.com
httpd(lsb:httpd):Started www.rs1.com
crm(live)# node
crm(live)node# standby //put rs1 in standby
crm(live)node# cd
crm(live)# status
Node www.rs1.com: standby
Online: [ www.rs2.com ]
Full list of resources: //the switchover was too fast to see which started first (^_^), but the resources did move
webip(ocf::heartbeat:IPaddr):Started www.rs2.com
httpd(lsb:httpd):Started www.rs2.com
crm(live)# node
crm(live)node# online //bring rs1 back online
crm(live)node# cd
crm(live)# status
Online: [ www.rs1.com www.rs2.com ]
Full list of resources: //but the resources did not move back
webip(ocf::heartbeat:IPaddr):Started www.rs2.com
httpd(lsb:httpd):Started www.rs2.com
What if you want the resources to move back once the node is online again?
Define a location constraint (which node a resource prefers to run on)
crm(live)# configure
crm(live)configure# help location
Usage:
location <id> <rsc> [<attributes>] {<node_pref>|<rules>}
........
node_pref :: <score>: <node>
rules :: //rules can be defined with expressions
rule [id_spec] [$role=<role>] <score>: <expression>
[rule [id_spec] [$role=<role>] <score>: <expression> ...]
location conn_1 internal_www \ //conn_1 is the id/name, internal_www is the resource name
rule 50: #uname eq node1 \ //the rule: score 50 when uname equals node1
crm(live)configure# location wibip_on_rs1 webip rule 100: #uname eq www.rs1.com
//score 100 for this location when uname equals www.rs1.com
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show xml
<rsc_location id="wibip_on_rs1" rsc="webip">
<rule score="100" id="wibip_on_rs1-rule">
<expression attribute="#uname" operation="eq" value="www.rs1.com" id="wibip_on_rs1-rule-expression"/>
crm(live)configure# cd
crm(live)# status
Online: [ www.rs1.com www.rs2.com ]
Full list of resources: //the location constraint has taken effect, so the resources moved to rs1 automatically
webip(ocf::heartbeat:IPaddr):Started www.rs1.com
httpd(lsb:httpd):Started www.rs1.com
crm(live)# node
crm(live)node# standby //put rs1 in standby
crm(live)node# cd
crm(live)# status
Node www.rs1.com: standby
Online: [ www.rs2.com ]
Full list of resources: //the resources moved to rs2
webip(ocf::heartbeat:IPaddr):Started www.rs2.com
httpd(lsb:httpd):Started www.rs2.com
crm(live)# node
crm(live)node# online
crm(live)node# cd
crm(live)# status
Online: [ www.rs1.com www.rs2.com ]
Full list of resources: //after rs1 came back online the resources moved back from rs2
webip(ocf::heartbeat:IPaddr):Started www.rs1.com
httpd(lsb:httpd):Started www.rs1.com
Define stickiness for resources (how strongly a resource prefers to stay on its current node)
crm(live)# configure
crm(live)configure# rsc_defaults resource-stickiness=200 //set the resource stickiness to 200
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show xml
<meta_attributes id="rsc-options">
<nvpair name="resource-stickiness" value="200" id="rsc-options-resource-stickiness"/>
</meta_attributes>
crm(live)configure# cd
crm(live)# node standby
crm(live)# status
Node www.rs1.com: standby
Online: [ www.rs2.com ]
Full list of resources: \\the resources moved to rs2
webip(ocf::heartbeat:IPaddr):Started www.rs2.com
httpd(lsb:httpd):Started www.rs2.com
crm(live)# node online \\bring rs1 back online
crm(live)# status
Online: [ www.rs1.com www.rs2.com ]
Full list of resources: \\stickiness (200) outweighs the location preference (100), so the resources do not move back to rs1
webip(ocf::heartbeat:IPaddr):Started www.rs2.com
httpd(lsb:httpd):Started www.rs2.com
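The arithmetic behind this result can be sketched in a few lines of Python. This is a deliberately simplified model, assuming the final score of a node is just its location score plus the stickiness bonus on the node where the resource currently runs. The function `place` and its parameters are hypothetical, and the real Pacemaker scheduler takes many more factors into account.

```python
def place(current_node, location, stickiness, online):
    """Pick the online node with the highest total score.

    current_node -- node the resource runs on now (or None)
    location     -- {node: location score}
    stickiness   -- bonus added to the node the resource is already on
    online       -- nodes allowed to run resources (standby nodes excluded)
    """
    def total(node):
        bonus = stickiness if node == current_node else 0
        return location.get(node, 0) + bonus

    # sorted() only makes ties deterministic in this sketch
    return max(sorted(online), key=total)


location = {"www.rs1.com": 100, "www.rs2.com": 0}
both = ["www.rs1.com", "www.rs2.com"]

# No stickiness: the location score of 100 pulls webip back to rs1.
without_stickiness = place("www.rs2.com", location, 0, both)

# Stickiness 200: rs2 scores 0 + 200, rs1 scores 100, so webip stays on rs2.
with_stickiness = place("www.rs2.com", location, 200, both)

# rs2 in standby: only rs1 is eligible, so webip moves regardless of stickiness.
rs2_standby = place("www.rs2.com", location, 200, ["www.rs1.com"])

print(without_stickiness, with_stickiness, rs2_standby)
```

The three calls correspond to the three situations seen in this walkthrough: fail-back with only the location constraint, no fail-back once stickiness is 200, and a forced move when the current node goes to standby.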
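Next, add a Filesystem resource backed by an NFS server at 192.168.139.8 that shares a single home page, so that the page seen in the browser is the same no matter which node is running the resources.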
_____________________________________________________________
192.168.139.8
[root@www ~]# vim /etc/exports
/web/htdocs 192.168.139.0/24(ro) \\no space between the client and its option list
[root@www local]# cd /web/htdocs/
[root@www htdocs]# vim index.html
<h1>www.NFS.com</h1>
[root@www ~]# service iptables stop
[root@www ~]# service nfs start
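One detail in /etc/exports is easy to get wrong: the option list has to follow the client directly. If there is a space before `(ro)`, the options are parsed as a separate entry that applies to any host, and the named network falls back to the default options. Below is a small, hypothetical Python checker (the name `check_exports_line` is made up for this post) that flags such detached option lists. It only looks at this one mistake and is not a full exports parser.

```python
def check_exports_line(line):
    """Return a list of warnings for one /etc/exports line."""
    warnings = []
    line = line.split("#", 1)[0].strip()  # drop comments and blank lines
    if not line:
        return warnings
    fields = line.split()
    # fields[0] is the exported path; the rest are client[(options)] entries
    for field in fields[1:]:
        if field.startswith("("):
            warnings.append(
                "options %s are not attached to a client; "
                "they would apply to any host" % field
            )
    return warnings


detached = check_exports_line("/web/htdocs 192.168.139.0/24 (ro)")
attached = check_exports_line("/web/htdocs 192.168.139.0/24(ro)")
print(detached)
print(attached)
```

The first call reports one warning, the second reports none.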
___________________________________________________________________________________________
192.168.139.4
[root@www ~]# mount 192.168.139.8:/web/htdocs /mnt
[root@www ~]# cd /mnt
[root@www mnt]# ll
total 4
-rw-r--r--. 1 nobody nobody 21 Nov 12 2016 index.html
[root@www mnt]# cd
[root@www ~]# umount /mnt/
[root@www ~]# crm
crm(live)# ra
crm(live)ra# list ocf \\Filesystem belongs to the ocf class
Filesystem HealthCPU HealthSMART IPaddr
crm(live)ra# providers Filesystem \\Filesystem is provided by heartbeat
heartbeat
crm(live)ra# meta ocf:heartbeat:Filesystem
device* (string): block device \\device is required
The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.
directory* (string): mount point \\the mount point is required
The mount point for the filesystem.
fstype* (string): filesystem type \\the filesystem type is required
The type of filesystem to be mounted.
options (string): \\extra mount options, passed with -o
Any extra options to be given as -o options to mount.
For bind mounts, add "bind" here and set fstype to "none".
We will do the right thing for options such as "bind,ro".
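The parameters marked with `*` in the meta output are mandatory. As a rough illustration, the following Python sketch refuses to build a `primitive` line unless all three are given. The helper `filesystem_primitive` is purely hypothetical. It only assembles a string in the same configure-mode syntax used in this walkthrough and does not talk to the cluster.

```python
# Parameters marked with * in "meta ocf:heartbeat:Filesystem"
REQUIRED = ("device", "directory", "fstype")


def filesystem_primitive(name, **params):
    """Build a crm 'primitive' line for a Filesystem resource (string only)."""
    missing = [p for p in REQUIRED if p not in params]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    # Required parameters first, then any optional ones in alphabetical order
    ordered = list(REQUIRED) + sorted(k for k in params if k not in REQUIRED)
    args = " ".join("%s=%s" % (k, params[k]) for k in ordered)
    return "primitive %s ocf:heartbeat:Filesystem params %s" % (name, args)


cmd = filesystem_primitive(
    "nfs",
    device="192.168.139.8:/web/htdocs/",
    directory="/var/www/html/",
    fstype="nfs",
)
print(cmd)
```

Leaving out any of the three required parameters raises a ValueError in this sketch. In crmsh itself, `verify` is what reports a missing required parameter.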
crm(live)ra# cd
crm(live)# configure
crm(live)configure# primitive nfs ocf:heartbeat:Filesystem params device=192.168.139.8:/web/htdocs/ directory=/var/www/html/ fstype=nfs op monitor timeout=60s
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
primitive nfs Filesystem \
params device="192.168.139.8:/web/htdocs/" directory="/var/www/html/" fstype=nfs \
primitive webip IPaddr \
params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \
order webip_before_httpd Mandatory: webip httpd
colocation webip_with_httpd inf: httpd webip
location wibip_on_rs1 webip \
rule 100: #uname eq www.rs1.com \
expected-quorum-votes=2 \
stonith-enabled=false \
no-quorum-policy=ignore \
last-lrm-refresh=1477714758
rsc_defaults rsc-options: \
resource-stickiness=200
crm(live)configure# cd
crm(live)# status
Online: [ www.rs1.com www.rs2.com ]
Full list of resources: \\all three resources are started; webip and httpd run together on rs2, while nfs runs on rs1
webip(ocf::heartbeat:IPaddr):Started www.rs2.com
httpd(lsb:httpd):Started www.rs2.com
nfs(ocf::heartbeat:Filesystem):Started www.rs1.com
___________________________________________________________________________________________
192.168.139.2
[root@www ~]# cd /var/www/html/
[root@www html]# ll
total 4
-rw-r--r--. 1 nobody nobody 21 Nov 12 2016 index.html
[root@www html]# vim index.html
<h1>www.NFS.com</h1> \\the page shared over NFS is already mounted
How do we make all three resources run on the same node?
Define a colocation constraint and an order constraint for the Filesystem resource.
crm(live)configure# colocation nfs_with_webip inf: nfs webip \\nfs follows webip: wherever webip runs, nfs runs too
crm(live)configure# order webip_before_nfs mandatory: webip nfs \\start webip first, then nfs
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
primitive nfs Filesystem \
params device="192.168.139.8:/web/htdocs/" directory="/var/www/html/" fstype=nfs \
op monitor timeout=60s interval=0 \\interval=0 appears because the monitor op was defined without an interval
primitive webip IPaddr \
params ip=192.168.139.10 nic=eth0 cidr_netmask=24 \
colocation nfs_with_webip inf: nfs webip
order webip_before_httpd Mandatory: webip httpd
colocation webip_with_httpd inf: httpd webip
location wibip_on_rs1 webip \
rule 100: #uname eq www.rs1.com
expected-quorum-votes=2 \
stonith-enabled=false \
no-quorum-policy=ignore \
resource-stickiness=200
crm(live)configure# show xml
<rsc_order id="webip_before_httpd" kind="Mandatory" first="webip" then="httpd"/>
<rsc_order id="webip_before_nfs" kind="Mandatory" first="webip" then="nfs"/>
<rsc_colocation id="webip_with_httpd" score="INFINITY" rsc="httpd" with-rsc="webip"/>
<rsc_colocation id="nfs_with_webip" score="INFINITY" rsc="nfs" with-rsc="webip"/>
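The four constraints shown in the XML can be read as a small dependency graph. The Python sketch below, assuming Python 3.9+ for `graphlib`, derives a valid start order from the `rsc_order` pairs and shows how the INFINITY colocations tie httpd and nfs to whatever node webip is on. The function names are made up for illustration, and the placement part is a simplification of what Pacemaker really does.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (first, then) pairs copied from the rsc_order elements
orders = [("webip", "httpd"), ("webip", "nfs")]
# (rsc, with_rsc) pairs copied from the rsc_colocation elements
colocations = [("httpd", "webip"), ("nfs", "webip")]


def start_order(orders):
    """Return one start sequence that satisfies every (first, then) pair."""
    sorter = TopologicalSorter()
    for first, then in orders:
        sorter.add(then, first)  # 'then' must wait for 'first'
    return list(sorter.static_order())


def co_located(primary, primary_node, colocations):
    """Place every resource colocated with 'primary' on the primary's node."""
    placement = {primary: primary_node}
    for rsc, with_rsc in colocations:
        if with_rsc == primary:
            placement[rsc] = primary_node
    return placement


sequence = start_order(orders)
placement = co_located("webip", "www.rs2.com", colocations)
print(sequence)
print(placement)
```

Note that the two order constraints only say that webip starts before httpd and before nfs. Nothing here orders nfs relative to httpd, so the sketch (like the cluster) is free to start those two in either order.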
crm(live)# status
2 nodes and 3 resources configured, 2 expected votes \\three resources, two nodes, two expected votes
Online: [ www.rs1.com www.rs2.com ]
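Full list of resources: \\all resources are now on rs2: stickiness is 200 while webip's location preference for rs1 is only 100, and webip and httpd were already running on rs2 before the Filesystem resource was configured, so all three end up on rs2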
webip(ocf::heartbeat:IPaddr):Started www.rs2.com
httpd(lsb:httpd):Started www.rs2.com
nfs(ocf::heartbeat:Filesystem):Started www.rs2.com
crm(live)# q
bye
[root@www html]# mount \\on rs2 we can see that the NFS share is mounted
192.168.139.8:/web/htdocs/ on /var/www/html type nfs (rw,vers=4,addr=192.168.139.8,clientaddr=192.168.139.4)
[root@www html]# cd /var/www/html/
[root@www html]# ll
total 4
-rw-r--r--. 1 nobody nobody 21 Nov 12 2016 index.html
[root@www html]# vim index.html \\the page shared by the NFS server is visible
<h1>www.NFS.com</h1>
Test in a browser.
[root@www html]# crm
crm(live)# node
crm(live)node# standby \\put rs2 into standby
crm(live)# status
Online: [ www.rs1.com www.rs2.com ]
Full list of resources: \\all resources moved to rs1
webip (ocf::heartbeat:IPaddr): Started www.rs1.com
httpd (lsb:httpd): Started www.rs1.com
nfs (ocf::heartbeat:Filesystem): Started www.rs1.com
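In the browser the page is still www.NFS.com: the web page is the same no matter which node serves it.
This article is from the “11097124” blog; please keep this source link: http://11107124.blog.51cto.com/11097124/1872079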