CoroSync

Date: 2015-06-04 06:25:00

1.1 CoroSync

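Any discussion of CoroSync has to start with OpenAIS. AIS stands for Application Interface Specification, a set of interface specifications that define the middleware layer. OpenAIS originates from the SA Forum (Service Availability Forum).

OpenAIS provides a cluster model that covers the cluster framework, cluster membership management, the messaging layer, and cluster monitoring, but it has no cluster resource management. Its components (interface specifications) include AMF, CLM, CKPT, EVT, and others; the exact set differs slightly between branches. The main branches are picacho, whitetank, and wilson. CoroSync was split out as an independent project when whitetank was upgraded to wilson.

CoroSync is a cluster engine and was originally just one subcomponent of OpenAIS. After CoroSync became independent, OpenAIS was split into two projects: corosync and wilson (the AIS interface specifications).

CoroSync currently maintains two branches, 1 and 2. Only version 2 has a complete, built-in voting (quorum) system. To get voting in version 1 you have to use cman, running one as a plugin of the other.

CoroSync needs the Pacemaker resource manager to form a complete high-availability cluster. Installing Pacemaker depends on CoroSync and Resource Agents, and Resource Agents in turn depends on Cluster Glue.

CoroSync 2.0 no longer supports running Pacemaker as a plugin; Pacemaker has to run as an independent service. The 1.x series still supports the plugin, but it writes warnings to the log.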

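CentOS 5: cman + rgmanager

CentOS 6: cman + rgmanager, or corosync + pacemaker

Command-line management tools:

crmsh: SUSE, CentOS 6.4 and earlier

pcs: Red Hat, CentOS 6.5 and later

To use crmsh on releases after CentOS 6.4, download it from the official openSUSE repositories. crmsh depends on pssh, so both packages have to be downloaded.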

1.1.1 Installation

1. Install the packages. As with heartbeat, four prerequisites must be in place before installing.

[root@web1 ~]# yum install corosync pacemaker

2. Edit the configuration file

[root@web1 ~]# cd /etc/corosync/              
[root@web1 corosync]# cp corosync.conf.example corosync.conf
[root@web1 corosync]# vim corosync.conf
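compatibility: whitetank # stay compatible with older versions
totem { # defines how the underlying messaging layer communicates
    secauth: on # messages are signed (authenticated)
    threads: 0 # number of threads used to encrypt and send messages; 0 means no threaded sending
    interface {
        ringnumber: 0 # ring number of this interface; starts at 0 and rarely needs changing
        bindnetaddr: 172.16.0.0 # network address (not a host address) to bind to; the nodes in this walkthrough are on 172.16.0.0/16
        mcastaddr: 239.45.45.1 # multicast address
        mcastport: 5405 # port used for the multicast address
        ttl: 1 # TTL of the multicast packets
    }
}

logging {
    to_stderr: no # do not log to standard error, i.e. the screen
    to_logfile: yes # log to a file
    logfile: /var/log/cluster/corosync.log
    to_syslog: no # do not log to syslog
    timestamp: on # timestamp each entry; costs a little performance
    logger_subsys { # logging settings for the AMF subsystem
        subsys: AMF
        debug: off
    }
}

service { # run Pacemaker as a Corosync plugin
    ver:    0
    name:   pacemaker
    use_mgmtd: yes # optional
}

aisexec { # optional
    user:   root
    group:  root
}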
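bindnetaddr expects the network address of the interface, not the host address. The following is a minimal sketch of deriving it from a node's IP address. The helper name bind_net_addr is hypothetical; 172.16.45.1 is taken from the log output in this article, and the /16 prefix is an assumption based on the cidr_netmask=16 used for the virtual IP later on.

```python
import ipaddress


def bind_net_addr(host_ip: str, prefix_len: int) -> str:
    """Return the network address of the subnet that contains host_ip."""
    iface = ipaddress.ip_interface(f"{host_ip}/{prefix_len}")
    return str(iface.network.network_address)


# Node address from the article's logs; the /16 prefix is an assumption.
print(bind_net_addr("172.16.45.1", 16))
```

The printed value is what would go into bindnetaddr for a node on that subnet.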

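3. Generate the key file used to sign messages. This command draws random numbers from the entropy pool and blocks if the pool runs low; generating I/O activity refills the pool quickly.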

[root@web1 ~]# corosync-keygen
Writing corosync key to /etc/corosync/authkey.
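Before running corosync-keygen it can help to look at how much entropy the kernel currently reports. This is a rough sketch only: the path is Linux-specific, the helper names are hypothetical, and the 1024-bit default threshold is an assumption rather than a value taken from this article.

```python
from pathlib import Path

# Linux-specific file that reports the kernel's entropy estimate in bits.
ENTROPY_FILE = "/proc/sys/kernel/random/entropy_avail"


def entropy_available(path: str = ENTROPY_FILE) -> int:
    """Read the entropy estimate (in bits) from the given file."""
    return int(Path(path).read_text().strip())


def enough_entropy(available_bits: int, needed_bits: int = 1024) -> bool:
    """needed_bits is an assumed threshold, not a documented requirement."""
    return available_bits >= needed_bits


if __name__ == "__main__":
    if Path(ENTROPY_FILE).exists():
        bits = entropy_available()
        print(f"entropy available: {bits} bits, enough: {enough_entropy(bits)}")
```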

4. Copy the configuration file and the key file to the other node

[root@web1 ~]# scp -p /etc/corosync/{authkey,corosync.conf} web2:/etc/corosync/

5. Start the service

[root@web1 ~]# /etc/init.d/corosync start; ssh web2 '/etc/init.d/corosync start'
[root@web1 ~]# ss -tunl
Netid State      Recv-Q Send-Q Local Address:Port
udp   UNCONN     0      0      172.16.45.1:5404
udp   UNCONN     0      0      172.16.45.1:5405
udp   UNCONN     0      0      239.45.45.1:5405 # the multicast address is being listened on

(1) Check that the corosync engine started properly

[root@web1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
May 31 16:46:12 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
May 31 16:46:12 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.

(2) Check that the initial membership notifications were sent

[root@web1 ~]# grep  TOTEM  /var/log/cluster/corosync.log
May 31 16:46:12 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
May 31 16:46:12 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
May 31 16:46:12 corosync [TOTEM ] The network interface [172.16.45.1] is now up.
May 31 16:46:12 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
May 31 16:46:14 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

(3) Check whether any errors occurred during startup. The error messages below mean that Pacemaker will soon no longer run as a Corosync plugin and that cman is the recommended cluster infrastructure service instead; they can be safely ignored here.

[root@web1 ~]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
May 31 16:46:12 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
May 31 16:46:12 corosync [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
May 31 16:46:13 corosync [pcmk  ] ERROR: pcmk_wait_dispatch: Child process mgmtd exited (pid=2673, rc=100) # mgmtd does not take effect even though it is configured: a component is missing, a compatibility problem of the plugin mode

(4) Check that Pacemaker started properly

[root@web1 ~]# grep pcmk_startup /var/log/cluster/corosync.log 
May 31 16:46:12 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
May 31 16:46:12 corosync [pcmk  ] Logging: Initialized pcmk_startup
May 31 16:46:12 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
May 31 16:46:12 corosync [pcmk  ] info: pcmk_startup: Service: 9
May 31 16:46:12 corosync [pcmk  ] info: pcmk_startup: Local hostname: web1
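The four grep checks above can also be run in one pass over the log. The sketch below is one possible way to do that; the function name and the regular expressions are assumptions written against the log lines quoted in this article, not an official tool.

```python
import re

# Patterns modelled on the log lines quoted in this article (assumptions).
CHECKS = {
    "engine_started": re.compile(r"Corosync Cluster Engine .*started and ready"),
    "config_read": re.compile(r"Successfully read main configuration file"),
    "membership_formed": re.compile(r"\[TOTEM\s*\].*new membership was formed"),
    "pacemaker_started": re.compile(r"pcmk_startup: CRM: Initialized"),
}


def check_startup(lines):
    """Return (check results, error lines) for an iterable of log lines."""
    results = {name: False for name in CHECKS}
    errors = []
    for line in lines:
        for name, pattern in CHECKS.items():
            if pattern.search(line):
                results[name] = True
        # Same filter as: grep ERROR: ... | grep -v unpack_resources
        if "ERROR:" in line and "unpack_resources" not in line:
            errors.append(line.rstrip())
    return results, errors


sample = [
    "May 31 16:46:12 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.",
    "May 31 16:46:12 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.",
    "May 31 16:46:12 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.",
    "May 31 16:46:12 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized",
    "May 31 16:46:12 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync.",
]
results, errors = check_startup(sample)
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
print(f"{len(errors)} error line(s) found")
```

On a real node the lines would come from /var/log/cluster/corosync.log, for example with open(...) as the iterable.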

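6. Install the crmsh configuration interface. It only needs to be installed on one node: configuration changes are sent to the DC, which then synchronizes them to the other nodes.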

[root@web2 ~]# yum --nogpgcheck localinstall crmsh-2.1-1.6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm

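1.1.2 The crm command

crm is a modal command with several modes. Configuration changes exist only in memory until they are saved and activated with commit.

1.1.2.1 Common crm subcommands

status: shows the current cluster status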

[root@web1 ~]# crm status
Last updated: Sun May 31 17:56:51 2015
Last change: Sun May 31 16:46:34 2015
Stack: classic openais (with plugin) # because CoroSync 1.x is in use
Current DC: web1 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
0 Resources configured
 
 
Online: [ web1 web2 ]

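node: manages cluster nodes (see 1.1.2.3)

configure: configures the cluster; it has many subcommands of its own

ra: queries resource agents (see 1.1.2.5)

resource: manages resources (see 1.1.2.4)

1.1.2.2 Common configure subcommands

primitive: defines a primitive (basic) resource

group: defines a resource group

clone: defines a clone resource

ms: defines a master/slave resource

location: defines a location constraint

colocation: defines a colocation constraint

order: defines an order constraint

verify: checks the configuration for errors

crm(live)configure# verify # with no STONITH device, STONITH has to be disabled first, otherwise the cluster will not start resources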
   error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

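show [xml]: the most frequently used subcommand; displays the current cluster configuration

crm(live)configure# show
node web1
node web2
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2

property: sets the global cluster properties, which configure the cluster's basic behavior. The following global properties can be set: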

crm(live)configure# property 
batch-limit=                   enable-startup-probes=         node-health-strategy=          startup-fencing= 
cluster-delay=                 is-managed-default=            node-health-yellow=            stonith-action= 
cluster-recheck-interval=      load-threshold=                pe-error-series-max=           stonith-enabled= 
crmd-transition-delay=         maintenance-mode=              pe-input-series-max=           stonith-timeout= 
dc-deadtime=                   migration-limit=               pe-warn-series-max=            stop-all-resources= 
default-action-timeout=        no-quorum-policy=              placement-strategy=            stop-orphan-actions= 
default-resource-stickiness=   node-action-limit=             remove-after-stop=             stop-orphan-resources= 
election-timeout=              node-health-green=             shutdown-escalation=           symmetric-cluster= 
enable-acl=                    node-health-red=               start-failure-is-fatal=

# stonith-enabled=: whether STONITH is enabled; defaults to true

crm(live)configure# property stonith-enabled=
stonith-enabled (boolean, [true]): # takes a boolean, default true
    Failed nodes are STONITH'd

# General syntax for defining a primitive resource:

primitive <rsc_id> class:provider:ra params param1=value1 param2=value2 op op1 param1=value1 op op2 param1=value1
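The primitive syntax can be assembled from structured data, which makes the roles of params and op easier to see. This is only an illustrative sketch: build_primitive is a hypothetical helper, not part of crmsh, and the values are the ones used in the web service example later in this article.

```python
def build_primitive(rsc_id, agent, params=None, ops=None):
    """Build a crm 'primitive' line from a params dict and a list of operations."""
    parts = ["primitive", rsc_id, agent]
    if params:
        parts.append("params")
        parts += [f"{key}={value}" for key, value in params.items()]
    # Each operation is a (name, {option: value}) pair, e.g. a monitor op.
    for op_name, op_params in (ops or []):
        parts += ["op", op_name]
        parts += [f"{key}={value}" for key, value in op_params.items()]
    return " ".join(parts)


line = build_primitive(
    "webip",
    "ocf:heartbeat:IPaddr",
    params={"ip": "172.16.45.11", "nic": "eth0", "cidr_netmask": 16},
    ops=[("monitor", {"interval": "10s", "timeout": "20s"})],
)
print(line)
```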

# no-quorum-policy=: what the cluster does when it no longer has quorum

crm(live)configure# property no-quorum-policy= # an enumerated value, default stop
no-quorum-policy (enum, [stop]): What to do when the cluster does not have quorum
    What to do when the cluster does not have quorum  Allowed values: stop, freeze, ignore, suicide # four allowed values

# default-resource-stickiness=: how strongly a resource sticks to the node it is currently running on

crm(live)configure# property default-resource-stickiness= # default 0
default-resource-stickiness (integer, [0]):

edit: edits the configuration directly; it opens in the vim editor

crm(live)configure# edit
node web1 \
    attributes standby=off
node web2 \
    attributes standby=off
primitive webip IPaddr \
    params ip=172.16.45.11 nic=eth0 cidr_netmask=16
primitive webserver lsb:httpd
group webservice webip webserver
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false
#vim:set syntax=pcmk

delete: deletes a resource; a running resource cannot be deleted.

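1.1.2.3 Common node subcommands

clearstate: clears a node's state information. The node whose state is cleared is treated as offline. For example, if node 1 shows as offline, restart the CoroSync service on node 1, then clear the state of node 2 and restart the CoroSync service there.

delete: deletes a node

ls: lists the commands and sublevels available at the current level

online: brings the current node online

crm(live)node# online

show: shows all nodes

crm(live)node# show
web1: normal
web2: normal

standby: puts the current node into standby

crm(live)node# standby

status: shows node information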

crm(live)node# status
<nodes>
  <node id="web1" uname="web1"/>
  <node id="web2" uname="web2">
    <instance_attributes id="nodes-web2">
      <nvpair id="nodes-web2-standby" name="standby" value="off"/>
    </instance_attributes>
  </node>
</nodes>
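The XML printed by node status can be processed with a standard XML parser. The sketch below reads the standby attribute of each node from the output shown above; the function name standby_state is hypothetical.

```python
import xml.etree.ElementTree as ET

# Output of 'crm node status' as shown in this article.
NODES_XML = """<nodes>
  <node id="web1" uname="web1"/>
  <node id="web2" uname="web2">
    <instance_attributes id="nodes-web2">
      <nvpair id="nodes-web2-standby" name="standby" value="off"/>
    </instance_attributes>
  </node>
</nodes>"""


def standby_state(nodes_xml):
    """Map each node name to its standby value, or None if the attribute is unset."""
    root = ET.fromstring(nodes_xml)
    state = {}
    for node in root.findall("node"):
        nvpair = node.find(".//nvpair[@name='standby']")
        state[node.get("uname")] = nvpair.get("value") if nvpair is not None else None
    return state


print(standby_state(NODES_XML))
```

A node with no standby attribute at all (web1 here) has never been put into standby, which the sketch reports as None.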

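1.1.2.4 resource subcommands

cleanup: cleans up a resource's status; use it to clear warning messages after they appear

failcount: manages failure counters

migrate: manually migrates a resource to another node

unmigrate: removes the constraint created by migrate, so the resource is free to move back

restart: restarts a resource

start: starts a resource

stop: stops a resource

manage: puts a resource into managed mode

unmanage: puts a resource into unmanaged mode, in which the cluster no longer controls it

1.1.2.5 ra subcommands

Used to look up the resource agent classes and the required and optional parameters of each resource agent.

classes: lists the supported resource agent classes, four in total

crm(live)ra# classes
lsb
ocf / heartbeat pacemaker # the ocf class has two providers: heartbeat and pacemaker
service # effectively the same as lsb here
stonith

list: lists the resource agents available in a class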

crm(live)ra# list ocf pacemaker
ClusterMon    Dummy         HealthCPU     HealthSMART   Stateful      SysInfo       SystemHealth  controld      ping          pingd
remote
 
crm(live)ra# list stonith # each installed STONITH device brings in new resource agents
fence_legacy  fence_pcmk

info: shows the usage help for a resource agent

crm(live)ra# info ocf:heartbeat:IPaddr

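1.1.2.6 Example 4: web service

webip: 172.16.45.11

1. If there is no STONITH device, disable STONITH first; otherwise verify reports errors and the cluster will not start resources.

crm(live)configure# property stonith-enabled=false
crm(live)configure# verify # no errors this time

2. Commit the change

crm(live)configure# commit

3. Define resources

(1) IP resource

crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.16.45.11 nic=eth0 cidr_netmask=16
crm(live)configure# verify
crm(live)configure# commit
# The resource is now defined. Although the agent is IPaddr, the address can only be seen with the ip command.
[root@web1 ~]# crm status # the resource has been started
 webip  (ocf::heartbeat:IPaddr):        Started web1
[root@web1 ~]# crm node standby # the resource moves to the other node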

(2) httpd resource

crm(live)configure# primitive webserver lsb:httpd
crm(live)configure# verify
crm(live)configure# commit

(3) Group resource, to keep the two resources together

crm(live)configure# group webservice webip webserver # the resource listed first starts first
crm(live)configure# verify
crm(live)configure# commit

(4) Delete the group; a running resource cannot be deleted

crm(live)# resource stop webservice # stop the group first
crm(live)# configure delete webservice
# Alternatively, use edit to modify the configuration directly

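4. Define constraints

(1) Colocation constraint

crm(live)configure# colocation webserver_with_webip inf: webserver webip # webip comes last, so it is the reference: webserver is placed where webip runs. inf: means infinity
crm(live)configure# verify
crm(live)configure# commit

(2) Order constraint

crm(live)configure# order webip_before_webserver Mandatory: webip webserver # Mandatory: the order is enforced; the resource listed first starts first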
crm(live)configure# verify
crm(live)configure# commit
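An order constraint says "start the first resource before the second". With several such constraints the start sequence is a topological ordering of the resources, and the stop sequence is its reverse. The following sketch illustrates that idea with the standard library (graphlib needs Python 3.9 or later); it is a simplified model and not how Pacemaker itself is implemented.

```python
from graphlib import TopologicalSorter


def start_order(order_constraints):
    """order_constraints: list of (first, then) pairs, as in 'order ... first then'."""
    graph = {}
    for first, then in order_constraints:
        graph.setdefault(first, set())
        # 'then' depends on 'first', so 'first' is its predecessor.
        graph.setdefault(then, set()).add(first)
    return list(TopologicalSorter(graph).static_order())


constraints = [("webip", "webserver")]
starts = start_order(constraints)
stops = list(reversed(starts))
print("start:", starts)
print("stop: ", stops)
```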

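(3) Location constraint

crm(live)configure# location webip_on_web2 webip 50: web2 # the simple form
crm(live)configure# delete webip_on_web2 # delete it first
crm(live)configure# location webip_on_web2 webip rule #uname eq web2 # rule-based form; rule introduces a rule
crm(live)configure# location webip_on_web2 webip rule 50: #uname eq web2 # the same rule with an explicit score
crm(live)configure# verify
crm(live)configure# commit

(4) Define how strongly resources stick to their current node. When the scores are close, the resource does not move.

crm(live)configure# property default-resource-stickiness=50 # stickiness is 50, but the two resources add up to 100, which is greater than the location preference of 50
crm(live)configure# verify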
crm(live)configure# commit
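The interaction between stickiness and a location preference comes down to comparing scores per node. The sketch below is a deliberately simplified model of that comparison, using the numbers from this example (stickiness 50, two resources in the group, a preference of 50 for web2). The function chosen_node is hypothetical and leaves out many of the rules Pacemaker actually applies.

```python
def chosen_node(current_node, location_scores, stickiness, resources_in_group=1):
    """Pick the node with the highest total score; ties keep the current node."""
    scores = dict(location_scores)
    # Stickiness counts once per resource in the group, on the current node only.
    scores[current_node] = scores.get(current_node, 0) + stickiness * resources_in_group
    return max(scores, key=lambda node: (scores[node], node == current_node))


location = {"web1": 0, "web2": 50}  # location webip_on_web2 webip 50: web2
print(chosen_node("web1", location, stickiness=50, resources_in_group=2))
print(chosen_node("web1", location, stickiness=0, resources_in_group=2))
```

With stickiness 50 the group scores 100 on web1 against 50 on web2 and stays put; with stickiness 0 the location preference wins.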

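This cluster still has a problem, though. If httpd on web1 is killed with killall, the cluster does not restart the resource; it still believes the process is running fine. By default, a high-availability cluster only protects against node failures, not resource failures. A monitor operation therefore has to be defined.

5. Define monitoring

(1) Stop the resources first, then delete all the defined entries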

crm(live)# resource stop webip
crm(live)# resource stop webserver
crm(live)# configure
crm(live)configure# edit
 
node web1 \
    attributes standby=off
node web2 \
    attributes standby=off
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \
    default-resource-stickiness=50
#vim:set syntax=pcmk
 
crm(live)configure# verify
crm(live)configure# commit

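(2) Define the IP resource

crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.16.45.12 nic=eth0 cidr_netmask=16 op monitor interval=10s timeout=20s
# op: operation
# monitor: the monitor operation
# interval: first parameter, how often to monitor
# timeout: second parameter, how long to wait before timing out
# If a value is lower than the one crm expects, an error is reported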
crm(live)configure# verify
crm(live)configure# commit

(3) httpd resource

crm(live)configure# primitive webserver lsb:httpd op monitor interval=10s timeout=20s
crm(live)configure# verify
crm(live)configure# commit

(4) Group resource

crm(live)configure# group webservice webip webserver
crm(live)configure# verify
crm(live)configure# commit

(5) File system

crm(live)configure# primitive webstore ocf:heartbeat:Filesystem params device="172.16.45.3:/web/htdocs" directory="/var/www/html" fstype="nfs" op monitor interval=20s timeout=40s op start timeout=60s op stop timeout=60s
crm(live)configure# verify
crm(live)configure# commit
# webstore can be added to the group with edit

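When configuring a two-node corosync/pacemaker cluster, set two global properties:

stonith-enabled=false: no STONITH device is configured

no-quorum-policy=ignore: resources are still started even when the cluster no longer has quorum.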
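The reason a two-node cluster needs no-quorum-policy=ignore follows from the majority rule: quorum requires more than half of the expected votes. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is strictly more than half."""
    return expected_votes // 2 + 1


def has_quorum(expected_votes: int, votes_present: int) -> bool:
    return votes_present >= quorum_threshold(expected_votes)


for expected, present in [(2, 2), (2, 1), (3, 2)]:
    print(f"{present} of {expected} votes -> quorum: {has_quorum(expected, present)}")
```

With two expected votes the threshold is two, so losing either node loses quorum, and with the default policy (stop) the surviving node would stop its resources. A three-node cluster keeps quorum when one node fails.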

Tags: ha linux pacemaker

Original article: http://10042224.blog.51cto.com/10032224/1658042
