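First, download and install Redis. The Sentinel that ships with Redis 2.8 is what antirez calls Sentinel 2, a rewrite of Sentinel 1. Because Sentinel 1 is deprecated and has too many bugs, antirez strongly recommends upgrading both Redis and Sentinel to 2.8. I installed 2.8.17, the latest release at the time of writing.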
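Second, configure and start the Redis instances. Start three Redis instances on local ports 6379, 6380, and 6381, with 6379 as the Master and the other two as Slaves. I won't go over Redis master-slave configuration here, but one difference between the two Slaves is worth pointing out: slave-priority is set to 50 on the 6380 instance and to 100 on the 6381 instance, so when the Master goes down, Sentinel prefers the Slave with the smaller slave-priority value as the new Master.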
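As a rough illustration of that preference, the ordering can be sketched in a few lines of Java. This is a minimal sketch, not Sentinel's actual code: SlaveInfo and SlavePicker are hypothetical names, the replication offsets and run IDs are made-up values, and the health checks Sentinel runs before ranking candidates are left out. It assumes the documented ordering: lower slave-priority first, then the larger replication offset, then the lexicographically smaller run ID, with a priority of 0 meaning "never promote".

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical model of a slave as Sentinel sees it; not part of Redis or Jedis. */
final class SlaveInfo {
    final String address;
    final int priority;     // slave-priority from redis.conf
    final long replOffset;  // replication offset (illustrative value)
    final String runId;     // run ID (illustrative value)

    SlaveInfo(String address, int priority, long replOffset, String runId) {
        this.address = address;
        this.priority = priority;
        this.replOffset = replOffset;
        this.runId = runId;
    }
}

final class SlavePicker {
    // Lower priority wins, then the larger replication offset, then the smaller run ID.
    static final Comparator<SlaveInfo> ORDER =
            Comparator.comparingInt((SlaveInfo s) -> s.priority)
                    .thenComparing(Comparator.comparingLong((SlaveInfo s) -> s.replOffset).reversed())
                    .thenComparing((SlaveInfo s) -> s.runId);

    /** Returns the preferred candidate, skipping slaves whose priority is 0. */
    static Optional<SlaveInfo> pick(List<SlaveInfo> slaves) {
        return slaves.stream()
                .filter(s -> s.priority != 0)
                .min(ORDER);
    }

    public static void main(String[] args) {
        List<SlaveInfo> slaves = Arrays.asList(
                new SlaveInfo("127.0.0.1:6380", 50, 1000L, "run-id-b"),
                new SlaveInfo("127.0.0.1:6381", 100, 1000L, "run-id-a"));
        System.out.println(pick(slaves).get().address);
    }
}
```

With the priorities used in this post (50 and 100), the sketch picks 127.0.0.1:6380.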
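Finally, configure and start the Sentinel instances. Start three Sentinel instances on local ports 26379, 26380, and 26381 to monitor the three Redis instances started above. Following the official documentation, the configuration file of the Sentinel instance on 26379 sets only a few main parameters; the configuration files of the other two instances differ only in port number and data directory.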
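The original listing is not reproduced here, so what follows is only a reconstruction: a minimal sentinel.conf that would be consistent with the logs in this post. The port and the monitor line (master name mymaster, address 127.0.0.1 6379, quorum 2) match the log output; the dir path is hypothetical, and the three timing values are the Redis defaults, assumed rather than known.

```
# sentinel-26379.conf (reconstruction; values other than port and monitor are assumptions)
port 26379
dir /tmp/sentinel-26379
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
```

For the other two instances, only port and dir would change.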
Once started, the 26379 instance logs the following:
[8229] 18 Nov 11:18:46.677 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
[8229] 18 Nov 11:18:46.677 # Redis can't set maximum open files to 10032 because of OS error: Operation not permitted.
[8229] 18 Nov 11:18:46.677 # Current maximum open files is 1024. maxclients has been reduced to 992 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
[8229] 18 Nov 11:18:46.679 # Sentinel runid is 2262ed911e9414208af4b1c48ad2b449fd4e0b89
[8229] 18 Nov 11:18:46.679 # +monitor master mymaster 127.0.0.1 6379 quorum 2
[8229] 18 Nov 11:18:46.679 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
[8229] 18 Nov 11:18:46.679 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
[8229] 18 Nov 11:19:27.260 * +sentinel sentinel 127.0.0.1:26380 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
[8229] 18 Nov 11:19:36.069 * +sentinel sentinel 127.0.0.1:26381 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
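So far the Sentinel cluster is working normally. Next, let's see how it handles the Master Redis instance going down. We trigger this by killing the Redis process running on port 6379 while watching the logs of each Sentinel instance. Here is what each instance logged while handling the failure.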
26379 instance:
[8229] 19 Nov 14:41:32.033 # +sdown master mymaster 127.0.0.1 6379
[8229] 19 Nov 14:41:32.116 # +odown master mymaster 127.0.0.1 6379 #quorum 2/2
[8229] 19 Nov 14:41:32.116 # +new-epoch 1
[8229] 19 Nov 14:41:32.116 # +try-failover master mymaster 127.0.0.1 6379
[8229] 19 Nov 14:41:32.286 # +vote-for-leader 2262ed911e9414208af4b1c48ad2b449fd4e0b89 1
[8229] 19 Nov 14:41:32.286 # 127.0.0.1:26381 voted for 22b65a4796e6ece6b76284558a071cc83df71098 1
[8229] 19 Nov 14:41:32.387 # 127.0.0.1:26380 voted for 22b65a4796e6ece6b76284558a071cc83df71098 1
[8229] 19 Nov 14:41:33.326 # +config-update-from sentinel 127.0.0.1:26381 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
[8229] 19 Nov 14:41:33.326 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
[8229] 19 Nov 14:41:33.326 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
[8229] 19 Nov 14:41:33.430 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
[8229] 19 Nov 14:42:03.507 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
26380 instance:
[8243] 19 Nov 14:41:32.023 # +sdown master mymaster 127.0.0.1 6379
[8243] 19 Nov 14:41:32.336 # +new-epoch 1
[8243] 19 Nov 14:41:32.386 # +vote-for-leader 22b65a4796e6ece6b76284558a071cc83df71098 1
[8243] 19 Nov 14:41:33.151 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
[8243] 19 Nov 14:41:33.151 # Next failover delay: I will not start a failover before Wed Nov 19 14:47:32 2014
[8243] 19 Nov 14:41:33.327 # +config-update-from sentinel 127.0.0.1:26381 127.0.0.1 26381 @ mymaster 127.0.0.1 6379
[8243] 19 Nov 14:41:33.328 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
[8243] 19 Nov 14:41:33.328 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
[8243] 19 Nov 14:41:33.558 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
[8243] 19 Nov 14:42:03.616 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
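The "Next failover delay" line in the 26380 log deserves a short note. The comments in the stock sentinel.conf say that a Sentinel waits two times failover-timeout before it will retry a failover against the same master. The arithmetic below is a sketch under two assumptions: failover-timeout is at its default of 180000 ms (the original configuration is not shown), and the clock starts at roughly the second in which 26380 cast its vote (14:41:32). FailoverDelay is a hypothetical name.

```java
import java.time.Duration;
import java.time.LocalDateTime;

/** Hypothetical helper; reproduces the arithmetic behind the "Next failover delay" log line. */
final class FailoverDelay {
    /** Earliest time a new failover may start: last start time plus twice the failover timeout. */
    static LocalDateTime earliestNextFailover(LocalDateTime lastFailoverStart, Duration failoverTimeout) {
        return lastFailoverStart.plus(failoverTimeout.multipliedBy(2));
    }

    public static void main(String[] args) {
        LocalDateTime voted = LocalDateTime.of(2014, 11, 19, 14, 41, 32);
        Duration assumedTimeout = Duration.ofMillis(180_000); // assumed default, not taken from the post
        System.out.println(earliestNextFailover(voted, assumedTimeout));
    }
}
```

Under those assumptions the result is 14:47:32 on 19 Nov 2014, which matches the time printed in the log.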
26381 instance:
[8247] 19 Nov 14:41:32.042 # +sdown master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.094 # +odown master mymaster 127.0.0.1 6379 #quorum 3/2
[8247] 19 Nov 14:41:32.094 # +new-epoch 1
[8247] 19 Nov 14:41:32.094 # +try-failover master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.194 # +vote-for-leader 22b65a4796e6ece6b76284558a071cc83df71098 1
[8247] 19 Nov 14:41:32.286 # 127.0.0.1:26379 voted for 2262ed911e9414208af4b1c48ad2b449fd4e0b89 1
[8247] 19 Nov 14:41:32.387 # 127.0.0.1:26380 voted for 22b65a4796e6ece6b76284558a071cc83df71098 1
[8247] 19 Nov 14:41:32.396 # +elected-leader master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.396 # +failover-state-select-slave master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.459 # +selected-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.459 * +failover-state-send-slaveof-noone slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:32.522 * +failover-state-wait-promotion slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:33.307 # +promoted-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:33.307 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:33.326 * +slave-reconf-sent slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:33.851 # -odown master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:34.356 * +slave-reconf-inprog slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:34.356 * +slave-reconf-done slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:34.426 # +failover-end master mymaster 127.0.0.1 6379
[8247] 19 Nov 14:41:34.426 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380
[8247] 19 Nov 14:41:34.427 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380
[8247] 19 Nov 14:41:34.479 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
[8247] 19 Nov 14:42:04.531 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6380
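When scanning logs like these, the line that matters most is +switch-master, whose payload is the master name followed by the old and new addresses. A small parser for that one event might look like the sketch below. SwitchMasterParser is a hypothetical helper, not part of Redis or Jedis, and it assumes the line layout seen in the logs above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading Sentinel logs; not part of Redis or Jedis. */
final class SwitchMasterParser {
    // Event layout: +switch-master <master-name> <old-ip> <old-port> <new-ip> <new-port>
    private static final Pattern SWITCH = Pattern.compile(
            "\\+switch-master (\\S+) (\\S+) (\\d+) (\\S+) (\\d+)$");

    /** Returns {masterName, oldAddress, newAddress}, or empty if the line is a different event. */
    static Optional<String[]> parse(String logLine) {
        Matcher m = SWITCH.matcher(logLine.trim());
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {
                m.group(1),
                m.group(2) + ":" + m.group(3),
                m.group(4) + ":" + m.group(5)});
    }

    public static void main(String[] args) {
        String line = "[8229] 19 Nov 14:41:33.326 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6380";
        parse(line).ifPresent(r -> System.out.println(r[0] + ": " + r[1] + " -> " + r[2]));
    }
}
```

Run against the 26379 log line, this prints mymaster: 127.0.0.1:6379 -> 127.0.0.1:6380.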
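From these logs we can see roughly how the Sentinel cluster handles a failed Master Redis instance:
1) Each Sentinel instance detects through its own monitoring that the Master Redis instance on port 6379 is not responding and marks it sdown (subjectively down).
2) The Sentinels confirm with one another that enough of them (at least the configured quorum, 2 here) consider the Master down, and mark it odown (objectively down).
3) A failover of the Master Redis instance is about to start, so the Sentinels elect one of themselves to carry it out.
4) The elected Sentinel picks one of the Slave Redis instances to become the new Master Redis instance.
5) Once the Master has been switched, the new configuration is propagated among the Sentinel instances.
6) The Slave Redis instances that were not chosen are pointed at the new Master Redis instance and start syncing data from it.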
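The election in step 3 can be checked by hand against the "voted for" lines in the logs. The sketch below tallies those votes. It assumes the rule that a candidate needs both a majority of all known Sentinels and at least the configured quorum; LeaderTally is a hypothetical name, and the map simply restates the three votes from the logs (voter address to the run ID it voted for).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical vote counter; a sketch of the leader election rule, not Sentinel's code. */
final class LeaderTally {
    /** Returns the run ID that reached max(majority, quorum) votes, if any. */
    static Optional<String> electedLeader(Map<String, String> votes, int totalSentinels, int quorum) {
        Map<String, Integer> counts = new HashMap<>();
        for (String candidate : votes.values()) {
            counts.merge(candidate, 1, Integer::sum);
        }
        int needed = Math.max(totalSentinels / 2 + 1, quorum);
        // At most one candidate can hold a majority, so the first match is the only match.
        return counts.entrySet().stream()
                .filter(e -> e.getValue() >= needed)
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        Map<String, String> votes = new HashMap<>();
        votes.put("127.0.0.1:26379", "2262ed911e9414208af4b1c48ad2b449fd4e0b89"); // voted for itself
        votes.put("127.0.0.1:26380", "22b65a4796e6ece6b76284558a071cc83df71098");
        votes.put("127.0.0.1:26381", "22b65a4796e6ece6b76284558a071cc83df71098"); // voted for itself
        System.out.println(electedLeader(votes, 3, 2).orElse("no leader"));
    }
}
```

With three Sentinels and a quorum of 2, two votes are enough, so the run ID 22b65a4796e6ece6b76284558a071cc83df71098 (the 26381 instance) wins, which agrees with the +elected-leader line in its log.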
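In our environment, the Sentinel instance on port 26381 won the right to perform this failover. It chose the Slave Redis instance on port 6380 as the new Master Redis instance (because 6380's slave-priority is lower than 6381's), and once the switch completed, the 6381 instance that was not chosen began replicating from 6380 instead. Now let's look at a Sentinel configuration file to confirm that the configuration was actually updated. Below are the main contents of the 26379 instance's configuration file after the failover. Compared with the earlier configuration, the monitored Master Redis instance has indeed changed, and the current configuration epoch is now 1.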
sentinel monitor mymaster 127.0.0.1 6380 2
sentinel known-slave mymaster 127.0.0.1 6381
sentinel known-slave mymaster 127.0.0.1 6379
sentinel known-sentinel mymaster 127.0.0.1 26381 22b65a4796e6ece6b76284558a071cc83df71098
sentinel known-sentinel mymaster 127.0.0.1 26380 59616326f3c539ff3301098e1bf708350e6dd45d
sentinel current-epoch 1
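Running the Jedis test program from above once more gives the following result: the master obtained from the Sentinel cluster is indeed the new Master Redis instance! (The "信息" prefix in the output is the INFO log level as printed under a Chinese locale.)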
2014-11-20 16:39:00 redis.clients.jedis.JedisSentinelPool initSentinels
信息: Trying to find master from available Sentinels...
2014-11-20 16:39:00 redis.clients.jedis.JedisSentinelPool initSentinels
信息: Redis master running at 127.0.0.1:6380, starting Sentinel listeners...
2014-11-20 16:39:00 redis.clients.jedis.JedisSentinelPool initPool
信息: Created JedisPool to master at 127.0.0.1:6380
Current master: 127.0.0.1:6380
username: liangzhichao