标签:lob 一个 rtu test backup failure rom 告警 shu
Pg10搭建了流复制主备切换配置
主库10.10.10.13 pghost4
备库10.10.10.14 pghost5
测试主库出现问题,停止后,备库自动切换成为新主库。老主库现在应该是备库了,启动它验证状态。发现备库启动不起来。通过以下方法,备库启动了。
[root@pghost4 ~]# su - pg10
pg10@pghost4->pg_ctl start
waiting for server to start....2019-12-20 21:44:10.004 CST [23416] LOG: listening on IPv4 address "0.0.0.0", port 1921
2019-12-20 21:44:10.004 CST [23416] LOG: listening on IPv6 address "::", port 1921
2019-12-20 21:44:10.006 CST [23416] LOG: listening on Unix socket "/tmp/.s.PGSQL.1921"
2019-12-20 21:44:10.014 CST [23417] LOG: database system was shut down in recovery at 2019-12-20 21:33:31 CST
2019-12-20 21:44:10.014 CST [23417] LOG: entering standby mode
2019-12-20 21:44:10.014 CST [23417] FATAL: requested timeline 2 is not a child of this server‘s history
2019-12-20 21:44:10.014 CST [23417] DETAIL: Latest checkpoint is at 0/11000028 on timeline 1, but in the history of the requested timeline, the server forked off from that timeline at 0/C000098.
2019-12-20 21:44:10.015 CST [23416] LOG: startup process (PID 23417) exited with exit code 1
2019-12-20 21:44:10.015 CST [23416] LOG: aborting startup due to startup process failure
2019-12-20 21:44:10.016 CST [23416] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
Examine the log output.
从上边的意思可以看到,新备库pg_wal目录下的00000002.history 这个跟新主库不一致。
pg10@pghost4->ll
total 273M
-rw------- 1 pg10 pg10 16M Dec 18 22:40 000000010000000000000001
-rw------- 1 pg10 pg10 16M Dec 18 23:01 000000010000000000000002
-rw------- 1 pg10 pg10 16M Dec 18 23:02 000000010000000000000003
-rw------- 1 pg10 pg10 16M Dec 18 23:06 000000010000000000000004
-rw------- 1 pg10 pg10 16M Dec 18 23:39 000000010000000000000005
-rw------- 1 pg10 pg10 16M Dec 18 23:51 000000010000000000000006
-rw------- 1 pg10 pg10 16M Dec 18 23:55 000000010000000000000007
-rw------- 1 pg10 pg10 16M Dec 18 23:58 000000010000000000000008
-rw------- 1 pg10 pg10 16M Dec 19 00:15 000000010000000000000009
-rw------- 1 pg10 pg10 16M Dec 19 00:15 00000001000000000000000A
-rw------- 1 pg10 pg10 16M Dec 19 00:27 00000001000000000000000B
-rw------- 1 pg10 pg10 16M Dec 19 00:29 00000001000000000000000C
-rw------- 1 pg10 pg10 16M Dec 19 00:34 00000001000000000000000D
-rw------- 1 pg10 pg10 16M Dec 19 00:36 00000001000000000000000E
-rw------- 1 pg10 pg10 16M Dec 19 00:36 00000001000000000000000F
-rw------- 1 pg10 pg10 302 Dec 19 00:36 00000001000000000000000F.00000028.backup
-rw------- 1 pg10 pg10 16M Dec 19 00:38 000000010000000000000010
-rw------- 1 pg10 pg10 16M Dec 20 21:33 000000010000000000000011
-rw------- 1 pg10 pg10 41 Dec 19 00:30 00000002.history
drwx------ 2 pg10 pg10 4.0K Dec 20 21:31 archive_status
pg10@pghost4->more 00000002.history
1 0/C000098 no recovery target specified
pg10@pghost4->
然后看新主库该路径下的目录
pg10@pghost4->ll
total 132K
-rw------- 1 pg10 pg10 206 Dec 19 00:36 backup_label.old
drwx------ 5 pg10 pg10 4.0K Dec 19 00:36 base
drwx------ 2 pg10 pg10 4.0K Dec 20 21:27 global
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_commit_ts
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_dynshmem
-rw------- 1 pg10 pg10 4.6K Dec 19 00:36 pg_hba.conf
-rw------- 1 pg10 pg10 1.6K Dec 19 00:36 pg_ident.conf
drwx------ 4 pg10 pg10 4.0K Dec 19 00:38 pg_logical
drwx------ 4 pg10 pg10 4.0K Dec 19 00:36 pg_multixact
drwx------ 2 pg10 pg10 4.0K Dec 19 00:37 pg_notify
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_replslot
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_serial
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_snapshots
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_stat
drwx------ 2 pg10 pg10 4.0K Dec 20 21:48 pg_stat_tmp
drwx------ 2 pg10 pg10 4.0K Dec 19 00:38 pg_subtrans
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_tblspc
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_twophase
-rw------- 1 pg10 pg10 3 Dec 19 00:36 PG_VERSION
drwx------ 3 pg10 pg10 4.0K Dec 19 00:38 pg_wal
drwx------ 2 pg10 pg10 4.0K Dec 19 00:36 pg_xact
-rw------- 1 pg10 pg10 88 Dec 19 00:36 postgresql.auto.conf
-rw------- 1 pg10 pg10 23K Dec 19 00:36 postgresql.conf
-rw------- 1 pg10 pg10 31 Dec 19 00:37 postmaster.opts
-rw------- 1 pg10 pg10 74 Dec 19 00:38 postmaster.pid
-rw-r--r-- 1 pg10 pg10 5.8K Dec 19 00:37 recovery.done
pg10@pghost4->cd pg_wal/br/>pg10@pghost4->ll
total 65M
-rw------- 1 pg10 pg10 16M Dec 19 00:36 00000001000000000000000F
-rw------- 1 pg10 pg10 16M Dec 19 00:38 000000010000000000000010
-rw------- 1 pg10 pg10 16M Dec 19 00:38 000000010000000000000011.partial
-rw------- 1 pg10 pg10 16M Dec 19 00:38 000000020000000000000011
-rw------- 1 pg10 pg10 42 Dec 19 00:38 00000002.history
drwx------ 2 pg10 pg10 4.0K Dec 19 00:38 archive_statusbr/>pg10@pghost4->pwd
/data/pg10/data/pg_wal
pg10@pghost4->more 00000002.history
1 0/11000098 no recovery target specified
将新备库的00000002.history 备份,然后删除。
将新主库的传递到新备库。
然后启动。
pg10@pghost4->scp 00000002.history 10.10.10.13:/data/pg10/data/pg_wal/
pg10@10.10.10.13‘s password:
00000002.history 100% 42 0.0KB/s 00:00
pg10@pghost4->
pg10@pghost4->pg_ctl start
waiting for server to start....2019-12-20 21:51:25.142 CST [23486] LOG: listening on IPv4 address "0.0.0.0", port 1921
2019-12-20 21:51:25.142 CST [23486] LOG: listening on IPv6 address "::", port 1921
2019-12-20 21:51:25.144 CST [23486] LOG: listening on Unix socket "/tmp/.s.PGSQL.1921"
2019-12-20 21:51:25.155 CST [23487] LOG: database system was shut down in recovery at 2019-12-20 21:33:31 CST
2019-12-20 21:51:25.155 CST [23487] LOG: entering standby mode
2019-12-20 21:51:25.171 CST [23487] LOG: consistent recovery state reached at 0/11000098
2019-12-20 21:51:25.171 CST [23487] LOG: invalid record length at 0/11000098: wanted 24, got 0
2019-12-20 21:51:25.171 CST [23486] LOG: database system is ready to accept read only connections
2019-12-20 21:51:25.179 CST [23491] LOG: started streaming WAL from primary at 0/11000000 on timeline 2
2019-12-20 21:51:25.225 CST [23487] LOG: redo starts at 0/11000098
done
server started
pg10@pghost4->
发现已经启动。
pg10@pghost4->pg_controldata|grep cluster
Database cluster state: in archive recovery
pg10@pghost4->
可以看到新备库处于备库模式。
标签:lob 一个 rtu test backup failure rom 告警 shu
原文地址:https://blog.51cto.com/lishiyan/2460518