之前有两位朋友碰到过在对greenplum的系统构架更改后,出现全量恢复gprecoverseg -F也无法正常运行的情况。
报错信息为Unable to connect to database. Retrying 1
gprecoverseg failed. (Reason=‘Unable to connect to database and start transaction‘) exiting...
有幸拷得一份虚拟机上的全部文件,对其进行分析。
发现其实出现这个问题只需要修改pg_changetracking下的CT_METADATA,或者说从其他正常的主事例上拷贝一份到出问题的主事例上即可。
以下为大致分析的过程,感兴趣的可以看一下。
--启动数据库,有一个mirror出错。
[gpadmin@gpmaster ~]$ gpstart -a
20150727:22:28:21:001922 gpstart:gpmaster:gpadmin-[INFO]:-Starting gpstart with args: -a
20150727:22:28:21:001922 gpstart:gpmaster:gpadmin-[INFO]:-Gathering information and validating the environment...
20150727:22:28:28:001922 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Binary Version: ‘postgres (Greenplum Database) 4.3.4.1 build 2‘
20150727:22:28:28:001922 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Catalog Version: ‘201310150‘
20150727:22:28:29:001922 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance in admin mode
20150727:22:28:33:001922 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20150727:22:28:33:001922 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Setting new master era
20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Master Started...
20150727:22:28:34:001922 gpstart:gpmaster:gpadmin-[INFO]:-Shutting down master
20150727:22:28:36:001922 gpstart:gpmaster:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on gpslave-2 directory /data/mirror/gpseg0 <<<<<
20150727:22:28:36:001922 gpstart:gpmaster:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
...............................................
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-Process results...
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:- Successful segment starts = 3
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:- Failed segment starts = 0
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 1 <<<<<<<<
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-Successfully started 3 of 3 segment instances, skipped 1 other segments
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-****************************************************************************
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-There are 1 segment(s) marked down in the database
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[WARNING]:-****************************************************************************
20150727:22:29:23:001922 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance gpmaster directory /data/master/gpseg-1
20150727:22:29:24:001922 gpstart:gpmaster:gpadmin-[INFO]:-Command pg_ctl reports Master gpmaster instance active
20150727:22:29:26:001922 gpstart:gpmaster:gpadmin-[INFO]:-No standby master configured. skipping...
20150727:22:29:26:001922 gpstart:gpmaster:gpadmin-[WARNING]:-Number of segments not attempted to start: 1
20150727:22:29:26:001922 gpstart:gpmaster:gpadmin-[INFO]:-Check status of database with gpstate utility
--执行全量恢复,5次尝试超时,提示Unable to connect to database and start transaction。
[gpadmin@gpmaster ~]$ gprecoverseg -F
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting gprecoverseg with args: -F
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-local Greenplum Version: ‘postgres (Greenplum Database) 4.3.4.1 build 2‘
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-master Greenplum Version: ‘PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:15:10‘
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:38:04:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 1
20150727:22:38:09:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:38:09:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:38:09:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 2
20150727:22:38:14:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:38:14:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:38:15:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 3
20150727:22:38:20:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:38:20:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:38:20:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 4
20150727:22:38:25:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:38:25:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:38:25:002478 gprecoverseg:gpmaster:gpadmin-[INFO]:-Unable to connect to database. Retrying 5
20150727:22:38:30:002478 gprecoverseg:gpmaster:gpadmin-[CRITICAL]:-gprecoverseg failed. (Reason=‘Unable to connect to database and start transaction‘) exiting...
--以下为相关节点的配置。
template1=# select * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+-----------+-----------+------------------+------------
1 | -1 | p | p | s | u | 5432 | gpmaster | gpmaster | |
2 | 0 | p | p | c | u | 40000 | gpslave-1 | gpslave-1 | 41000 |
4 | 0 | m | m | s | d | 50000 | gpslave-2 | gpslave-2 | 51000 |
3 | 1 | p | p | s | u | 40000 | gpslave-2 | gpslave-2 | 41000 |
5 | 1 | m | m | s | u | 50000 | gpslave-1 | gpslave-1 | 51000 |
(5 rows)
--在这里将content=1的mirror也停掉,该mirror恢复时应该是没有问题的,将其对比content=0上的mirror,来发现相关问题。
--重新启动数据库。
[gpadmin@gpmaster ~]$ gpstart -a
20150727:22:49:41:002958 gpstart:gpmaster:gpadmin-[INFO]:-Starting gpstart with args: -a
20150727:22:49:41:002958 gpstart:gpmaster:gpadmin-[INFO]:-Gathering information and validating the environment...
20150727:22:49:41:002958 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Binary Version: ‘postgres (Greenplum Database) 4.3.4.1 build 2‘
20150727:22:49:42:002958 gpstart:gpmaster:gpadmin-[INFO]:-Greenplum Catalog Version: ‘201310150‘
20150727:22:49:42:002958 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance in admin mode
20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Setting new master era
20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Master Started...
20150727:22:49:43:002958 gpstart:gpmaster:gpadmin-[INFO]:-Shutting down master
20150727:22:49:44:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on gpslave-2 directory /data/mirror/gpseg0 <<<<<
20150727:22:49:44:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Skipping startup of segment marked down in configuration: on gpslave-1 directory /data/mirror/gpseg1 <<<<<
20150727:22:49:44:002958 gpstart:gpmaster:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait.....
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-Process results...
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:- Successful segment starts = 2
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:- Failed segment starts = 0
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Skipped segment starts (segments are marked down in configuration) = 2 <<<<<<<<
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-Successfully started 2 of 2 segment instances, skipped 2 other segments
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-****************************************************************************
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-There are 2 segment(s) marked down in the database
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-To recover from this current state, review usage of the gprecoverseg
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-management utility which will recover failed segment instance databases.
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[WARNING]:-****************************************************************************
20150727:22:49:46:002958 gpstart:gpmaster:gpadmin-[INFO]:-Starting Master instance gpmaster directory /data/master/gpseg-1
20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[INFO]:-Command pg_ctl reports Master gpmaster instance active
20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[INFO]:-No standby master configured. skipping...
20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[WARNING]:-Number of segments not attempted to start: 2
20150727:22:49:47:002958 gpstart:gpmaster:gpadmin-[INFO]:-Check status of database with gpstate utility
--现在的配置信息。
template1=# select * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | replication_port | san_mounts
------+---------+------+----------------+------+--------+-------+-----------+-----------+------------------+------------
1 | -1 | p | p | s | u | 5432 | gpmaster | gpmaster | |
2 | 0 | p | p | c | u | 40000 | gpslave-1 | gpslave-1 | 41000 |
4 | 0 | m | m | s | d | 50000 | gpslave-2 | gpslave-2 | 51000 |
3 | 1 | p | p | c | u | 40000 | gpslave-2 | gpslave-2 | 41000 |
5 | 1 | m | m | s | d | 50000 | gpslave-1 | gpslave-1 | 51000 |
(5 rows)
--执行全量恢复,并查看相关的详细信息。
[gpadmin@gpmaster ~]$ gprecoverseg -F -v
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting gprecoverseg with args: -F -v
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Checking if MASTER_DATA_DIRECTORY env variable is set.
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Obtaining master‘s port from master data directory
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Read from postgresql.conf port=5432
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Read from postgresql.conf max_connections=200
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Checking if GPHOME env variable is set.
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-local Greenplum Version: ‘postgres (Greenplum Database) 4.3.4.1 build 2‘
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Connecting to dbname=template1
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-master Greenplum Version: ‘PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:15:10‘
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-Checking if segments are ready
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Connecting to dbname=template1
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Validating configuration...
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Sending msg getStatus and cmdStr $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Adding cmd to work_queue: $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker0] got cmd: $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker1] got cmd: $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker0] finished cmd: Get segment status cmdStr=‘ssh -o ‘StrictHostKeyChecking no‘ gpslave-1 ". /usr/local/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h gpslave-1 -p 40000"‘
had result: cmd had rc=1 completed=True halted=False
stdout=‘‘
stderr=‘mode: PrimarySegment
segmentState: ChangeTrackingDisabled--content=0的primary的状态是ChangeTrackingDisabled
dataState: InChangeTracking
faultType: NotInitialized
mode: PrimarySegment
segmentState: ChangeTrackingDisabled
dataState: InChangeTracking
faultType: NotInitialized
‘
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-[worker1] finished cmd: Get segment status cmdStr=‘ssh -o ‘StrictHostKeyChecking no‘ gpslave-2 ". /usr/local/greenplum-db/./greenplum_path.sh; $GPHOME/bin/gp_primarymirror -h gpslave-2 -p 40000"‘
had result: cmd had rc=1 completed=True halted=False
stdout=‘‘
stderr=‘mode: PrimarySegment
segmentState: Ready --content=1的primary的状态是Ready
dataState: InChangeTracking
faultType: NotInitialized
mode: PrimarySegment
segmentState: Ready
dataState: InChangeTracking
faultType: NotInitialized
‘
20150727:22:51:32:003078 gprecoverseg:gpmaster:gpadmin-[DEBUG]:-Encountered error Not ready to connect to database mode: PrimarySegment
segmentState: ChangeTrackingDisabled --由此可见,之前无法执行全量恢复是由于content=0的primary有问题,与mirror无关
dataState: InChangeTracking
faultType: NotInitialized
mode: PrimarySegment
segmentState: ChangeTrackingDisabled
dataState: InChangeTracking
faultType: NotInitialized
--查看content=0的primary此次启动的日志记录,与下面content=1的primary此次启动的日志记录对比,发现还是CHANGETRACKING有问题。而且gprecoverseg -F执行后没有产生日志记录,说明相应文件只在数据库启动时生效。
[gpadmin@gpslave-1 pg_log]$ cat gpdb-2015-07-27_224945.csv
2015-07-27 22:49:46.228113 CST,"gpadmin","postgres",p2367,th2061358880,"[local]",,2015-07-27 22:49:46 CST,0,,,seg-1,,,,,"FATAL","57M01","the database system is in mirror or uninitialized mode",,,,,,,0,,"postmaster.c",2994,
2015-07-27 22:49:46.248685 CST,,,p2370,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","mirror transition, primary address(port) ‘gpslave-1(41000)‘ mirror address(port) ‘gpslave-2(51000)‘",,,,,"mirroring role ‘primary role‘ mirroring state ‘change tracking‘ segment
state ‘not initialized‘ process name(pid) ‘filerep main process(2370)‘ filerep state ‘not initialized‘ ",,0,,"cdbfilerep.c",3472,
2015-07-27 22:49:46.252539 CST,,,p2371,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToInsync() found insync_transition_completed:‘false‘ full resync:‘true‘",,,,,,,0,,"cdbresynchronizechangetracking.c",2545,
2015-07-27 22:49:46.252589 CST,,,p2371,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","CHANGETRACKING: ChangeTracking_RetrieveIsTransitionToResync() found resync_transition_completed:‘false‘ full resync:‘true‘",,,,,,,0,,"cdbresynchronizechangetracking.c",2582,
2015-07-27 22:49:46.254921 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","database system was shut down at 2015-07-27 22:46:49 CST",,,,,,,0,,"xlog.c",6372,
2015-07-27 22:49:46.255156 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","checkpoint record is at 0/8C75AD8",,,,,,,0,,"xlog.c",6560,
2015-07-27 22:49:46.255194 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","redo record is at 0/8C75AD8; undo record is at 0/0; shutdown TRUE",,,,,,,0,,"xlog.c",6617,
2015-07-27 22:49:46.255219 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","next transaction ID: 0/3051; next OID: 43252",,,,,,,0,,"xlog.c",6621,
2015-07-27 22:49:46.255241 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","next MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6624,
2015-07-27 22:49:46.255280 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 0/8C75B70",,,,,,,0,,"xlog.c",6988,
2015-07-27 22:49:46.255863 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","Oldest active transaction from prepared transactions 3051",,,,,,,0,,"xlog.c",6033,
2015-07-27 22:49:46.292261 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready",,,,,,,0,,"xlog.c",6084,
2015-07-27 22:49:46.292308 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:10:54",,,,,,,0,,"xlog.c",6095,
2015-07-27 22:49:46.294384 CST,,,p2372,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","Finished normal startup for clean shutdown case",,,,,,,0,,"xlog.c",7202,
2015-07-27 22:49:46.300763 CST,,,p2373,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","Finished BASIC startup integrity checking",,,,,,,0,,"xlog.c",7647,
2015-07-27 22:49:46.303294 CST,,,p2365,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:13:47",,,,,,,0,,"postmaster.c",4640,
2015-07-27 22:49:46.303322 CST,,,p2365,th2061358880,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready to accept connections","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled
on Feb 10 2015 14:13:47",,,,,,0,,"postmaster.c",4647,
2015-07-27 22:49:47.847981 CST,"gpadmin","template1",p2381,th2061358880,"192.168.22.110","5303",2015-07-27 22:49:47 CST,3054,con5,,seg0,,,x3054,sx1,"LOG","00000","gp_max_distributed_xid returning 1",,,,,,"select gp_max_distributed_xid()",0,,"cdbmaxdistributedxid.c",29,
[gpadmin@gpslave-2 pg_log]$ cat gpdb-2015-07-27_224944.csv
2015-07-27 22:49:45.707976 CST,"gpadmin","postgres",p2302,th-240429280,"[local]",,2015-07-27 22:49:45 CST,0,,,seg-1,,,,,"FATAL","57M01","the database system is in mirror or uninitialized mode",,,,,,,0,,"postmaster.c",2994,
2015-07-27 22:49:45.726077 CST,,,p2305,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","mirror transition, primary address(port) ‘gpslave-2(41000)‘ mirror address(port) ‘gpslave-1(51000)‘",,,,,"mirroring role ‘primary role‘ mirroring state ‘change tracking‘ segment
state ‘not initialized‘ process name(pid) ‘filerep main process(2305)‘ filerep state ‘not initialized‘ ",,0,,"cdbfilerep.c",3472,
2015-07-27 22:49:45.735230 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","last checkpoint location for generating initial changetracking log 0/CC12DD8",,,,,,,0,,"xlog.c",11560,
2015-07-27 22:49:45.735376 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"WARNING","XX000","unable to read change tracking ‘full log‘ file : Success (cdbresynchronizechangetracking.c:3382)",,,,,,,0,,"cdbresynchronizechangetracking.c",3382,
2015-07-27 22:49:45.735423 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","last changetracked location in changetracking full log 0/0",,,,,,,0,,"xlog.c",11572,
2015-07-27 22:49:45.735475 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","record with zero length at 0/CC12E70",,,,,,,0,,"xlog.c",4155,
2015-07-27 22:49:45.735505 CST,,,p2306,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","scanned through 1 initial xlog records since last checkpoint for writing into the resynchronize change log",,,,,,,0,,"cdbresynchronizechangetracking.c",206,
2015-07-27 22:49:45.736987 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","database system was shut down at 2015-07-27 22:46:49 CST",,,,,,,0,,"xlog.c",6372,
2015-07-27 22:49:45.737260 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","checkpoint record is at 0/CC12DD8",,,,,,,0,,"xlog.c",6560,
2015-07-27 22:49:45.737298 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","redo record is at 0/CC12DD8; undo record is at 0/0; shutdown TRUE",,,,,,,0,,"xlog.c",6617,
2015-07-27 22:49:45.737322 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","next transaction ID: 0/3006; next OID: 43252",,,,,,,0,,"xlog.c",6621,
2015-07-27 22:49:45.737343 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","next MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6624,
2015-07-27 22:49:45.737382 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 0/CC12E70",,,,,,,0,,"xlog.c",6988,
2015-07-27 22:49:45.738000 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","Oldest active transaction from prepared transactions 3006",,,,,,,0,,"xlog.c",6033,
2015-07-27 22:49:45.743458 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready",,,,,,,0,,"xlog.c",6084,
2015-07-27 22:49:45.743497 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:10:54",,,,,,,0,,"xlog.c",6095,
2015-07-27 22:49:45.745424 CST,,,p2307,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","Finished normal startup for clean shutdown case",,,,,,,0,,"xlog.c",7202,
2015-07-27 22:49:45.751648 CST,,,p2308,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","Finished BASIC startup integrity checking",,,,,,,0,,"xlog.c",7647,
2015-07-27 22:49:45.754195 CST,,,p2300,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:13:47",,,,,,,0,,"postmaster.c",4640,
2015-07-27 22:49:45.754222 CST,,,p2300,th-240429280,,,,0,,,seg-1,,,,,"LOG","00000","database system is ready to accept connections","PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled
on Feb 10 2015 14:13:47",,,,,,0,,"postmaster.c",4647,
2015-07-27 22:49:47.296961 CST,"gpadmin","template1",p2316,th-240429280,"192.168.22.110","39217",2015-07-27 22:49:47 CST,3009,con5,,seg1,,,x3009,sx1,"LOG","00000","gp_max_distributed_xid returning 1",,,,,,"select gp_max_distributed_xid()",0,,"cdbmaxdistributedxid.c",29,
Continue with segment recovery procedure Yy|Nn (default=N):
> y
20150727:23:30:34:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-2 segment(s) to recover
20150727:23:30:34:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Ensuring 2 failed segment(s) are stopped
20150727:23:30:36:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Ensuring that shared memory is cleaned up for stopped segments
20150727:23:30:41:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Cleaning files from 2 segment(s)
..
20150727:23:30:43:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Building template directory
20150727:23:30:46:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Validating remote directories
.
20150727:23:30:47:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Copying template directory file
.
20150727:23:30:48:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Configuring new segments
.
20150727:23:30:49:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Cleaning files
.
20150727:23:30:50:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting file move procedure for gpslave-2:/data/mirror/gpseg0:content=0:dbid=4:mode=s:status=d
20150727:23:30:50:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting file move procedure for gpslave-1:/data/mirror/gpseg1:content=1:dbid=5:mode=s:status=d
updating flat files
20150727:23:30:50:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Updating configuration with new mirrors
20150727:23:30:50:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Updating mirrors
.
20150727:23:30:51:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Starting mirrors
20150727:23:30:51:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Commencing parallel primary and mirror segment instance startup, please wait...
............................................................................
20150727:23:32:10:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Process results...
20150727:23:32:10:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Updating configuration to mark mirrors up
20150727:23:32:10:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Updating primaries
20150727:23:32:10:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Commencing parallel primary conversion of 2 segments, please wait...
....
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Process results...
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Done updating primaries
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-******************************************************************
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Updating segments for resynchronization is completed.
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-For segments updated successfully, resynchronization will continue in the background.
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-Use gpstate -s to check the resynchronization progress.
20150727:23:32:15:004102 gprecoverseg:gpmaster:gpadmin-[INFO]:-******************************************************************
[gpadmin@gpmaster ~]$ gpstate -e
20150727:23:34:45:005069 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args: -e
20150727:23:34:45:005069 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: ‘postgres (Greenplum Database) 4.3.4.1 build 2‘
20150727:23:34:45:005069 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: ‘PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:15:10‘
20150727:23:34:45:005069 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20150727:23:34:45:005069 gpstate:gpmaster:gpadmin-[INFO]:-Gathering data from segments...
..
20150727:23:34:47:005069 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:23:34:47:005069 gpstate:gpmaster:gpadmin-[INFO]:-Segment Mirroring Status Report
20150727:23:34:47:005069 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20150727:23:34:47:005069 gpstate:gpmaster:gpadmin-[INFO]:-All segments are running normally
--总结一下解决方法:当全量恢复gprecoverseg -F无法正常运行,且报错信息为Unable to connect to database. Retrying 1
gprecoverseg failed. (Reason=‘Unable to connect to database and start transaction‘) exiting...时,
从其他正常的主事例上拷贝一份data/primary/gpseg*/pg_changetracking下的CT_METADATA到出问题的主事例上,覆盖原CT_METADATA就可以正常进行全量恢复了。