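A friend reported that after setting up active-active (dual-active) replication on their xx storage array and rebooting the host, Grid Infrastructure (GI) would not start. Analysis showed that the partition information on every disk from that array had been lost, so ASMLib could no longer discover the disks (the ASM disks had been created on partitions).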
The symptoms looked like this (disk partitions missing):
-- partial output of fdisk -l
Disk /dev/mapper/datahds1: 1099.5 GB, 1099511627776 bytes
255 heads, 63 sectors/track, 133674 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
-- ls -l /dev/mapper/ shows no partition devices
lrwxrwxrwx 1 root root 7 May 6 03:44 datahds1 -> ../dm-1
lrwxrwxrwx 1 root root 7 May 6 03:26 datahds2 -> ../dm-3
lrwxrwxrwx 1 root root 7 May 6 03:26 datahds3 -> ../dm-8
lrwxrwxrwx 1 root root 7 May 6 03:26 ocrhds1 -> ../dm-0
lrwxrwxrwx 1 root root 7 May 6 03:26 ocrhds2 -> ../dm-2
lrwxrwxrwx 1 root root 7 May 6 03:26 ocrhds3 -> ../dm-4
The ASM alert log showed:
SUCCESS: diskgroup DATADG was mounted
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 3
SUCCESS: diskgroup OCRHDS was mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
The system log showed:
May 6 02:23:27 db2 kernel: sdb: unknown partition table
May 6 02:23:27 db2 kernel: sde: unknown partition table
May 6 02:23:27 db2 kernel: sdc: unknown partition table
May 6 02:23:27 db2 kernel: sdf: unknown partition table
May 6 02:23:27 db2 kernel: sdd: unknown partition table
May 6 02:23:27 db2 kernel: sdj:Dev sdj: unable to read RDB block 0
May 6 02:23:27 db2 kernel: unable to read partition table
May 6 02:23:27 db2 kernel: sdi: sdi1
May 6 02:23:27 db2 kernel: sdk: sdk1
May 6 02:23:27 db2 kernel: sdg: unknown partition table
May 6 02:23:27 db2 kernel: sdl: sdl1
May 6 02:23:27 db2 kernel: sdm:Dev sdm: unable to read RDB block 0
May 6 02:23:27 db2 kernel: unable to read partition table
May 6 02:23:27 db2 kernel: sdo:Dev sdo: unable to read RDB block 0
May 6 02:23:27 db2 kernel: unable to read partition table
May 6 02:23:27 db2 kernel: sdn:Dev sdn: unable to read RDB block 0
May 6 02:23:27 db2 kernel: unable to read partition table
May 6 02:23:27 db2 kernel: sdp:Dev sdp: unable to read RDB block 0
May 6 02:23:27 db2 kernel: unable to read partition table
May 6 02:23:27 db2 kernel: sds:Dev sds: unable to read RDB block 0
May 6 02:23:27 db2 kernel: unable to read partition table
May 6 02:23:27 db2 kernel: sdh:
May 6 02:23:27 db2 kernel: sdt: sdt1
May 6 02:23:27 db2 kernel: sdv:Dev sdv: unable to read RDB block 0
May 6 02:23:27 db2 kernel: unable to read partition table
May 6 02:23:27 db2 kernel: sdq:Dev sdq: unable to read RDB block 0
May 6 02:23:27 db2 kernel: unable to read partition table
May 6 02:23:27 db2 kernel: sd 1:0:1:9: [sdr] Very big device. Trying to use READ CAPACITY(16).
May 6 02:23:27 db2 kernel: sdr:Dev sdr: unable to read RDB block 0
May 6 02:23:27 db2 kernel: unable to read partition table
May 6 02:23:27 db2 kernel: sd 2:0:0:9: [sdab] Very big device. Trying to use READ CAPACITY(16).
May 6 02:23:27 db2 kernel: sdab: unknown partition table
May 6 02:23:27 db2 kernel: sdac: unknown partition table
May 6 02:23:27 db2 kernel: sdw: sdw1
May 6 02:23:27 db2 kernel: sdu:Dev sdu: unable to read RDB block 0
May 6 02:23:27 db2 kernel: unable to read partition table
May 6 02:23:27 db2 kernel: sdx: sdx1
May 6 02:23:27 db2 kernel: sdy: sdy1
May 6 02:23:27 db2 kernel: sdaa: sdaa1
May 6 02:23:27 db2 kernel: sdz: sdz1
May 6 02:23:27 db2 kernel: sdae: unknown partition table
May 6 02:23:27 db2 kernel: sdaf: unknown partition table
May 6 02:23:27 db2 kernel: sdag: unknown partition table
May 6 02:23:27 db2 kernel: sdai:
May 6 02:23:27 db2 kernel: sdah: unknown partition table
May 6 02:23:27 db2 kernel: sdad: unknown partition table
May 6 02:23:28 db2 mcelog: failed to prefill DIMM database from DMI data
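The kernel messages above can be filtered with grep to pull the affected devices out of a longer log. The function below is a hypothetical helper and was not part of the original analysis. It assumes GNU grep and a syslog in the format shown (on RHEL-family systems this is usually /var/log/messages). It only catches the two explicit messages "unknown partition table" and "unable to read RDB block 0". Devices logged with an empty partition list, such as "sdh:", are not reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, once each, the sd* devices for which the kernel
# logged "unknown partition table" or "unable to read RDB block 0".
# Reads the files given as arguments, or stdin when none are given.
list_bad_partition_disks() {
  # -h: no file-name prefixes, -o: print only the matched part, -E: extended regex
  grep -hoE 'sd[a-z]+: unknown partition table|Dev sd[a-z]+: unable to read RDB block 0' "$@" \
    | grep -oE 'sd[a-z]+' \
    | sort -u
}

# Example (the log path is an assumption):
# list_bad_partition_disks /var/log/messages
```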
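The error is plain: "unknown partition table". The partition tables on these disks have been corrupted, and fdisk can no longer find any partitions on them.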
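A quick way to confirm that the MBR itself is gone is to check its boot signature. A valid MS-DOS partition table ends its first 512-byte sector with the bytes 0x55 0xAA at offsets 510 and 511, and Linux does not treat a sector without them as an MS-DOS partition table. The function below is a hypothetical sketch and was not part of the original diagnosis. It only reads from the device.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if the first sector of a device or image file
# ends with the MBR boot signature 0x55 0xAA (bytes 510-511).
has_mbr_signature() {
  local sig
  # Read sector 0, then dump the two bytes at offset 510 as hex, e.g. "55aa".
  sig=$(dd if="$1" bs=512 count=1 2>/dev/null | od -An -tx1 -j510 -N2 2>/dev/null | tr -d ' \n')
  [ "$sig" = "55aa" ]
}

# Example, read-only:
# has_mbr_signature /dev/mapper/datahds1 || echo "datahds1: no MBR signature"
```

A present signature does not prove that the partition entries are valid. A missing one, however, is consistent with the "unknown partition table" messages above.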
Running partprobe did not help either:
[root@db2 oracle]# partprobe /dev/mapper/ocrhds3
[root@db2 oracle]#
[root@db2 oracle]# ls -l /dev/mapper/ocrhds3*
lrwxrwxrwx 1 root root 7 May 6 07:30 /dev/mapper/ocrhds3 -> ../dm-4
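Judging from the information above, the partition tables are almost certainly destroyed. All we can do now is hope for some luck: that the actual data inside the partitions is undamaged.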
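Examining the actual partition data on the disk: the fdisk output above shows the legacy geometry of 255 heads and 63 sectors per track. In this layout the first partition conventionally starts at sector 63, i.e. at byte offset 63 * 512 = 32256. We therefore dumped the first 50 MB of the device, stripped the first 32256 bytes, and ran kfed on the remainder to see whether an ASM disk header sits at that offset: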
[root@db2 ~]$ dd if=/dev/mapper/datahds1 of=/tmp/datahds1.dd bs=1024k count=50
[root@db2 ~]$ dd if=/tmp/datahds1.dd of=/tmp/xff01.dd bs=32256 skip=1
[grid@db2 ~]$ kfed read /tmp/xff01.dd |more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 3110278718 ; 0x00c: 0xb963163e
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKHDSDATA1 ; 0x000: length=16
kfdhdb.driver.reserved[0]: 1146307656 ; 0x008: 0x44534448
kfdhdb.driver.reserved[1]: 826364993 ; 0x00c: 0x31415441
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATADG_0000 ; 0x028: length=11
kfdhdb.grpname: DATADG ; 0x048: length=6
kfdhdb.fgname: DATADG_0000 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
kfdhdb.crestmp.hi: 33050696 ; 0x0a8: HOUR=0x8 DAYS=0x2 MNTH=0x4 YEAR=0x7e1
kfdhdb.crestmp.lo: 3813740544 ; 0x0ac: USEC=0x0 MSEC=0x44 SECS=0x35 MINS=0x38
kfdhdb.mntstmp.hi: 33050701 ; 0x0b0: HOUR=0xd DAYS=0x2 MNTH=0x4 YEAR=0x7e1
kfdhdb.mntstmp.lo: 411385856 ; 0x0b4: USEC=0x0 MSEC=0x150 SECS=0x8 MINS=0x6
From this we can tentatively conclude that the partition data is very likely intact. The ASM disk header is good, and damage of this kind generally overwrites a disk from the front toward the back. An intact header therefore makes it very unlikely that the blocks after it were overwritten.
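Instead of deriving the offset from the disk geometry, you can also search the dump for the header. The kfed output shows the ASMLib label (kfdhdb.driver.provstr, "ORCLDISK...") immediately after the 32-byte kfbh block header, i.e. at byte 32 of the disk header block. The hypothetical function below is a sketch that assumes bash and GNU grep. It looks for "ORCLDISK" at positions 32 bytes past a 512-byte sector boundary and prints the first such sector offset. It only finds candidates; reading that offset with kfed remains the real check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the byte offset of the first 512-byte-aligned
# block in an image whose bytes 32..39 read "ORCLDISK" (the ASMLib
# provisioning string in an ASM disk header). Prints nothing if none is found.
find_asm_header() {
  local img=$1 off
  # -a: treat binary data as text, -o: print only the match,
  # -b: prefix each match with its 0-based byte offset ("offset:match")
  LC_ALL=C grep -aob 'ORCLDISK' "$img" | while IFS=: read -r off _; do
    # The string sits 32 bytes into the header block; keep sector-aligned hits only
    if (( off >= 32 && (off - 32) % 512 == 0 )); then
      echo $((off - 32))
      break
    fi
  done
}

# Example against the 50 MB dump taken above (a partition starting at
# sector 63 would give 32256; the dd form below needs a non-zero offset):
# off=$(find_asm_header /tmp/datahds1.dd)
# dd if=/tmp/datahds1.dd of=/tmp/xff01.dd bs="$off" skip=1
```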
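We then prepared new disks and used dd to copy each partition's data directly onto a new device, skipping the first 32256 bytes of the old disk.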
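The copy step can be wrapped in a small function so that the source, target and offset are spelled out once. This is a hypothetical sketch, not the author's script, and the function name is made up. The default of 32256 bytes is 63 sectors of 512 bytes, where the first partition of this legacy layout begins. The function refuses only the obvious mistake of copying a device onto itself. dd overwrites the target without asking, so check every target name before running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: copy everything after the first $3 bytes
# (default 32256 = 63 sectors x 512 bytes) of a damaged disk to a new disk.
copy_partition() {
  local src=$1 dst=$2 offset=${3:-32256}
  # -ef is true when both paths resolve to the same file or device
  if [ "$src" -ef "$dst" ]; then
    echo "refusing to copy $src onto itself" >&2
    return 1
  fi
  # With bs equal to the offset, skip=1 drops exactly the old partition-table area;
  # conv=fsync flushes the target before dd reports success.
  dd if="$src" of="$dst" bs="$offset" skip=1 conv=fsync
}

# Usage for the six devices in this case (target names as in the original dd commands):
# for n in ocrhds1 ocrhds2 ocrhds3 datahds1 datahds2 datahds3; do
#   copy_partition "/dev/mapper/$n" "/dev/mapper/${n%?}new${n: -1}"
# done
```

Keeping bs equal to the offset mirrors the original commands. It is, however, a small block size for a 1 TB copy. Newer GNU coreutils releases accept iflag=skip_bytes, which would allow a larger bs with skip given in bytes.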
dd if=/dev/mapper/ocrhds1 of=/dev/mapper/ocrhdsnew1 skip=1 bs=32256
dd if=/dev/mapper/ocrhds2 of=/dev/mapper/ocrhdsnew2 skip=1 bs=32256
dd if=/dev/mapper/ocrhds3 of=/dev/mapper/ocrhdsnew3 skip=1 bs=32256
dd if=/dev/mapper/datahds1 of=/dev/mapper/datahdsnew1 skip=1 bs=32256
dd if=/dev/mapper/datahds2 of=/dev/mapper/datahdsnew2 skip=1 bs=32256
dd if=/dev/mapper/datahds3 of=/dev/mapper/datahdsnew3 skip=1 bs=32256
Then have ASMLib rescan the disks:
[root@db1 disks]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "HDSOCR3"
Instantiating disk "HDSDATA2"
Instantiating disk "HDSDATA1"
Instantiating disk "HDSDATA3"
Instantiating disk "HDSOCR1"
Instantiating disk "HDSOCR2"
[root@db1 disks]# ls -ltr
total 0
brw-rw---- 1 grid asmadmin 8, 160 May 6 13:49 HDSOCR3
brw-rw---- 1 grid asmadmin 8, 192 May 6 13:49 HDSDATA2
brw-rw---- 1 grid asmadmin 8, 176 May 6 13:49 HDSDATA1
brw-rw---- 1 grid asmadmin 8, 208 May 6 13:49 HDSDATA3
brw-rw---- 1 grid asmadmin 8, 128 May 6 13:49 HDSOCR1
brw-rw---- 1 grid asmadmin 8, 144 May 6 13:49 HDSOCR2
Verify a copied partition with kfed:
[root@db2 tmp]# /oracle/app/11.2.0/grid_1/bin/kfed read /dev/oracleasm/disks/HDSDATA1
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 3110278718 ; 0x00c: 0xb963163e
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISKHDSDATA1 ; 0x000: length=16
kfdhdb.driver.reserved[0]: 1146307656 ; 0x008: 0x44534448
kfdhdb.driver.reserved[1]: 826364993 ; 0x00c: 0x31415441
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 186646528 ; 0x020: 0x0b200000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATADG_0000 ; 0x028: length=11
kfdhdb.grpname: DATADG ; 0x048: length=6
kfdhdb.fgname: DATADG_0000 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
ASM and the database then started normally:
[grid@db2 ~]$ asmcmd
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 3145710 2378034 0 2378034 0 N DATADG/
MOUNTED NORMAL N 512 4096 1048576 15342 14416 5114 4651 0 Y OCRHDS/
ASMCMD>
[oracle@db2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Sat May 6 13:54:21 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 3.6077E+10 bytes
Fixed Size 2260648 bytes
Variable Size 7247757656 bytes
Database Buffers 2.8723E+10 bytes
Redo Buffers 104382464 bytes
Database mounted.
Database opened.
SQL>
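The kfed check above only inspects the header block of one disk. A broader, still hypothetical check is to compare each copy byte for byte with its source, shifted by the same 32256 bytes. The sketch below assumes GNU cmp and its -i SKIP1:SKIP2 and -n LIMIT options. The default limit of 50 MB is an arbitrary choice. To compare everything, pass the full partition length instead, for example the source size from blockdev --getsize64 minus 32256.

```shell
#!/usr/bin/env bash
# Hypothetical check: compare $1 starting at byte $3 (default 32256) with $2
# starting at byte 0, for at most $4 bytes (default 50 MB). Exit status 0 means
# the compared ranges are identical; otherwise cmp reports the first difference
# (or an EOF, if one side ends before the other within the limit).
verify_copy() {
  local src=$1 dst=$2 offset=${3:-32256} nbytes=${4:-$((50 * 1024 * 1024))}
  cmp -n "$nbytes" -i "$offset:0" "$src" "$dst"
}

# Example:
# verify_copy /dev/mapper/datahds1 /dev/mapper/datahdsnew1 && echo "datahds1: first 50 MB match"
```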
With the recovery above, the ASM disks whose partitions had been lost were brought back with zero data loss.
If you run into a situation like this and cannot resolve it, please contact us; we provide professional Oracle database recovery support.
Phone: 13429648788 QQ: 107644445
E-Mail:dba@xifenfei.com
Original article: https://www.cnblogs.com/xifenfei/p/10023465.html