An Oracle database uses ASM storage; after several disks were added, one ASM diskgroup suddenly failed to mount

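On 10.2.0.4 and later versions, if a diskgroup is dismounted after new disks are added to it, and an attempt to mount it again fails with "ORA-15042: ASM disk is missing", this post may serve as a reference. The ASM alert log from one such case follows.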

Tue Feb 12 17:33:59 2013

NOTE: X->S down convert bast on F1B3 bastCount=2

Wed Feb 13 04:06:38 2013 <ALTER DISKGROUP DG1 ADD DISK

'/dev/mapper/t1_asm03p1',

'/dev/mapper/t1_asm04p1',

'/dev/mapper/t1_asm05p1',

'/dev/mapper/t1_asm06p1'

rebalance power 4

Wed Feb 13 04:06:38 2013

NOTE: reconfiguration of group 1/0x53bffa1 (DG1), full=1

Wed Feb 13 04:06:39 2013

NOTE: initializing header on grp 1 disk DG1_0026

NOTE: initializing header on grp 1 disk DG1_0027

NOTE: initializing header on grp 1 disk DG1_0028

NOTE: initializing header on grp 1 disk DG1_0029

NOTE: cache opening disk 26 of grp 1: DG1_0026 path:/dev/mapper/t1_asm03p1

NOTE: cache opening disk 27 of grp 1: DG1_0027 path:/dev/mapper/t1_asm04p1

NOTE: cache opening disk 28 of grp 1: DG1_0028 path:/dev/mapper/t1_asm05p1

NOTE: cache opening disk 29 of grp 1: DG1_0029 path:/dev/mapper/t1_asm06p1

NOTE: PST update: grp = 1

NOTE: requesting all-instance disk validation for group=1

Wed Feb 13 04:06:39 2013

NOTE: disk validation pending for group 1/0x53bffa1 (DG1)

Wed Feb 13 04:06:40 2013

NOTE: requesting all-instance membership refresh for group=1

Wed Feb 13 04:06:40 2013

NOTE: membership refresh pending for group 1/0x53bffa1 (DG1)

SUCCESS: validated disks for 1/0x53bffa1 (DG1)

SUCCESS: refreshed membership for 1/0x53bffa1 (DG1)

Wed Feb 13 04:07:11 2013 <ALTER DISKGROUP DG1 ADD DISK

'/dev/mapper/t1_asm03p1',

'/dev/mapper/t1_asm04p1',

'/dev/mapper/t1_asm05p1',

'/dev/mapper/t1_asm06p1'

rebalance power 4

NOTE: cache closing disk 26 of grp 1: DG1_0026 path:/dev/mapper/t1_asm03p1

NOTE: cache closing disk 26 of grp 1: DG1_0026 path:/dev/mapper/t1_asm03p1

NOTE: cache closing disk 27 of grp 1: DG1_0027 path:/dev/mapper/t1_asm04p1

NOTE: cache closing disk 27 of grp 1: DG1_0027 path:/dev/mapper/t1_asm04p1

NOTE: cache closing disk 28 of grp 1: DG1_0028 path:/dev/mapper/t1_asm05p1

NOTE: cache closing disk 28 of grp 1: DG1_0028 path:/dev/mapper/t1_asm05p1

NOTE: cache closing disk 29 of grp 1: DG1_0029 path:/dev/mapper/t1_asm06p1

NOTE: cache closing disk 29 of grp 1: DG1_0029 path:/dev/mapper/t1_asm06p1

Wed Feb 13 04:09:36 2013

SQL>ALTER DISKGROUP DG1 ADD DISK

'/dev/mapper/t1_asm03p1',

'/dev/mapper/t1_asm04p1',

'/dev/mapper/t1_asm05p1',

'/dev/mapper/t1_asm06p1'

rebalance power 4

Wed Feb 13 04:09:36 2013

NOTE: reconfiguration of group 1/0x53bffa1 (DG1), full=1

Wed Feb 13 04:09:36 2013

NOTE: requesting all-instance membership refresh for group=1

Wed Feb 13 04:09:36 2013

NOTE: membership refresh pending for group 1/0x53bffa1 (DG1)

SUCCESS: validated disks for 1/0x53bffa1 (DG1)

NOTE: PST update: grp = 1, dsk = 26, mode = 0x4

NOTE: PST update: grp = 1, dsk = 27, mode = 0x4

NOTE: PST update: grp = 1, dsk = 28, mode = 0x4

NOTE: PST update: grp = 1, dsk = 29, mode = 0x4

Wed Feb 13 04:09:42 2013

ERROR: too many offline disks in PST (grp 1)

Wed Feb 13 04:09:42 2013

SUCCESS: refreshed membership for 1/0x53bffa1 (DG1)

ERROR: ORA-15040 thrown in RBAL for group number 1

Wed Feb 13 04:09:42 2013

Errors in file /opt/oracle/product/10.2.0/asm/admin/+ASM/bdump/+asm1_rbal_30556.trc:

ORA-15040: diskgroup is incomplete

ORA-15066: offlining disk "" may result in a data loss

ORA-15042: ASM disk "29" is missing

ORA-15042: ASM disk "28" is missing

ORA-15042: ASM disk "27" is missing

ORA-15042: ASM disk "26" is missing

Wed Feb 13 04:09:43 2013

ERROR: PST-initiated MANDATORY DISMOUNT of group DG1

Received dirty detach msg from node 3 for dom 1

Wed Feb 13 04:09:43 2013

Dirty detach reconfiguration started (old inc 12, new inc 12)
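
The sequence in the alert log above (PST updates for disks 26 to 29, then "too many offline disks in PST", the ORA-15042 errors, and the mandatory dismount) can be picked out of a long alert log with a simple filter. The following is only a minimal sketch: the function name asm_dismount_events is hypothetical, and the patterns cover just the messages that appear in this case.

```shell
#!/bin/sh
# Print the alert-log lines that mark a PST-initiated dismount.
# Argument: path to an ASM alert log.
asm_dismount_events() {
    # -n prefixes each matching line with its line number in the log.
    grep -n -E 'ORA-150(32|40|42|66)|too many offline disks in PST|MANDATORY DISMOUNT|PST update: grp = [0-9]+, dsk = [0-9]+' "$1"
}
```

It would be called with the path of the ASM alert log, which is assumed here to sit in the same bdump directory as the trace file named in the log.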

At this point we need to examine the ASM disk directory, the PST, and the disk headers. Let's take a look:

kfddde[4].entry.incarn: 2 0x724: A=0 NUMM=0x1

kfddde[4].entry.hash: 0 0x728: 0x00000000

kfddde[4].entry.refer.number: 0 0x72c: 0x00000000

kfddde[4].entry.refer.incarn: 0 0x730: A=0 NUMM=0x0

kfddde[4].dsknum: 28 0x734: 0x001c

kfddde[4].state: 8 0x736: KFDSTA_ADDING <<<===============================

kfddde[4].ub1spare: 0 0x737: 0x00

kfddde[4].dskname: DG1_0028 0x738: length=8

kfddde[4].fgname: DG1_0028 0x758: length=8

kfddde[4].crestmp.hi: 32983460 0x778: HOUR=0x4 DAYS=0xd MNTH=0x2 YEAR=0x7dd

kfddde[4].crestmp.lo: 443710464 0x77c: USEC=0x0 MSEC=0x9f SECS=0x27 MINS=0x6

kfddde[4].failstmp.hi: 0 0x780: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0

kfddde[4].failstmp.lo: 0 0x784: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0

kfddde[4].timer: 0 0x788: 0x00000000

kfddde[4].size: 307199 0x78c: 0x0004afff

kfddde[3].entry.incarn: 2 0x564: A=0 NUMM=0x1

kfddde[3].entry.hash: 0 0x568: 0x00000000

kfddde[3].entry.refer.number: 0 0x56c: 0x00000000

kfddde[3].entry.refer.incarn: 0 0x570: A=0 NUMM=0x0

kfddde[3].dsknum: 27 0x574: 0x001b

kfddde[3].state: 8 0x576: KFDSTA_ADDING <<<===============================

kfddde[3].ub1spare: 0 0x577: 0x00

kfddde[3].dskname: DG1_0027 0x578: length=8

kfddde[3].fgname: DG1_0027 0x598: length=8

kfddde[3].crestmp.hi: 32983460 0x5b8: HOUR=0x4 DAYS=0xd MNTH=0x2 YEAR=0x7dd

kfddde[3].crestmp.lo: 443710464 0x5bc: USEC=0x0 MSEC=0x9f SECS=0x27 MINS=0x6

kfddde[3].failstmp.hi: 0 0x5c0: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0

kfddde[3].failstmp.lo: 0 0x5c4: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0

kfddde[5].entry.incarn: 2 0x8e4: A=0 NUMM=0x1

kfddde[5].entry.hash: 0 0x8e8: 0x00000000

kfddde[5].entry.refer.number: 0 0x8ec: 0x00000000

kfddde[5].entry.refer.incarn: 0 0x8f0: A=0 NUMM=0x0

kfddde[5].dsknum: 29 0x8f4: 0x001d

kfddde[5].state: 8 0x8f6: KFDSTA_ADDING <<<===============================

kfddde[5].ub1spare: 0 0x8f7: 0x00

kfddde[5].dskname: DG1_0029 0x8f8: length=8

kfddde[5].fgname: DG1_0029 0x918: length=8

kfddde[5].crestmp.hi: 32983460 0x938: HOUR=0x4 DAYS=0xd MNTH=0x2 YEAR=0x7dd

kfddde[5].crestmp.lo: 443710464 0x93c: USEC=0x0 MSEC=0x9f SECS=0x27 MINS=0x6

kfddde[5].failstmp.hi: 0 0x940: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0

kfddde[5].failstmp.lo: 0 0x944: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0

kfddde[5].timer: 0 0x948: 0x00000000

File_name :: dg1_3.kfed

/dev/mapper/t1_asm05p1

kfbh.endian: 0 0x000: 0x00

kfbh.hard: 0 0x001: 0x00

kfbh.type: 0 0x002: KFBTYP_INVALID

kfbh.datfmt: 0 0x003: 0x00

kfbh.block.blk: 0 0x004: T=0 NUMB=0x0

kfbh.block.obj: 0 0x008: TYPE=0x0 NUMB=0x0

kfbh.check: 0 0x00c: 0x00000000

kfbh.fcn.base: 0 0x010: 0x00000000

kfbh.fcn.wrap: 0 0x014: 0x00000000

kfbh.spare1: 0 0x018: 0x00000000

kfbh.spare2: 0 0x01c: 0x00000000

/dev/mapper/t1_asm06p1

kfbh.endian: 0 0x000: 0x00

kfbh.hard: 0 0x001: 0x00

kfbh.type: 0 0x002: KFBTYP_INVALID

kfbh.datfmt: 0 0x003: 0x00

kfbh.block.blk: 0 0x004: T=0 NUMB=0x0

kfbh.block.obj: 0 0x008: TYPE=0x0 NUMB=0x0

kfbh.check: 0 0x00c: 0x00000000

kfbh.fcn.base: 0 0x010: 0x00000000

kfbh.fcn.wrap: 0 0x014: 0x00000000

kfbh.spare1: 0 0x018: 0x00000000

kfbh.spare2: 0 0x01c: 0x00000000
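
Both header dumps above report kfbh.type as KFBTYP_INVALID, whereas a healthy ASM member disk normally shows KFBTYP_DISKHEAD in its header block. When many dumps have to be checked, the type can be extracted mechanically. This is a minimal sketch that only parses text in the format shown above; the function name kfed_header_type is hypothetical.

```shell
#!/bin/sh
# Read a kfed block dump on stdin and print the symbolic block type
# taken from the kfbh.type line (for example KFBTYP_INVALID).
# Exits non-zero if the dump contains no kfbh.type line.
kfed_header_type() {
    awk '$1 == "kfbh.type:" { print $NF; found = 1; exit }
         END { if (!found) exit 1 }'
}
```

A possible use is `./kfed read /dev/mapper/t1_asm05p1 | kfed_header_type`, repeated for each device named in the alert log.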

The state field of the disk directory entries above is KFDSTA_ADDING for the new disks: they were still in the process of being added, and the rebalance had not completed.
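
To list the state of every entry in a disk directory dump without reading it line by line, the dsknum and state fields can be paired up. This is a sketch under the assumption that the dump has the kfddde[] layout shown above; the function name kfed_disk_states and the file name in the usage note are hypothetical.

```shell
#!/bin/sh
# Read a kfed dump of a disk directory block on stdin and print
# "disk-number state" for every kfddde[] entry, one per line.
kfed_disk_states() {
    awk '
        # Example input: kfddde[4].dsknum: 28 0x734: 0x001c
        $1 ~ /^kfddde\[[0-9]+\]\.dsknum:$/ { dsknum = $2 }
        # Example input: kfddde[4].state: 8 0x736: KFDSTA_ADDING
        $1 ~ /^kfddde\[[0-9]+\]\.state:$/  { print dsknum, $4 }
    '
}
```

For instance, `kfed_disk_states < disk_directory.kfed | sort -u` would print one line per disk; in this case each new disk would appear with KFDSTA_ADDING.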

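The following script dumps the PST of each disk:

vi kfed_pst.sh

-----

#!/bin/sh
# For every file in the current directory, write its name and the kfed
# dump of allocation unit 1, block 2 to /tmp/kfed_PST.out.
rm -f /tmp/kfed_PST.out
for i in *
do
    echo "$i" >> /tmp/kfed_PST.out
    ./kfed read "$i" aun=1 blkn=2 >> /tmp/kfed_PST.out
done

----

chmod u+x kfed_pst.sh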
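
The script above loops over everything in the current directory. A variant that takes an explicit device list may be easier to control. This is a minimal sketch, not part of the original procedure: the function name dump_pst and the KFED override variable are hypothetical, while the kfed arguments are the same as in the script above.

```shell
#!/bin/sh
# Dump AU 1, block 2 of each listed device into one output file.
# Arguments: output file, then one or more device paths.
# KFED may be set to the path of the kfed binary (default: ./kfed).
dump_pst() {
    out=$1
    shift
    : > "$out"                      # truncate or create the output file
    for dev in "$@"
    do
        echo "$dev" >> "$out"
        "${KFED:-./kfed}" read "$dev" aun=1 blkn=2 >> "$out" 2>&1
    done
}
```

For the disks in this case the call would look like `dump_pst /tmp/kfed_PST.out /dev/mapper/t1_asm03p1 /dev/mapper/t1_asm04p1 /dev/mapper/t1_asm05p1 /dev/mapper/t1_asm06p1`.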

This problem has to be resolved by manually patching the ASM metadata; otherwise the diskgroup cannot be mounted again.

If you cannot handle it yourself, you can ask the ASKMACLEAN professional Oracle database recovery team to recover it for you.

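(1) Add the disk.

(2) Partition it with fdisk.

(3) Reload the updated partition table of the block device so that the hosts can see the new partition (run this on both RAC nodes).

(4) Set permissions so that the oracle user and the corresponding group can access the device.

(5) Add the disk to the ASM diskgroup.

For example (replace the path with the partition you prepared):

su - grid

sqlplus / as sysasm

alter diskgroup data add disk '/dev/mapper/data15p1';

(6) Check the result.

su - grid

asmcmd

lsdg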
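
Steps (3) to (6) can be written down as a reviewable command list before anything is run. The sketch below only prints commands and executes nothing. Several details are assumptions not stated in the original: partprobe as the way to reload the partition table, grid:asmadmin as the owner and group, and mode 660; adjust them to the actual environment.

```shell
#!/bin/sh
# Print (do not run) the commands for steps (3) to (6) for one new partition.
# Arguments: device path, diskgroup name, optional owner:group.
print_add_disk_plan() {
    dev=$1
    dg=$2
    owner=${3:-grid:asmadmin}       # assumed owner and group
    # Step 3: reload the partition table (on every RAC node).
    echo "partprobe"
    # Step 4: give the ASM owner access to the device.
    echo "chown $owner $dev"
    echo "chmod 660 $dev"
    # Step 5: statement to run in sqlplus / as sysasm.
    echo "alter diskgroup $dg add disk '$dev';"
    # Step 6: check the diskgroup afterwards.
    echo "asmcmd lsdg $dg"
}
```

For the example above, `print_add_disk_plan /dev/mapper/data15p1 data` prints the five commands in order so that they can be reviewed and then run by hand on the appropriate nodes.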

Cases in which a lost ASM disk header leads to ORA-15032, ORA-15040, or ORA-15042 and a diskgroup that cannot be mounted are not rare; they can be repaired with the internal tool kfed.


Sharing is welcome; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/bake/11831619.html
