数据库重启之后无法open的恢复案例

本站文章除注明转载外，均为本站原创： 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

这是某网友的维护的一套数据库，据说是正常重启之后就无法启动数据库了。那么我们先来看看日志是什么样的:

Errors in file /u01/app/oracle/admin/orcl/bdump/orcl1_p012_18165.trc:
ORA-27090: Message 27090 not found;  product=RDBMS; facility=ORA
Linux-x86_64 Error: 4: Interrupted system call
Additional information: 3
Additional information: 128
Additional information: 65536
.....
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl1_p007_18153.trc:
ORA-27090: Message 27090 not found;  product=RDBMS; facility=ORA
Linux-x86_64 Error: 4: Interrupted system call
Additional information: 3
Additional information: 128
Additional information: 65536
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
.....
SMON: enabling cache recovery
Errors in file /u01/app/oracle/admin/orcl/udump/orcl1_ora_8858.trc:
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], []
.....
Errors in file /u01/app/oracle/admin/orcl/udump/orcl1_ora_8858.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], []
.....
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704

Errors in file /u01/app/oracle/admin/orcl/bdump/orcl1_p012_18165.trc:

ORA-27090: Message 27090 not found; product=RDBMS; facility=ORA

Linux-x86_64 Error: 4: Interrupted system call

Additional information: 3

Additional information: 128

Additional information: 65536

.....

Errors in file /u01/app/oracle/admin/orcl/bdump/orcl1_p007_18153.trc:

ORA-27090: Message 27090 not found; product=RDBMS; facility=ORA

Linux-x86_64 Error: 4: Interrupted system call

Additional information: 3

Additional information: 128

Additional information: 65536

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

.....

SMON: enabling cache recovery

Errors in file /u01/app/oracle/admin/orcl/udump/orcl1_ora_8858.trc:

ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], []

.....

Errors in file /u01/app/oracle/admin/orcl/udump/orcl1_ora_8858.trc:

ORA-00704: bootstrap process failure

ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], []

.....

Error 704 happened during db open, shutting down database

USER: terminating instance due to error 704

我们可以看到,节点1在9：48：52秒被强行终止重启了实例。而且我们还可以看出该节点从9:42开始就出现ORA-27090 错误。而该错误通常跟操作系统有关系，通过后面的Linux-x86_64 Error: 4: Interrupted system call 错误也验证了这一点。

Thread 2 advanced to log sequence 334685 (LGWR switch)
Current log# 4 seq# 334685 mem# 0: +DATA/orcl/onlinelog/group_4.log
.....
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
.....
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_mmon_9401.trc:
ORA-07445: Message 7445 not found; No message file for product=RDBMS, facility=ORA; arguments: [kgghteFindCB()+188] [SIGSEGV] [Address not mapped to object] [0x00000010B]
.....
Errors in file /u01/app/oracle/admin/orcl/udump/orcl2_ora_9475.trc:
ORA-07445: exception encountered: core dump [kglsget()+490] [SIGSEGV] [Address not mapped to object] [0x000000008] [] []
.....
Error 0 in kwqmnpartition(), aborting txn 
.....
Errors in file /u01/app/oracle/admin/orcl/udump/orcl2_ora_9943.trc:
ORA-00600: internal error code, arguments: [kggfaAllocFunc1], [], [], [], [], [], [], []
.....
ORA-00600: internal error code, arguments: [kggfaAllocFunc1], [], [], [], [], [], [], []
.....
ORA-65535 encountered when generating server alert SMG-4131
.....
Errors in file /u01/app/oracle/admin/orcl/udump/orcl2_ora_9943.trc:
ORA-00600: internal error code, arguments: [kggfaAllocFunc1], [], [], [], [], [], [], []
.....
Reconfiguration started (old inc 2, new inc 4)
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
* domain 0 valid = 1 according to instance 0 
......
Completed redo application
.....
Errors in file /u01/app/oracle/admin/orcl/udump/orcl2_ora_10551.trc:
ORA-00600: internal error code, arguments: [kggfaAllocFunc1], [], [], [], [], [], [], []
.....
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_smon_9395.trc:
ORA-00600: Message 600 not found; No message file for product=RDBMS, facility=ORA; arguments: [16659] [kqldtu] [D] [0] [65]
.....
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
......
Non-fatal internal error happenned while SMON was doing cursor transient type cleanup.
SMON encountered 1 out of maximum 100 non-fatal internal errors.
.....
Trace dumping is performing id=[cdmp_20160802094934]
.....
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_smon_9395.trc:
ORA-00600: Message 600 not found; No message file for product=RDBMS, facility=ORA; arguments: [16659] [kqldtu] [D] [0] [195]
.....
Non-fatal internal error happenned while SMON was doing cursor transient type cleanup.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
......
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_pmon_9349.trc:
ORA-00474: Message 474 not found; No message file for product=RDBMS, facility=ORA
......
PMON: terminating instance due to error 474
......
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_lmon_9355.trc:
ORA-00474: Message 474 not found; No message file for product=RDBMS, facility=ORA
.....
System state dump is made for local instance
System State dumped to trace file /u01/app/oracle/admin/orcl/bdump/orcl2_diag_9351.trc
......
Shutting down instance (abort)
License high water mark = 72
......
Trace dumping is performing id=[cdmp_20160802095002]
.....
Instance terminated by PMON, pid = 9349
.....
Instance terminated by USER, pid = 11027
.....
Starting ORACLE instance (normal)

Thread 2 advanced to log sequence 334685 (LGWR switch)

Current log# 4 seq# 334685 mem# 0: +DATA/orcl/onlinelog/group_4.log

.....

Starting ORACLE instance (normal)

LICENSE_MAX_SESSION = 0

.....

Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_mmon_9401.trc:

ORA-07445: Message 7445 not found; No message file for product=RDBMS, facility=ORA; arguments: [kgghteFindCB()+188] [SIGSEGV] [Address not mapped to object] [0x00000010B]

.....

Errors in file /u01/app/oracle/admin/orcl/udump/orcl2_ora_9475.trc:

ORA-07445: exception encountered: core dump [kglsget()+490] [SIGSEGV] [Address not mapped to object] [0x000000008] [] []

.....

Error 0 in kwqmnpartition(), aborting txn

.....

Errors in file /u01/app/oracle/admin/orcl/udump/orcl2_ora_9943.trc:

ORA-00600: internal error code, arguments: [kggfaAllocFunc1], [], [], [], [], [], [], []

.....

ORA-00600: internal error code, arguments: [kggfaAllocFunc1], [], [], [], [], [], [], []

.....

ORA-65535 encountered when generating server alert SMG-4131

.....

Errors in file /u01/app/oracle/admin/orcl/udump/orcl2_ora_9943.trc:

ORA-00600: internal error code, arguments: [kggfaAllocFunc1], [], [], [], [], [], [], []

.....

Reconfiguration started (old inc 2, new inc 4)

List of nodes:

0 1

Global Resource Directory frozen

Communication channels reestablished

* domain 0 valid = 1 according to instance 0

......

Completed redo application

.....

Errors in file /u01/app/oracle/admin/orcl/udump/orcl2_ora_10551.trc:

ORA-00600: internal error code, arguments: [kggfaAllocFunc1], [], [], [], [], [], [], []

.....

Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_smon_9395.trc:

ORA-00600: Message 600 not found; No message file for product=RDBMS, facility=ORA; arguments: [16659] [kqldtu] [D] [0] [65]

.....

Redo Shipping Client Connected as PUBLIC

-- Connected User is Valid

......

Non-fatal internal error happenned while SMON was doing cursor transient type cleanup.

SMON encountered 1 out of maximum 100 non-fatal internal errors.

.....

Trace dumping is performing id=[cdmp_20160802094934]

.....

Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_smon_9395.trc:

ORA-00600: Message 600 not found; No message file for product=RDBMS, facility=ORA; arguments: [16659] [kqldtu] [D] [0] [195]

.....

Non-fatal internal error happenned while SMON was doing cursor transient type cleanup.

SMON encountered 2 out of maximum 100 non-fatal internal errors.

......

Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_pmon_9349.trc:

ORA-00474: Message 474 not found; No message file for product=RDBMS, facility=ORA

......

PMON: terminating instance due to error 474

......

Errors in file /u01/app/oracle/admin/orcl/bdump/orcl2_lmon_9355.trc:

ORA-00474: Message 474 not found; No message file for product=RDBMS, facility=ORA

.....

System state dump is made for local instance

System State dumped to trace file /u01/app/oracle/admin/orcl/bdump/orcl2_diag_9351.trc

......

Shutting down instance (abort)

License high water mark = 72

......

Trace dumping is performing id=[cdmp_20160802095002]

.....

Instance terminated by PMON, pid = 9349

.....

Instance terminated by USER, pid = 11027

.....

Starting ORACLE instance (normal)

这里我们无论是看节点1还是节点2的alert log日志都会发现，由于smon进程在进程事务恢复时失败之后，导致数据库实例最终宕掉。宕掉之后就再也无法正常启动了。很明显这是强行关库之后带来的蝴蝶效应。
这里我们来看看其中节点2的这个ORA-00600 [16559]是什么含义？

ERROR: ORA-600 [16659] [a] [b] [c] [d] [e] 
VERSIONS: versions 8.0 to 8.1 
DESCRIPTION: We are attempting to update a tab$ row and fail to update the dictionary information correctly. 
FUNCTIONALITY: DATA DICTIONARY IMPACT: PROCESS FAILURE POSSIBLE DATA DICTIONARY INCONSISTENCY

ERROR: ORA-600 [16659] [a] [b] [c] [d] [e]

VERSIONS: versions 8.0 to 8.1

DESCRIPTION: We are attempting to update a tab$ row and fail to update the dictionary information correctly.

FUNCTIONALITY: DATA DICTIONARY IMPACT: PROCESS FAILURE POSSIBLE DATA DICTIONARY INCONSISTENCY

从解释来看，这是Oracle 数据字典表tab$出现了不一致的情况。比较郁闷的是，客户的dataguard也坏掉了，也是一样的错误。
那么看来只能进行恢复了。这里首先要明白，节点1的ora-00600 [16703]本质上来讲跟ora-00600 [16559]是一回事。
从具体的错误来看，Oracle在open时，进行bootstrap初始化的过程就失败了，因此报错ORA-00704: bootstrap process failure.
处理思路也很简单，我们首先通过10046 trace跟踪open的过程，来看看Oracle 在bootstrap初始化的时候在进行什么操作时报错的？

PARSING IN CURSOR #2 len=106 dep=1 uid=0 oct=3 lid=0 tim=1435661277781458 hv=3628073639 ad='cfd74588'
select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1
END OF STMT
PARSE #2:c=999,e=676,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1435661277781453
=====================
.....
BINDS #2:
kkscoacd
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=2b5f7bae1b00  bln=22  avl=02  flg=05
value=20
EXEC #2:c=999,e=1015,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1435661277798049
WAIT #2: nam='db file sequential read' ela= 1086 file#=1 block#=50 blocks=1 obj#=-1 tim=1435661277799253
WAIT #2: nam='db file sequential read' ela= 568 file#=1 block#=51 blocks=1 obj#=-1 tim=1435661277799931
WAIT #2: nam='db file sequential read' ela= 1804 file#=1 block#=26 blocks=1 obj#=-1 tim=1435661277801863
FETCH #2:c=0,e=3833,p=3,cr=3,cu=0,mis=0,r=0,dep=1,og=4,tim=1435661277801922
*** 2016-08-02 13:52:28.469
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], []
Current SQL statement for this session:
alter database open

PARSING IN CURSOR #2 len=106 dep=1 uid=0 oct=3 lid=0 tim=1435661277781458 hv=3628073639 ad='cfd74588'

select rowcnt,blkcnt,empcnt,avgspc,chncnt,avgrln,nvl(degree,1), nvl(instances,1) from tab$ where obj# = :1

END OF STMT

PARSE #2:c=999,e=676,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1435661277781453

=====================

.....

BINDS #2:

kkscoacd

Bind#0

oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0

kxsbbbfp=2b5f7bae1b00 bln=22 avl=02 flg=05

value=20

EXEC #2:c=999,e=1015,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1435661277798049

WAIT #2: nam='db file sequential read' ela= 1086 file#=1 block#=50 blocks=1 obj#=-1 tim=1435661277799253

WAIT #2: nam='db file sequential read' ela= 568 file#=1 block#=51 blocks=1 obj#=-1 tim=1435661277799931

WAIT #2: nam='db file sequential read' ela= 1804 file#=1 block#=26 blocks=1 obj#=-1 tim=1435661277801863

FETCH #2:c=0,e=3833,p=3,cr=3,cu=0,mis=0,r=0,dep=1,og=4,tim=1435661277801922

*** 2016-08-02 13:52:28.469

ksedmp: internal or fatal error

ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], []

Current SQL statement for this session:

alter database open

从上面的错误不难看出就是在访问tab$ 的时候报错的，而且是访问的obj#=20的这个对象。那么这个对象是什么呢？

SQL&gt; select owner,object_name,object_type from dba_objects where object_id=20;
OWNER                          OBJECT_NAME                    OBJECT_TYPE
------------------------------ ------------------------------ -------------------
SYS                            ICOL$                          TABLE
SQL&gt; !oerr ora 1403
01403, 00000, "no data found"
// *Cause:
// *Action:

SQL> select owner,object_name,object_type from dba_objects where object_id=20;

OWNER OBJECT_NAME OBJECT_TYPE

------------------------------ ------------------------------ -------------------

SYS ICOL$ TABLE

SQL> !oerr ora 1403

01403, 00000, "no data found"

// *Cause:

// *Action:

根据我们的查询以及对ORA-00600 [16703],[1403],[20] 这个错误的理解，那么我这里可以大致判断这个错误后的几个数字的含义：
16703: 错误代码,表示数据字典基表存在不一致1403: 表示数据没找到或者不匹配,即not data found.20: 表示访问的对象号,即object_id.
同时我们从前面的10046 trace跟踪来看,报错的SQL语句访问了3个block,然后报错,分别是file 1 block 50,51,26。
这我们分别dump 上面的3个block发现其中block 51，26 的dump 内容如下：

---block 51
Object id on Block? Y
seg/obj: 0x3  csc: 0x00.64205  itc: 2  flg: -  typ: 2 - INDEX
fsl: 0  fnx: 0x0 ver: 0x01
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0001.008.0000000d  0x0080118a.0010.01  CB--    0  scn 0x0000.000105c7
0x02   0x000a.01e.000005e0  0x008074fc.007a.36  --U-    1  fsc 0x0000.00064208
Leaf block dump
===============
header address 185777756=0xb12be5c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 1
kdxcosdc 2
kdxconro 258
kdxcofbo 552=0x228
kdxcofeo 4484=0x1184
kdxcoavs 3932
kdxlespl 0
kdxlende 0
kdxlenxt 4194360=0x400038
kdxleprv 0=0x0
kdxledsz 8
kdxlebksz 8032
...
row#93[5724] flag: ------, lock: 2, len=14, data:(8):  00 40 38 8e 00 06 02 00
col 0; len 3; (3):  c2 03 03
......
---block 26
Block header dump:  0x0040001a
Object id on Block? Y
seg/obj: 0x2  csc: 0x05.e0568950  itc: 2  flg: -  typ: 1 - DATA
fsl: 0  fnx: 0x0 ver: 0x01
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.05f.00000002  0x00400012.0001.16  C---    0  scn 0x0000.000000fd
0x02   0x000b.00d.00048164  0x0100982b.b8bd.4d  --U-    7  fsc 0x039f.e0568c3e
data_block_dump,data header at 0xb12be5c
===============
tsiz: 0x1fa0
hsiz: 0x140
pbl: 0x0b12be5c
bdba: 0x0040001a
76543210
flag=--------
ntab=6
nrow=141
frre=-1
fsbo=0x140
fseo=0x160
avsp=0x20
tosp=0x3db
0xe:pti[0]      nrow=8  offs=0
0x12:pti[1]     nrow=7  offs=8
0x16:pti[2]     nrow=1  offs=15
0x1a:pti[3]     nrow=10 offs=16
0x1e:pti[4]     nrow=15 offs=26
0x22:pti[5]     nrow=100        offs=41
.....
col  0: [ 2]  c1 14
tab 1, row 0, @0x1e65
tl: 4 fb: -CHDFL-- lb: 0x2  cc: 0 cki: 0
tab 1, row 1, @0x1ddb
tl: 4 fb: -CHDFL-- lb: 0x2  cc: 0 cki: 1
tab 1, row 2, @0x1d51
tl: 4 fb: -CHDFL-- lb: 0x2  cc: 0 cki: 2
tab 1, row 3, @0x1ccd
tl: 4 fb: -CHDFL-- lb: 0x2  cc: 0 cki: 4
tab 1, row 4, @0x1c45
tl: 4 fb: -CHDFL-- lb: 0x2  cc: 0 cki: 5
tab 1, row 5, @0x1bc0
tl: 4 fb: -CHDFL-- lb: 0x2  cc: 0 cki: 6
tab 1, row 6, @0x1b35
tl: 4 fb: -CHDFL-- lb: 0x2  cc: 0 cki: 7
tab 2, row 0, @0x160
tl: 48 fb: -CH-FL-- lb: 0x0  cc: 19 cki: 3
col  0: [ 2]  c1 1e
col  1: [ 1]  80
.....
col 17: *NULL*
col 18: [ 1]  80
tab 3, row 0, @0x1ad0
.....

---block 51

Object id on Block? Y

seg/obj: 0x3 csc: 0x00.64205 itc: 2 flg: - typ: 2 - INDEX

fsl: 0 fnx: 0x0 ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0001.008.0000000d 0x0080118a.0010.01 CB-- 0 scn 0x0000.000105c7

0x02 0x000a.01e.000005e0 0x008074fc.007a.36 --U- 1 fsc 0x0000.00064208

Leaf block dump

===============

header address 185777756=0xb12be5c

kdxcolev 0

KDXCOLEV Flags = - - -

kdxcolok 0

kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y

kdxconco 1

kdxcosdc 2

kdxconro 258

kdxcofbo 552=0x228

kdxcofeo 4484=0x1184

kdxcoavs 3932

kdxlespl 0

kdxlende 0

kdxlenxt 4194360=0x400038

kdxleprv 0=0x0

kdxledsz 8

kdxlebksz 8032

...

row#93[5724] flag: ------, lock: 2, len=14, data:(8): 00 40 38 8e 00 06 02 00

col 0; len 3; (3): c2 03 03

......

---block 26

Block header dump: 0x0040001a

Object id on Block? Y

seg/obj: 0x2 csc: 0x05.e0568950 itc: 2 flg: - typ: 1 - DATA

fsl: 0 fnx: 0x0 ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0000.05f.00000002 0x00400012.0001.16 C--- 0 scn 0x0000.000000fd

0x02 0x000b.00d.00048164 0x0100982b.b8bd.4d --U- 7 fsc 0x039f.e0568c3e

data_block_dump,data header at 0xb12be5c

===============

tsiz: 0x1fa0

hsiz: 0x140

pbl: 0x0b12be5c

bdba: 0x0040001a

76543210

flag=--------

ntab=6

nrow=141

frre=-1

fsbo=0x140

fseo=0x160

avsp=0x20

tosp=0x3db

0xe:pti[0] nrow=8 offs=0

0x12:pti[1] nrow=7 offs=8

0x16:pti[2] nrow=1 offs=15

0x1a:pti[3] nrow=10 offs=16

0x1e:pti[4] nrow=15 offs=26

0x22:pti[5] nrow=100 offs=41

.....

col 0: [ 2] c1 14

tab 1, row 0, @0x1e65

tl: 4 fb: -CHDFL-- lb: 0x2 cc: 0 cki: 0

tab 1, row 1, @0x1ddb

tl: 4 fb: -CHDFL-- lb: 0x2 cc: 0 cki: 1

tab 1, row 2, @0x1d51

tl: 4 fb: -CHDFL-- lb: 0x2 cc: 0 cki: 2

tab 1, row 3, @0x1ccd

tl: 4 fb: -CHDFL-- lb: 0x2 cc: 0 cki: 4

tab 1, row 4, @0x1c45

tl: 4 fb: -CHDFL-- lb: 0x2 cc: 0 cki: 5

tab 1, row 5, @0x1bc0

tl: 4 fb: -CHDFL-- lb: 0x2 cc: 0 cki: 6

tab 1, row 6, @0x1b35

tl: 4 fb: -CHDFL-- lb: 0x2 cc: 0 cki: 7

tab 2, row 0, @0x160

tl: 48 fb: -CH-FL-- lb: 0x0 cc: 19 cki: 3

col 0: [ 2] c1 1e

col 1: [ 1] 80

.....

col 17: *NULL*

col 18: [ 1] 80

tab 3, row 0, @0x1ad0

.....

看到这里，我就想是否可以通过bbed先把这2个block 给修复了，看看是否能够起来。如下是简单的修复过程：
对于51号block 由于是Index 修改非常简单，这里不多说。26号block 是cluster table，这个相对复杂的多。首先提交事务、修改lock flag之后verify还是报错，如下：

BBED&gt; verify
DBVERIFY - Verification starting
FILE = /u01/fix/SYSTEM_OLD.dbf
BLOCK = 26
Block Checking: DBA = 4194330, Block Type = KTB-managed data block
data header at 0x105d485c
kdbchk:  key comref count wrong
keyslot=7
Block 26 failed with check code 6121
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 1
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

BBED> verify

DBVERIFY - Verification starting

FILE = /u01/fix/SYSTEM_OLD.dbf

BLOCK = 26

Block Checking: DBA = 4194330, Block Type = KTB-managed data block

data header at 0x105d485c

kdbchk: key comref count wrong

keyslot=7

Block 26 failed with check code 6121

DBVERIFY - Verification complete

Total Blocks Examined : 1

Total Blocks Processed (Data) : 1

Total Blocks Failing (Data) : 1

Total Blocks Processed (Index): 0

Total Blocks Failing (Index): 0

Total Blocks Empty : 0

Total Blocks Marked Corrupt : 0

Total Blocks Influx : 0

这里继续修改聚簇对应的kdbr信息(这里以其中一个kdbr为例):

BBED&gt; p *kdbr[7]
rowdata[7568]
-------------
ub1 rowdata[7568]                           @8012     0xac
BBED&gt; x /rcccccc
rowdata[7568]                               @8012    
-------------
flag@8012: 0xac (KDRHFL, KDRHFF, KDRHFH, KDRHFK)
lock@8013: 0x00
cols@8014:    1
kref@8015:   31
mref@8017:   30
hrid@8019:0x0040001b.0
nrid@8025:0x0040001b.0
BBED&gt; modify /x 1f offset 8017
File: /u01/fix/SYSTEM_OLD.dbf (1)
Block: 26               Offsets: 8017 to 8020           Dba:0x0040001a
------------------------------------------------------------------------
1f000040 
&lt;32 bytes per line
BBED&gt; verify
DBVERIFY - Verification starting
FILE = /u01/fix/SYSTEM_OLD.dbf
BLOCK = 26
Block Checking: DBA = 4194330, Block Type = KTB-managed data block
data header at 0x105d485c
kdbchk: space available on commit is incorrect
tosp=987 fsc=0 stb=0 avsp=32
Block 26 failed with check code 6111
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 1
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
BBED&gt; p kdbh
struct kdbh, 14 bytes                       @92      
ub1 kdbhflag                             @92       0x00 (NONE)
b1 kdbhntab                              @93       6
b2 kdbhnrow                              @94       141
sb2 kdbhfrre                             @96      -1
sb2 kdbhfsbo                             @98       320
sb2 kdbhfseo                             @100      352
b2 kdbhavsp                              @102      32
b2 kdbhtosp                              @104      987
BBED&gt; d /v offset 104 count 2
File: /u01/fix/SYSTEM_OLD.dbf (1)
Block: 26      Offsets:  104 to  105  Dba:0x0040001a
-------------------------------------------------------
db03                                l ..
&lt;16 bytes per line&gt;
BBED&gt; modify /x 2000 offset 104
File: /u01/fix/SYSTEM_OLD.dbf (1)
Block: 26               Offsets:  104 to  105           Dba:0x0040001a
------------------------------------------------------------------------
2000 
&lt;32 bytes per line&gt;
BBED&gt; sum apply
Check value for File 1, Block 26:
current = 0x87ce, required = 0x87ce
BBED&gt; verify
DBVERIFY - Verification starting
FILE = /u01/fix/SYSTEM_OLD.dbf
BLOCK = 26
DBVERIFY - Verification complete
Total Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0

BBED> p *kdbr[7]

rowdata[7568]

-------------

ub1 rowdata[7568] @8012 0xac

BBED> x /rcccccc

rowdata[7568] @8012

-------------

flag@8012: 0xac (KDRHFL, KDRHFF, KDRHFH, KDRHFK)

lock@8013: 0x00

cols@8014: 1

kref@8015: 31

mref@8017: 30

hrid@8019:0x0040001b.0

nrid@8025:0x0040001b.0

BBED> modify /x 1f offset 8017

File: /u01/fix/SYSTEM_OLD.dbf (1)

Block: 26 Offsets: 8017 to 8020 Dba:0x0040001a

------------------------------------------------------------------------

1f000040

<32 bytes per line

BBED> verify

DBVERIFY - Verification starting

FILE = /u01/fix/SYSTEM_OLD.dbf

BLOCK = 26

Block Checking: DBA = 4194330, Block Type = KTB-managed data block

data header at 0x105d485c

kdbchk: space available on commit is incorrect

tosp=987 fsc=0 stb=0 avsp=32

Block 26 failed with check code 6111

DBVERIFY - Verification complete

Total Blocks Examined : 1

Total Blocks Processed (Data) : 1

Total Blocks Failing (Data) : 1

Total Blocks Processed (Index): 0

Total Blocks Failing (Index): 0

Total Blocks Empty : 0

Total Blocks Marked Corrupt : 0

Total Blocks Influx : 0

BBED> p kdbh

struct kdbh, 14 bytes @92

ub1 kdbhflag @92 0x00 (NONE)

b1 kdbhntab @93 6

b2 kdbhnrow @94 141

sb2 kdbhfrre @96 -1

sb2 kdbhfsbo @98 320

sb2 kdbhfseo @100 352

b2 kdbhavsp @102 32

b2 kdbhtosp @104 987

BBED> d /v offset 104 count 2

File: /u01/fix/SYSTEM_OLD.dbf (1)

Block: 26 Offsets: 104 to 105 Dba:0x0040001a

-------------------------------------------------------

db03 l ..

<16 bytes per line>

BBED> modify /x 2000 offset 104

File: /u01/fix/SYSTEM_OLD.dbf (1)

Block: 26 Offsets: 104 to 105 Dba:0x0040001a

------------------------------------------------------------------------

2000

<32 bytes per line>

BBED> sum apply

Check value for File 1, Block 26:

current = 0x87ce, required = 0x87ce

BBED> verify

DBVERIFY - Verification starting

FILE = /u01/fix/SYSTEM_OLD.dbf

BLOCK = 26

DBVERIFY - Verification complete

Total Blocks Examined : 1

Total Blocks Processed (Data) : 1

Total Blocks Failing (Data) : 0

Total Blocks Processed (Index): 0

Total Blocks Failing (Index): 0

Total Blocks Empty : 0

Total Blocks Marked Corrupt : 0

Total Blocks Influx : 0

我们经过几处简单修改之后，再次verify校验已经不再报错了；不过再次open数据库时，发现报另外一个错误了：

Errors in file /u01/app/oracle/admin/orcl/udump/orcl1_ora_18955.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [kdoirp-3], [139], [0], [], [], [], [], []

Errors in file /u01/app/oracle/admin/orcl/udump/orcl1_ora_18955.trc:

ORA-00704: bootstrap process failure

ORA-00600: internal error code, arguments: [kdoirp-3], [139], [0], [], [], [], [], []

从错误来看，bootstrap的初始化过程仍然有问题。通过10046 trace跟踪发现还是那几个block。
回想前面这个block的dump时，看到的几行操作是delete，如下：tl: 4 fb: -CHDFL– lb: 0x2 cc: 0 cki: 0
那么我们这里试做将这几个被删除的操作进行还原是否ok 呢？也就是用bbed来恢复这7个delete操作。
由于是cluster table 的block，操作相对麻烦一些。不过我尝试修改之后，最后发现错误仍然一样。
其中[kdoirp-3]是什么含义呢? 我们来看下Oracle 文档的描述：

Layer 11:  KCOCODRW -  Row
opcode 1 :  KDOIUR  - interpret undo redo
opcode 2 :  KDOIRP  - insert row  piece
opcode 3 :  KDODRP  - drop row piece
opcode 4 :  KDOLKR  - lock row  piece
opcode 5 :  KDOURP  - update row piece
opcode 6 :  KDOORP  - overwrite row piece
opcode 7 :  KDOMFC  - manipulate first column
opcode 8 :  KDOCFA  - change forwarding address
opcode 9 :  KDOCKI  - change cluster key index
opcode 10 :  KDOSKL  - set key links
opcode 11 :  KDOQMI  - quick multi-insert (ex. insert as select...)
opcode 12 :  KDOQMD  - quick multi-delete
opcode 13 :  KDOTBF  - toggle block header flags

Layer 11: KCOCODRW - Row

opcode 1 : KDOIUR - interpret undo redo

opcode 2 : KDOIRP - insert row piece

opcode 3 : KDODRP - drop row piece

opcode 4 : KDOLKR - lock row piece

opcode 5 : KDOURP - update row piece

opcode 6 : KDOORP - overwrite row piece

opcode 7 : KDOMFC - manipulate first column

opcode 8 : KDOCFA - change forwarding address

opcode 9 : KDOCKI - change cluster key index

opcode 10 : KDOSKL - set key links

opcode 11 : KDOQMI - quick multi-insert (ex. insert as select...)

opcode 12 : KDOQMD - quick multi-delete

opcode 13 : KDOTBF - toggle block header flags

很明显，这表示insert row piece。看来我们单纯的修改这2个block 并不能绕过这个问题。实际上后面我dump分析发现又涉及到_next_object，又将问题复杂化了。
虽然我相信多折腾几次可以解决这个问题。但是操作确实麻烦，费劲。不过此时通过之前的备份restore出来的system文件已经ok了。这里我用bbed 将涉及到的几个block 进行替换，最后再修改resetlogs信息，重建控制文件之后，进行recover。
非常顺利的打开了数据库。最后检查alert log 还涉及到smon 回滚某个事务失败。那么如何完美处理呢？
首先dump undo header，然后获取该事务涉及的操作对象，然后使用如下参数屏蔽回滚段后，将undo表空间重建即可。
如下是dump undo header获取的信息：
针对这部分对象，由于破坏了事务的完整性，那么建议对表进行分析，其中Index进行重建。

Tags: bbed, ora-00600 16559, ora-00600 [16703], ora-00704 bootstrap failure
Posted in backup & rcovery, Oracle rdbms on 8月 4, 2016

You must be logged in to post a comment.

love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

Categories

Archives

最新评论

国内圈子

oracle security

数据库重启之后无法open的恢复案例

Leave a Reply