数套ASM RAC的恢复案例

本站文章除注明转载外，均为本站原创： 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

前不久帮助某客户恢复了6套Oracle RAC，均为ASM，而且版本均为10.2.0.4。熬夜好几天，差点吐血了。
这里以其中一套库的恢复进行简单说明，跟大家分享。
其中几套基本上都遇到了如下的ORA-00600 错误：

Thu Dec 31 11:55:46 2015
SUCCESS: diskgroup DG1 was mounted
Thu Dec 31 11:55:50 2015
Errors in file /oracle/admin/xxx/udump/xxx1_ora_28803.trc:
ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [13715626], [13715623], [0x000000000], [], [], [], []
SUCCESS: diskgroup DG1 was dismounted
Thu Dec 31 11:55:51 2015

Thu Dec 31 11:55:46 2015

SUCCESS: diskgroup DG1 was mounted

Thu Dec 31 11:55:50 2015

Errors in file /oracle/admin/xxx/udump/xxx1_ora_28803.trc:

ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [13715626], [13715623], [0x000000000], [], [], [], []

SUCCESS: diskgroup DG1 was dismounted

Thu Dec 31 11:55:51 2015

对于该错误，其实很简单，主要是因为控制文件损坏，通过重建控制文件或者利用备份的控制文件进行restore即可进行mount；甚至于我们利用控制文件快照都可以进行数据库mount；然后接着进行恢复操作。在恢复的过程中还遇到了如下的错误：

Errors in file /oracle/admin/xxx/udump/xxx1_ora_6990.trc:
ORA-00600: internal error code, arguments: [kclchkblk_4], [3431], [18446744072948603858], [3431], [18446744072948586897], [], [], []
Tue Jan  5 10:52:28 2016
Errors in file /oracle/admin/xxx/bdump/xxx1_arc0_8205.trc:
ORA-19504: failed to create file "+DG1/xxx/archivelog/1_2_900069464.dbf"
ORA-17502: ksfdcre:4 Failed to create file +DG1/xxx/archivelog/1_2_900069464.dbf
ORA-00600: internal error code, arguments: [kffbAddBlk04], [], [], [], [], [], [], []
Tue Jan  5 10:52:28 2016
ARC0: Error 19504 Creating archive log file to '+DG1/xxx/archivelog/1_2_900069464.dbf'
ARCH: Archival stopped, error occurred. Will continue retrying
Tue Jan  5 10:52:30 2016
ORACLE Instance xxx1 - Arc

Errors in file /oracle/admin/xxx/udump/xxx1_ora_6990.trc:

ORA-00600: internal error code, arguments: [kclchkblk_4], [3431], [18446744072948603858], [3431], [18446744072948586897], [], [], []

Tue Jan 5 10:52:28 2016

Errors in file /oracle/admin/xxx/bdump/xxx1_arc0_8205.trc:

ORA-19504: failed to create file "+DG1/xxx/archivelog/1_2_900069464.dbf"

ORA-17502: ksfdcre:4 Failed to create file +DG1/xxx/archivelog/1_2_900069464.dbf

ORA-00600: internal error code, arguments: [kffbAddBlk04], [], [], [], [], [], [], []

Tue Jan 5 10:52:28 2016

ARC0: Error 19504 Creating archive log file to '+DG1/xxx/archivelog/1_2_900069464.dbf'

ARCH: Archival stopped, error occurred. Will continue retrying

Tue Jan 5 10:52:30 2016

ORACLE Instance xxx1 - Arc

上述的ORA-00600错误其实很简单，主要是数据块SCN的问题。这里以其中一套库的恢复进行大致说明，因为在恢复该库的过程中，遇到了一件十分神奇的事情。

SQL&gt; startup mount pfile='/tmp/p.ora';
ORACLE instance started.
Total System Global Area 2.1475E+10 bytes
Fixed Size                  2122368 bytes
Variable Size            2399145344 bytes
Database Buffers         1.9059E+10 bytes
Redo Buffers               14651392 bytes
Database mounted.
SQL&gt; ALTER DATABASE ADD LOGFILE THREAD 2
2    GROUP 3 (
3      '+DG/xxxx/onlinelog/group_3.271.752099989',
4      '+DG/xxxx/onlinelog/group_3.272.752099991'
5    ) SIZE 100M REUSE,
6    GROUP 4 (
7      '+DG/xxxx/onlinelog/group_4.273.752099991',
8      '+DG/xxxx/onlinelog/group_4.274.752099993'
9    ) SIZE 100M REUSE,
10    GROUP 6 (
11      '+DG/xxxx/onlinelog/group_6.275.752099993',
12      '+DG/xxxx/onlinelog/group_6.276.752099993'
13    ) SIZE 100M REUSE;
ALTER DATABASE ADD LOGFILE THREAD 2
*
ERROR at line 1:
ORA-01276: Cannot add file +DG/xxxx/onlinelog/group_3.271.752099989.  File has
an Oracle Managed Files file name.

SQL> startup mount pfile='/tmp/p.ora';

ORACLE instance started.

Total System Global Area 2.1475E+10 bytes

Fixed Size 2122368 bytes

Variable Size 2399145344 bytes

Database Buffers 1.9059E+10 bytes

Redo Buffers 14651392 bytes

Database mounted.

SQL> ALTER DATABASE ADD LOGFILE THREAD 2

2 GROUP 3 (

3 '+DG/xxxx/onlinelog/group_3.271.752099989',

4 '+DG/xxxx/onlinelog/group_3.272.752099991'

5 ) SIZE 100M REUSE,

6 GROUP 4 (

7 '+DG/xxxx/onlinelog/group_4.273.752099991',

8 '+DG/xxxx/onlinelog/group_4.274.752099993'

9 ) SIZE 100M REUSE,

10 GROUP 6 (

11 '+DG/xxxx/onlinelog/group_6.275.752099993',

12 '+DG/xxxx/onlinelog/group_6.276.752099993'

13 ) SIZE 100M REUSE;

ALTER DATABASE ADD LOGFILE THREAD 2

ERROR at line 1:

ORA-01276: Cannot add file +DG/xxxx/onlinelog/group_3.271.752099989. File has

an Oracle Managed Files file name.

由于是ORACLE RAC，因此重建控制文件之后，是需要添加redo logfile的;然而add logfile 发现报上述错误。根据Oracle metalink的一些方法均不能成功，都报上面的错误，确实很怪异。
有些人看上述的错误，可能会认为是设置了OMF的参数，其实这里并不是，我将相关参数全部修改之后，错误依旧。
这里实际上添加logfile时，只写磁盘组名称就行了，不需要写绝对路径。
接着在进行recover后进行open resetlogs打开时，报错ORA-01248，如下：

SQL&gt; startup mount pfile='/tmp/p.ora';
ORACLE instance started.
Total System Global Area 2.1475E+10 bytes
Fixed Size                  2122368 bytes
Variable Size            2399145344 bytes
Database Buffers         1.9059E+10 bytes
Redo Buffers               14651392 bytes
Database mounted.
SQL&gt; recover database using backup controlfile until cancel;
ORA-00279: change 13300428179625 generated at 04/04/2013 12:51:35 needed for
thread 1
ORA-00289: suggestion : +DG/archivelog/arch1_752099890_12809_1.log
ORA-00280: change 13300428179625 for thread 1 is in sequence #12809
Specify log: {&lt;RET&gt;=suggested | filename | AUTO | CANCEL}
auto
ORA-00308: cannot open archived log
'+DG/archivelog/arch1_752099890_12809_1.log'
ORA-17503: ksfdopn:2 Failed to open file
+DG/archivelog/arch1_752099890_12809_1.log
ORA-15173: entry 'arch1_752099890_12809_1.log' does not exist in directory
'archivelog'
ORA-00308: cannot open archived log
'+DG/archivelog/arch1_752099890_12809_1.log'
ORA-17503: ksfdopn:2 Failed to open file
+DG/archivelog/arch1_752099890_12809_1.log
ORA-15173: entry 'arch1_752099890_12809_1.log' does not exist in directory
'archivelog'
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '+DG/xxxx/datafile/system.256.752099833'
SQL&gt; alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01248: file 42 was created in the future of incomplete recovery
ORA-01110: data file 42: '+DG/xxxx/datafile/file_tab_xdidx03.ora'

SQL> startup mount pfile='/tmp/p.ora';

ORACLE instance started.

Total System Global Area 2.1475E+10 bytes

Fixed Size 2122368 bytes

Variable Size 2399145344 bytes

Database Buffers 1.9059E+10 bytes

Redo Buffers 14651392 bytes

Database mounted.

SQL> recover database using backup controlfile until cancel;

ORA-00279: change 13300428179625 generated at 04/04/2013 12:51:35 needed for

thread 1

ORA-00289: suggestion : +DG/archivelog/arch1_752099890_12809_1.log

ORA-00280: change 13300428179625 for thread 1 is in sequence #12809

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

auto

ORA-00308: cannot open archived log

'+DG/archivelog/arch1_752099890_12809_1.log'

ORA-17503: ksfdopn:2 Failed to open file

+DG/archivelog/arch1_752099890_12809_1.log

ORA-15173: entry 'arch1_752099890_12809_1.log' does not exist in directory

'archivelog'

ORA-00308: cannot open archived log

'+DG/archivelog/arch1_752099890_12809_1.log'

ORA-17503: ksfdopn:2 Failed to open file

+DG/archivelog/arch1_752099890_12809_1.log

ORA-15173: entry 'arch1_752099890_12809_1.log' does not exist in directory

'archivelog'

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

ORA-01194: file 1 needs more recovery to be consistent

ORA-01110: data file 1: '+DG/xxxx/datafile/system.256.752099833'

SQL> alter database open resetlogs;

alter database open resetlogs

ERROR at line 1:

ORA-01248: file 42 was created in the future of incomplete recovery

ORA-01110: data file 42: '+DG/xxxx/datafile/file_tab_xdidx03.ora'

这个错误还是比较少见，实际上网上那些说法，以及Oracle mos提供的解决方法我发现都不行。
无奈只能先将其offline ，然后再进行恢复。再进行open之前我查询了当前的checkpoint scn如下：

SQL&gt; select file#,checkpoint_change# from v$datafile;
FILE#      CHECKPOINT_CHANGE#
---------- -----------------------
1          14731601024328
2          14731601024328
3          14731601024328
4          13300428179625
5          14731601024328
6          14731601024328
7          14731601024328
.......
39          14731601024328
40          14731601024328
41          14731601024328
42          14731601024328
43          14731601024328
43 rows selected.
SQL&gt; c/datafile/datafile_header
1* select file#,checkpoint_change# from v$datafile_header
SQL&gt; /
FILE#      CHECKPOINT_CHANGE#
---------- -----------------------
1          14731601024328
2          14731601024328
3          14731601024328
4          13300428179625
5          14731601024328
6          14731601024328
7          14731601024328
......
40          14731601024328
41          14731601024328
42          14731601024328
43          14731601024328
43 rows selected.

SQL> select file#,checkpoint_change# from v$datafile;

FILE# CHECKPOINT_CHANGE#

---------- -----------------------

1 14731601024328

2 14731601024328

3 14731601024328

4 13300428179625

5 14731601024328

6 14731601024328

7 14731601024328

.......

39 14731601024328

40 14731601024328

41 14731601024328

42 14731601024328

43 14731601024328

43 rows selected.

SQL> c/datafile/datafile_header

1* select file#,checkpoint_change# from v$datafile_header

SQL> /

FILE# CHECKPOINT_CHANGE#

---------- -----------------------

1 14731601024328

2 14731601024328

3 14731601024328

4 13300428179625

5 14731601024328

6 14731601024328

7 14731601024328

......

40 14731601024328

41 14731601024328

42 14731601024328

43 14731601024328

43 rows selected.

由于open失败，这里我想着是不是这2个文件有问题，又用之前的快照控制文件进行recover一把，然后再次用重建的控制文件起来数据库进行recover，发现神奇的事情出现了：

SQL&gt; recover database using backup controlfile until cancel;
ORA-00279: change 13305808683011 generated at 01/11/2016 21:09:02 needed for
thread 1
ORA-00289: suggestion : +DG/archivelog/arch1_900882531_1_1.log
ORA-00280: change 13305808683011 for thread 1 is in sequence #1
Specify log: {&lt;RET&gt;=suggested | filename | AUTO | CANCEL}
auto
ORA-00308: cannot open archived log '+DG/archivelog/arch1_900882531_1_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DG/archivelog/arch1_900882531_1_1.log
ORA-15173: entry 'arch1_900882531_1_1.log' does not exist in directory
'archivelog'
ORA-00308: cannot open archived log '+DG/archivelog/arch1_900882531_1_1.log'
ORA-17503: ksfdopn:2 Failed to open file +DG/archivelog/arch1_900882531_1_1.log
ORA-15173: entry 'arch1_900882531_1_1.log' does not exist in directory
'archivelog'
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '+DG/xxxx/datafile/system.256.752099833'
SQL&gt; alter database datafile 42 offline;
alter database datafile 43 offline;
Database altered.
SQL&gt; 
Database altered.
SQL&gt;
SQL&gt; alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced

SQL> recover database using backup controlfile until cancel;

ORA-00279: change 13305808683011 generated at 01/11/2016 21:09:02 needed for

thread 1

ORA-00289: suggestion : +DG/archivelog/arch1_900882531_1_1.log

ORA-00280: change 13305808683011 for thread 1 is in sequence #1

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

auto

ORA-00308: cannot open archived log '+DG/archivelog/arch1_900882531_1_1.log'

ORA-17503: ksfdopn:2 Failed to open file +DG/archivelog/arch1_900882531_1_1.log

ORA-15173: entry 'arch1_900882531_1_1.log' does not exist in directory

'archivelog'

ORA-00308: cannot open archived log '+DG/archivelog/arch1_900882531_1_1.log'

ORA-17503: ksfdopn:2 Failed to open file +DG/archivelog/arch1_900882531_1_1.log

ORA-15173: entry 'arch1_900882531_1_1.log' does not exist in directory

'archivelog'

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

ORA-01194: file 1 needs more recovery to be consistent

ORA-01110: data file 1: '+DG/xxxx/datafile/system.256.752099833'

SQL> alter database datafile 42 offline;

alter database datafile 43 offline;

Database altered.

SQL>

Database altered.

SQL>

SQL> alter database open resetlogs;

alter database open resetlogs

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

我们可以看到open 失败了，对于open失败的情况，我们首先是看alert log，接着10046 trace。

SQL&gt; startup nomount pfile='/tmp/p.ora';
ORACLE instance started.
Total System Global Area 2.1475E+10 bytes
Fixed Size                  2122368 bytes
Variable Size            2399145344 bytes
Database Buffers         1.9059E+10 bytes
Redo Buffers               14651392 bytes
SQL&gt; oradebug setmypid
Statement processed.
SQL&gt; alter database mount;
Database altered.
SQL&gt; recover database;
Media recovery complete.
SQL&gt; alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 4 with name "_SYSSMU4$"
too small

SQL> startup nomount pfile='/tmp/p.ora';

ORACLE instance started.

Total System Global Area 2.1475E+10 bytes

Fixed Size 2122368 bytes

Variable Size 2399145344 bytes

Database Buffers 1.9059E+10 bytes

Redo Buffers 14651392 bytes

SQL> oradebug setmypid

Statement processed.

SQL> alter database mount;

Database altered.

SQL> recover database;

Media recovery complete.

SQL> alter database open;

alter database open

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 1

ORA-01555: snapshot too old: rollback segment number 4 with name "_SYSSMU4$"

too small

这里我又屏蔽了undo相关的参数。再次尝试发现错误依旧。再次启动，神奇的事情出现了，SCN居然倒退了？

SQL&gt; startup mount  pfile='/tmp/p.ora';
ORACLE instance started.
Total System Global Area 2.1475E+10 bytes
Fixed Size                  2122368 bytes
Variable Size            2399145344 bytes
Database Buffers         1.9059E+10 bytes
Redo Buffers               14651392 bytes
Database mounted.
SQL&gt; recover database;
Media recovery complete.
SQL&gt; select checkpoint_change#,file# from v$datafile;
CHECKPOINT_CHANGE#      FILE#
--------------------------- ----------
13314398637607          1
13314398637607          2
13314398637607          3
13314398637607          4
13314398637607          5
......
13314398637607         38
13314398637607         39
13314398637607         40
13314398637607         41
0         42
0         43
43 rows selected.
SQL&gt; select checkpoint_change#,file#,checkpoint_time from v$datafile_header;
CHECKPOINT_CHANGE#      FILE# CHECKPOIN
--------------------------- ---------- ---------
13314398637607          1 11-JAN-16
13314398637607          2 11-JAN-16
13314398637607          3 11-JAN-16
13314398637607          4 11-JAN-16
13314398637607          5 11-JAN-16
......
13314398637607         39 11-JAN-16
13314398637607         40 11-JAN-16
13314398637607         41 11-JAN-16
14731601024328         42 30-DEC-15
14731601024328         43 30-DEC-15
43 rows selected.

SQL> startup mount pfile='/tmp/p.ora';

ORACLE instance started.

Total System Global Area 2.1475E+10 bytes

Fixed Size 2122368 bytes

Variable Size 2399145344 bytes

Database Buffers 1.9059E+10 bytes

Redo Buffers 14651392 bytes

Database mounted.

SQL> recover database;

Media recovery complete.

SQL> select checkpoint_change#,file# from v$datafile;

CHECKPOINT_CHANGE# FILE#

--------------------------- ----------

13314398637607 1

13314398637607 2

13314398637607 3

13314398637607 4

13314398637607 5

......

13314398637607 38

13314398637607 39

13314398637607 40

13314398637607 41

0 42

0 43

43 rows selected.

SQL> select checkpoint_change#,file#,checkpoint_time from v$datafile_header;

CHECKPOINT_CHANGE# FILE# CHECKPOIN

--------------------------- ---------- ---------

13314398637607 1 11-JAN-16

13314398637607 2 11-JAN-16

13314398637607 3 11-JAN-16

13314398637607 4 11-JAN-16

13314398637607 5 11-JAN-16

......

13314398637607 39 11-JAN-16

13314398637607 40 11-JAN-16

13314398637607 41 11-JAN-16

14731601024328 42 30-DEC-15

14731601024328 43 30-DEC-15

43 rows selected.

很明显，这个133的scn 回退到了过去2年前了，出现时空穿越了。。。。当然，open肯定还是报错：

SQL&gt; alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 4 with name "_SYSSMU4$"
too small

SQL> alter database open;

alter database open

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 1

ORA-01555: snapshot too old: rollback segment number 4 with name "_SYSSMU4$"

too small

这里先不管为啥连数据文件头的SCN都倒退了(之前被offline的2个文件scn是OK的). 通过10046 trace得到如下内容:

PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1418474357830663 hv=429618617 ad='5db7ea50'
select ctime, mtime, stime from obj$ where obj# = :1
END OF STMT
PARSE #5:c=0,e=531,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1418474357830659
BINDS #5:
kkscoacd
Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=2ad7172aa020  bln=22  avl=02  flg=05
value=20
EXEC #5:c=1000,e=673,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1418474357831431
WAIT #5: nam='db file sequential read' ela= 9843 file#=1 block#=218 blocks=1 obj#=-1 tim=1418474357841421
.....
FETCH #6:c=17997,e=64936,p=23,cr=566,cu=0,mis=0,r=1,dep=2,og=3,tim=1418474357907669
STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=566 pr=23 pw=0 time=64913 us)'
STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=5769 us)'
WAIT #5: nam='db file sequential read' ela= 13031 file#=40 block#=167538 blocks=1 obj#=-1 tim=1418474357920819
FETCH #5:c=19996,e=89548,p=25,cr=568,cu=0,mis=0,r=0,dep=1,og=4,tim=1418474357921006

PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1418474357830663 hv=429618617 ad='5db7ea50'

select ctime, mtime, stime from obj$ where obj# = :1

END OF STMT

PARSE #5:c=0,e=531,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1418474357830659

BINDS #5:

kkscoacd

Bind#0

oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0

kxsbbbfp=2ad7172aa020 bln=22 avl=02 flg=05

value=20

EXEC #5:c=1000,e=673,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1418474357831431

WAIT #5: nam='db file sequential read' ela= 9843 file#=1 block#=218 blocks=1 obj#=-1 tim=1418474357841421

.....

FETCH #6:c=17997,e=64936,p=23,cr=566,cu=0,mis=0,r=1,dep=2,og=3,tim=1418474357907669

STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op='TABLE ACCESS BY INDEX ROWID UNDO$ (cr=566 pr=23 pw=0 time=64913 us)'

STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op='INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=5769 us)'

WAIT #5: nam='db file sequential read' ela= 13031 file#=40 block#=167538 blocks=1 obj#=-1 tim=1418474357920819

FETCH #5:c=19996,e=89548,p=25,cr=568,cu=0,mis=0,r=0,dep=1,og=4,tim=1418474357921006

我们这里可以看到，这里报错的SQL读取了file 1 block 218,以及file 40 block 167538。
对于file 1 block 218,我dump 发现没有活动事务；而file 40 block 167538则为undo 块.

SQL&gt; select name from v$datafile where file#=40;
NAME
--------------------------------------------------------------------------------
+DG/xxxx/datafile/undotbs4

SQL> select name from v$datafile where file#=40;

NAME

--------------------------------------------------------------------------------

+DG/xxxx/datafile/undotbs4

同时dump 了这个undo 块，发现确实感觉有些异常,如下所示:

********************************************************************************
UNDO BLK:
xid: 0x0009.01b.0014320a  seq: 0xa47 cnt: 0x1   irb: 0x1   icl: 0x0   flg: 0x0000
Rec Offset      Rec Offset      Rec Offset      Rec Offset      Rec Offset
---------------------------------------------------------------------------
0x01 0x0014     
*-----------------------------
* Rec #0x1  slt: 0x1b  objn: 55417(0x0000d879)  objd: 296039  tblspc: 20(0x00000014)
*       Layer:  10 (Index)   opc: 21   rci 0x00
Undo type:  Regular undo   Last buffer split:  No
Temp Object:  No
Tablespace Undo:  No
rdba: 0x0a028e71
*-----------------------------
index general undo (branch) operations
KTB Redo
op: 0x05  ver: 0x01
op: R  itc: 2
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0009.01b.0014320a  0x0a028e71.0a47.04  ----    1  fsc 0x0000.00000000
0x02   0x0009.02b.001428b6  0x0a028e6f.0a47.06  ----  112  fsc 0x0000.00000000
Dump kdige : block dba :0x05d630b3, seghdr dba: 0x06076e89
restore block before image

********************************************************************************

UNDO BLK:

xid: 0x0009.01b.0014320a seq: 0xa47 cnt: 0x1 irb: 0x1 icl: 0x0 flg: 0x0000

Rec Offset Rec Offset Rec Offset Rec Offset Rec Offset

---------------------------------------------------------------------------

0x01 0x0014

*-----------------------------

* Rec #0x1 slt: 0x1b objn: 55417(0x0000d879) objd: 296039 tblspc: 20(0x00000014)

* Layer: 10 (Index) opc: 21 rci 0x00

Undo type: Regular undo Last buffer split: No

Temp Object: No

Tablespace Undo: No

rdba: 0x0a028e71

*-----------------------------

index general undo (branch) operations

KTB Redo

op: 0x05 ver: 0x01

op: R itc: 2

Itl Xid Uba Flag Lck Scn/Fsc

0x01 0x0009.01b.0014320a 0x0a028e71.0a47.04 ---- 1 fsc 0x0000.00000000

0x02 0x0009.02b.001428b6 0x0a028e6f.0a47.06 ---- 112 fsc 0x0000.00000000

Dump kdige : block dba :0x05d630b3, seghdr dba: 0x06076e89

restore block before image

由于所有的文件头SCN 都倒退了，正常open 都报错，只能推进SCN，而且SCN必须要比这个undo block的最大SCN 还要大一些才行,通过在pfile文件中加入参数*._minimum_giga_scn即可。

SQL&gt; conn /as sysdba
Connected to an idle instance.
SQL&gt; startup mount pfile='/tmp/p.ora';
ORACLE instance started.
Total System Global Area 2.1475E+10 bytes
Fixed Size                  2122368 bytes
Variable Size            2399145344 bytes
Database Buffers         1.9059E+10 bytes
Redo Buffers               14651392 bytes
Database mounted.
SQL&gt; show parameter job
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
job_queue_processes                  integer     10
SQL&gt; alter system set job_queue_processes=0;
System altered.
SQL&gt; alter database open;
Database altered.
SQL&gt; drop tablespace undotbs3 including contents and datafiles;
Tablespace dropped.
SQL&gt; drop tablespace undotbs4 including contents and datafiles;
Tablespace dropped.

SQL> conn /as sysdba

Connected to an idle instance.

SQL> startup mount pfile='/tmp/p.ora';

ORACLE instance started.

Total System Global Area 2.1475E+10 bytes

Fixed Size 2122368 bytes

Variable Size 2399145344 bytes

Database Buffers 1.9059E+10 bytes

Redo Buffers 14651392 bytes

Database mounted.

SQL> show parameter job

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

job_queue_processes integer 10

SQL> alter system set job_queue_processes=0;

System altered.

SQL> alter database open;

Database altered.

SQL> drop tablespace undotbs3 including contents and datafiles;

Tablespace dropped.

SQL> drop tablespace undotbs4 including contents and datafiles;

Tablespace dropped.

顺利打开数据库之后，立即将原有的undo 表空间进行drop 并重建。
虽然数据库是打开了，然而其中有2个数据文件之前被我们offline了，而且中间进行了resetlogs操作，因此现在无法进行正常online了。

SQL&gt; select file#,checkpoint_change# from v$datafile;
FILE#             CHECKPOINT_CHANGE#
---------- ------------------------------
1                 14759124431097
2                 14759124431097
3                 14759124431097
4                 14759124431097
5                 14759124431097
......
36                 14759124431097
37                 14759124431097
38                 14759124431097
41                 14759124431097
42                              0
43                              0
44                 14759124431097
45                 14759124431097
43 rows selected.
SQL&gt; alter database datafile 42 online;
alter database datafile 42 online
*
ERROR at line 1:
ORA-01190: control file or data file 42 is from before the last RESETLOGS
ORA-01110: data file 42: '+DG/xxxx/datafile/file_tab_xxx03.ora'
SQL&gt; alter database datafile 43 online;
alter database datafile 43 online
*
ERROR at line 1:
ORA-01190: control file or data file 43 is from before the last RESETLOGS
ORA-01110: data file 43: '+DG/xxxx/datafile/file_tab1_xxx05.ora'

SQL> select file#,checkpoint_change# from v$datafile;

FILE# CHECKPOINT_CHANGE#

---------- ------------------------------

1 14759124431097

2 14759124431097

3 14759124431097

4 14759124431097

5 14759124431097

......

36 14759124431097

37 14759124431097

38 14759124431097

41 14759124431097

42 0

43 0

44 14759124431097

45 14759124431097

43 rows selected.

SQL> alter database datafile 42 online;

alter database datafile 42 online

ERROR at line 1:

ORA-01190: control file or data file 42 is from before the last RESETLOGS

ORA-01110: data file 42: '+DG/xxxx/datafile/file_tab_xxx03.ora'

SQL> alter database datafile 43 online;

alter database datafile 43 online

ERROR at line 1:

ORA-01190: control file or data file 43 is from before the last RESETLOGS

ORA-01110: data file 43: '+DG/xxxx/datafile/file_tab1_xxx05.ora'

这里用bbed 将上面2个文件头相关信息修改掉，然后进行recover,可以顺利online文件。

SQL&gt; recover datafile 42;
Media recovery complete.
SQL&gt; alter database datafile 42 online;
Database altered.
SQL&gt; recover datafile 43;
Media recovery complete.
SQL&gt; alter database datafile 43 online;
Database altered.
SQL&gt; select file#,checkpoint_change# ,status from v$datafile;
FILE#             CHECKPOINT_CHANGE# STATUS
---------- ------------------------------ -------
1                 14759124821491 SYSTEM
2                 14759124821491 SYSTEM
3                 14759124821491 ONLINE
4                 14759124821491 ONLINE
5                 14759124821491 ONLINE
。。。。。。
36                 14759124821491 ONLINE
37                 14759124821491 ONLINE
38                 14759124821491 ONLINE
41                 14759124821491 ONLINE
42                 14759124831966 ONLINE
43                 14759124832115 ONLINE
44                 14759124821491 ONLINE
45                 14759124821491 ONLINE
43 rows selected.
SQL&gt; alter system checkpoint;
System altered.
SQL&gt; select file#,checkpoint_change# ,status from v$datafile;
FILE#             CHECKPOINT_CHANGE# STATUS
---------- ------------------------------ -------
1                 14759124832224 SYSTEM
2                 14759124832224 SYSTEM
3                 14759124832224 ONLINE
4                 14759124832224 ONLINE
5                 14759124832224 ONLINE
......
38                 14759124832224 ONLINE
41                 14759124832224 ONLINE
42                 14759124832224 ONLINE
43                 14759124832224 ONLINE
44                 14759124832224 ONLINE
45                 14759124832224 ONLINE
43 rows selected.

SQL> recover datafile 42;

Media recovery complete.

SQL> alter database datafile 42 online;

Database altered.

SQL> recover datafile 43;

Media recovery complete.

SQL> alter database datafile 43 online;

Database altered.

SQL> select file#,checkpoint_change# ,status from v$datafile;

FILE# CHECKPOINT_CHANGE# STATUS

---------- ------------------------------ -------

1 14759124821491 SYSTEM

2 14759124821491 SYSTEM

3 14759124821491 ONLINE

4 14759124821491 ONLINE

5 14759124821491 ONLINE

。。。。。。

36 14759124821491 ONLINE

37 14759124821491 ONLINE

38 14759124821491 ONLINE

41 14759124821491 ONLINE

42 14759124831966 ONLINE

43 14759124832115 ONLINE

44 14759124821491 ONLINE

45 14759124821491 ONLINE

43 rows selected.

SQL> alter system checkpoint;

System altered.

SQL> select file#,checkpoint_change# ,status from v$datafile;

FILE# CHECKPOINT_CHANGE# STATUS

---------- ------------------------------ -------

1 14759124832224 SYSTEM

2 14759124832224 SYSTEM

3 14759124832224 ONLINE

4 14759124832224 ONLINE

5 14759124832224 ONLINE

......

38 14759124832224 ONLINE

41 14759124832224 ONLINE

42 14759124832224 ONLINE

43 14759124832224 ONLINE

44 14759124832224 ONLINE

45 14759124832224 ONLINE

43 rows selected.

最后建议将数据库expdp 导出并重建。到此告一段落！

Tags: 10.2.0.4 rac asm recover, kclchkblk_4, kffbAddBlk04, ORA-00604, ORA-00704, ORA-01110, ORA-01190, ORA-01248, ORA-01276, ORA-01547, ora-01555, [kccpb_sanity_check_2]
Posted in backup & rcovery on 1月 20, 2016

You must be logged in to post a comment.

love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

Categories

Archives

最新评论

国内圈子

oracle security

数套ASM RAC的恢复案例

Leave a Reply