非归档遭遇ora-00600 [kcratr_nab_less_than_odr]的恢复
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
刚刚很早之前认识的一个朋友说他们公司的开发库挂了,让我QQ远程帮忙看看,花了10分钟解决了了这个问题,分享一下.
主要遇到了如下几个问题:
1. mount 发现控制文件异常,通过替换,用pfile mount成功,这个不说了.
2. open报了一个如下的错误:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
Fri Jul 04 20:03:23 2014 alter database open Beginning crash recovery of 1 threads parallel recovery started with 15 processes Started redo scan Completed redo scan read 229 KB redo, 0 data blocks need recovery Errors in file d:\app\administrator\diag\rdbms\yunhaoorcl\yunhaoorcl\trace\yunhaoorcl_ora_3416.trc (incident=160589): ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [11783], [8], [1181], [], [], [], [], [], [], [] Incident details in: d:\app\administrator\diag\rdbms\yunhaoorcl\yunhaoorcl\incident\incdir_160589\yunhaoorcl_ora_3416_i160589.trc Aborting crash recovery due to error 600 Errors in file d:\app\administrator\diag\rdbms\yunhaoorcl\yunhaoorcl\trace\yunhaoorcl_ora_3416.trc: ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [11783], [8], [1181], [], [], [], [], [], [], [] Errors in file d:\app\administrator\diag\rdbms\yunhaoorcl\yunhaoorcl\trace\yunhaoorcl_ora_3416.trc: ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [11783], [8], [1181], [], [], [], [], [], [], [] ORA-600 signalled during: alter database open... Fri Jul 04 20:03:25 2014 Trace dumping is performing id=[cdmp_20140704200325] Fri Jul 04 20:03:27 2014 Sweep [inc][160589]: completed Sweep [inc2][160589]: completed |
对于这个,比较少见,猜测可能是instance recovery的时候出现问题了。尝试手工recover database:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SQL> recover database; 完成介质恢复。 SQL> archive log list; 数据库日志模式 非存档模式 自动存档 禁用 存档终点 USE_DB_RECOVERY_FILE_DEST 最早的联机日志序列 11781 当前日志序列 11783 SQL> alter database open; alter database open * 第 1 行出现错误: ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [11783], [8], [1181], [], [], [], [], [], [], [] |
手工recover发现不行,看alert log报了一个error,看下对应的trace如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
*** 2014-07-04 20:03:23.792 Successfully allocated 15 recovery slaves Using 10 overflow buffers per recovery slave Thread 1 checkpoint: logseq 11782, block 2, scn 230294634 cache-low rba: logseq 11782, block 52177 on-disk rba: logseq 11783, block 1181, scn 230308328 start recovery at logseq 11782, block 52177, scn 0 *** 2014-07-04 20:03:24.058 Started writing zeroblks thread 1 seq 11783 blocks 8-15 *** 2014-07-04 20:03:24.058 Completed writing zeroblks thread 1 seq 11783 ==== Redo read statistics for thread 1 ==== Total physical reads (from disk and memory): 4322Kb -- Redo read_disk statistics -- Read rate (ASYNC): 229Kb in 0.11s => 2.04 Mb/sec Longest record: 3Kb, moves: 0/269 (0%) Change moves: 6/101 (5%), moved: 0Mb Longest LWN: 5Kb, moves: 0/115 (0%), moved: 0Mb Last redo scn: 0x0000.0dba37ae (230307758) ---------------------------------------------- ----- Recovery Hash Table Statistics --------- Hash table buckets = 65536 Longest hash chain = 0 Average hash chain = 0/0 = 0.0 Max compares per lookup = 0 Avg compares per lookup = 0/0 = 0.0 ---------------------------------------------- WARNING! Crash recovery of thread 1 seq 11783 is ending at redo block 8 but should not have ended before redo block 1181 |
我们来仔细观察一下这个instance recovery的信息,得到如下的信息:
Thread 1 checkpoint: logseq 11782, block 2, scn 230294634
cache-low rba: logseq 11782, block 52177
on-disk rba: logseq 11783, block 1181, scn 230308328
start recovery at logseq 11782, block 52177, scn 0
线程检查点: logseq 11782, block 2, scn 230294634
low cache rba: logseq 11782, block 52177
on disk rba: logseq 11783, block 1181, scn 230308328
可以看到实例恢复的起点是low cache rba(实际上oracle会比较线程检查点和low cache rba,选择其中的较大者作为实例恢复的启点).
最后我们再来仔细分析下这个错误:
ORA-00600: 内部错误代码, 参数: [kcratr_nab_less_than_odr], [1], [11783], [8],[1181], [], [], [], [], [], [], []
kcratr_nab_less_than_odr: 大胆猜测应该在进行比较某个值
[1] : 应该是指的thread number
[11783]: on disk rba的logseq 值
[8]: 这里未知
[1181]: on disk rba的block号
最后搜了一下MOS,Oracle 给出的解释如下,关于这个ora-00600错误:
This caused a lost Write into the Online RedoLogs and so Instance Recovery is not possible and raising the ORA-600.
通俗一点讲,即online log的写丢失导致实例恢复无法恢复到指定的点,进而抛出这个ora-00600错误。
对于on disk rba,这个是oracle instance recovery必须要达到的值,如果无法恢复到该值,那么将会出现异常,类似这里的问题.
既然明白了这一点,那么恢复就很容易了,首先重建下controlfile:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
SQL> CREATE CONTROLFILE REUSE DATABASE "YUNHAOOR" NORESETLOGS NOARCHIVELOG 2 MAXLOGFILES 16 3 MAXLOGMEMBERS 3 4 MAXDATAFILES 100 5 MAXINSTANCES 8 6 MAXLOGHISTORY 1168 7 LOGFILE 8 GROUP 1 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\REDO01.LOG' SIZE 50M BLOCKSIZE 512, 9 GROUP 2 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\REDO02.LOG' SIZE 50M BLOCKSIZE 512, 10 GROUP 3 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\REDO03.LOG' SIZE 50M BLOCKSIZE 512 11 -- STANDBY LOGFILE 12 DATAFILE 13 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\SYSTEM01.DBF', 14 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\SYSAUX01.DBF', 15 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\UNDOTBS01.DBF', 16 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\USERS01.DBF', 17 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\FENG_SPACE.DBF', 18 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\SYSTEM02.DBF', 19 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\SYSTEM03.DBF', 20 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\USERS2', 21 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\SYSAUX1', 22 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\SYSTEM1' 23 CHARACTER SET ZHS16GBK 24 ; 控制文件已创建。 SQL> recover database; 完成介质恢复。 |
最后打开数据库,发现又报错了,不过还好,这是一个只要是DBA知道怎么解决的错了,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
SQL> alter database open; alter database open * 第 1 行出现错误: ORA-00603: ORACLE server session terminated by fatal error ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], [] ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], [] 进程 ID: 6760 会话 ID: 534 序列号: 1 SQL> |
实际上查看alert log 还看到了ora-00600 [4193]错误。 这个处理方法一样,不累述。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
(incident=171742): ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], [], [], [], [], [], [] Incident details in: d:\app\administrator\diag\rdbms\yunhaoorcl\yunhaoorcl\incident\incdir_171742\yunhaoorcl_smon_3884_i171742.trc No Resource Manager plan active Errors in file d:\app\administrator\diag\rdbms\yunhaoorcl\yunhaoorcl\trace\yunhaoorcl_ora_6760.trc (incident=171790): ORA-00600: 内部错误代码, 参数: [4194], [], [], [], [], [], [], [], [], [], [], [] Incident details in: d:\app\administrator\diag\rdbms\yunhaoorcl\yunhaoorcl\incident\incdir_171790\yunhaoorcl_ora_6760_i171790.trc Fri Jul 04 20:12:34 2014 Trace dumping is performing id=[cdmp_20140704201234] Trace dumping is performing id=[cdmp_20140704201235] Doing block recovery for file 3 block 217 Resuming block recovery (PMON) for file 3 block 217 Block recovery from logseq 11784, block 63 to scn 230347995 Recovery of Online Redo Log: Thread 1 Group 3 Seq 11784 Reading mem 0 Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\REDO03.LOG Block recovery stopped at EOT rba 11784.67.16 Block recovery completed at rba 11784.67.16, scn 0.230347992 Doing block recovery for file 3 block 144 Resuming block recovery (PMON) for file 3 block 144 Block recovery from logseq 11784, block 63 to scn 230347989 Recovery of Online Redo Log: Thread 1 Group 3 Seq 11784 Reading mem 0 Mem# 0: D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\REDO03.LOG Block recovery completed at rba 11784.65.16, scn 0.230347991 Errors in file d:\app\administrator\diag\rdbms\yunhaoorcl\yunhaoorcl\trace\yunhaoorcl_smon_3884.trc: ORA-01595: error freeing extent (2) of rollback segment (2)) ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], [], [], [], [], [], [] |
对于ora-00600 4193/4194错误,通过pfile指定undo_management=maual,然后启库重建undo即可,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
SQL> create undo tablespace undotbs2 datafile 'D:\APP\ADMINISTRATOR\ORADATA\YUNHAOORCL\undotbs2_01.dbf 2 size 4096m; 表空间已创建。 SQL> shutdown immediate 数据库已经关闭。 已经卸载数据库。 ORACLE 例程已经关闭。 SQL> startup mount pfile='D:\1.ora' ORACLE 例程已经启动。 Total System Global Area 9620525056 bytes Fixed Size 2183872 bytes Variable Size 4395633984 bytes Database Buffers 5200936960 bytes Redo Buffers 21770240 bytes 数据库装载完毕。 SQL> alter database open; 数据库已更改。 SQL> show parameter undo NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ undo_management string AUTO undo_retention integer 900 undo_tablespace string UNDOTBS2 SQL> drop tablespace undotbs1 including contents and datafiles; 表空间已删除。 |
备注:对于重建controlfile后,记得添加tempfile,不要给人接埋地雷!
ORA-600 [kcratr_nab_less_than_odr] during Instance Recovery after Database Crash (文档 ID 1299564.1)
Alter database open fails with ORA-00600 kcratr_nab_less_than_odr (文档 ID 1296264.1)
Leave a Reply
You must be logged in to post a comment.