Oracle 12.2 Rac redo异常的恢复案例
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: Oracle 12.2 Rac redo异常的恢复案例
前同事问的一个问题;其一套12.2 rac由于存储问题导致无法启动;重建控制文件后报如下错误:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
2019-12-05T10:12:55.112874+08:00 Sleep 5 seconds and then try to clear SRLs in 2 time(s) Errors in file /u01/oracle/app/oracle/diag/rdbms/orapuabis/xxoo2/trace/xxoo2_ora_327541.trc (incident=5376496): ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], [] Incident details in: /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/incident/incdir_5376496/xxoo2_ora_327541_i5376496.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. 2019-12-05T10:12:56.191277+08:00 ***************************************************************** An internal routine has requested a dump of selected redo. This usually happens following a specific internal error, when analysis of the redo logs will help Oracle Support with the diagnosis. It is recommended that you retain all the redo logs generated (by all the instances) during the past 12 hours, in case additional redo dumps are required to help with the diagnosis. ***************************************************************** 2019-12-05T10:12:56.485111+08:00 Errors in file /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/trace/xxoo2_ora_327541.trc: ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], [] 2019-12-05T10:12:56.485226+08:00 Errors in file /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/trace/xxoo2_ora_327541.trc: ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], [] 2019-12-05T10:12:56.485323+08:00 Errors in file /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/trace/xxoo2_ora_327541.trc: ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], [] Error 704 happened during db open, shutting down database Errors in file /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/trace/xxoo2_ora_327541.trc (incident=5376497): ORA-00603: ORACLE server session terminated by fatal error ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], [] Incident details in: /u01/oracle/app/oracle/diag/rdbms/xxoo/xxoo2/incident/incdir_5376497/xxoo2_ora_327541_i5376497.trc 2019-12-05T10:12:56.602381+08:00 Dumping diagnostic data in directory=[cdmp_20191205101256], requested by (instance=2, osid=327541), summary=[incident=5376496]. 2019-12-05T10:12:57.664168+08:00 opiodr aborting process unknown ospid (327541) as a result of ORA-603 |
这个600错误我还是第一次遇见。最近2年很少做恢复,因此在华为云服务器针对该错误进行了一次测试;通过模拟事务丢失(dd破坏current redo等)很容易出现该错误;如下是我的模拟过程:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 |
SQL> startup ORACLE instance started. Total System Global Area 3221225472 bytes Fixed Size 8797928 bytes Variable Size 939524376 bytes Database Buffers 2264924160 bytes Redo Buffers 7979008 bytes Database mounted. Database opened. SQL> show pdbs; CON_ID CON_NAME OPEN MODE RESTRICTED ---------- ------------------------------ ---------- ---------- 2 PDB$SEED READ ONLY NO 3 ENMOTEST MOUNTED 4 PDBTEST1 MOUNTED SQL> alter pluggable database ENMOTEST open; Pluggable database altered. SQL> alter pluggable database PDBTEST1 open; alter Pluggable database altered. SQL> SQL> SQL> session set container=enmotest; Session altered. SQL> create table roger as select * from sys.dba_objects where rownum < 10001; Table created. SQL> alter system flush buffer_cache; System altered. SQL> delete from roger where rownum < 1001; 1000 rows deleted. SQL> select * from v$Log; GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARCHIV STATUS FIRST_CHANGE# FIRST_TIME NEXT_CHANGE# NEXT_TIME CON_ID ---------- ---------- ---------- ---------- ---------- ---------- ------ -------------------------------- ------------- ------------ ------------ ------------ ---------- 1 1 37 209715200 512 1 NO INACTIVE 3232261 04-DEC-19 3335566 05-DEC-19 0 2 1 38 209715200 512 1 NO CURRENT 3335566 05-DEC-19 1.8447E+19 0 3 1 36 209715200 512 1 NO INACTIVE 3121301 28-NOV-19 3232261 04-DEC-19 0 SQL> select member from v$Logfile; MEMBER --------------------------------------------------------------------- /sas_data/12c/app/oracle/oradata/test12c/redo03.log /sas_data/12c/app/oracle/oradata/test12c/redo02.log /sas_data/12c/app/oracle/oradata/test12c/redo01.log SQL> conn /as sysdba Connected. SQL> shutdown abort; ORACLE instance shut down. SQL> ! [oracle@enmodb3 ~]$ dd if=/dev/zero of=/sas_data/12c/app/oracle/oradata/test12c/redo02.log bs=1024k count=1 1+0 records in 1+0 records out 1048576 bytes (1.0 MB) copied, 0.00221134 s, 474 MB/s [oracle@enmodb3 ~]$ exit exit SQL> startup ORACLE instance started. Total System Global Area 3221225472 bytes Fixed Size 8797928 bytes Variable Size 939524376 bytes Database Buffers 2264924160 bytes Redo Buffers 7979008 bytes Database mounted. ORA-00313: open failed for members of log group 2 of thread 1 ORA-00312: online log 2 thread 1: '/sas_data/12c/app/oracle/oradata/test12c/redo02.log' ORA-27048: skgfifi: file header information is invalid Additional information: 2 SQL> startup pfile='/tmp/12_2.ora' mount; ORACLE instance started. Total System Global Area 3221225472 bytes Fixed Size 8797928 bytes Variable Size 939524376 bytes Database Buffers 2264924160 bytes Redo Buffers 7979008 bytes Database mounted. SQL> oradebug setmypid Statement processed. SQL> alter database backup controlfile to trace; Database altered. SQL> oradebug tracefile_name /sas_data/12c/app/oracle/diag/rdbms/test12c/test12c/trace/test12c_ora_30127.trc 在重建控制文件后,分区进行强制open时分别遇到了如下错误: SQL> alter database open resetlogs; ERROR: ORA-03114: not connected to ORACLE alter database open resetlogs * ERROR at line 1: ORA-00603: ORACLE server session terminated by fatal error ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [kcbzib_kcrsds_1], [], [], [], [], [], [], [], [], [], [], [] Process ID: 13970 Session ID: 1 Serial number: 15896 SQL> alter pluggable database ENMOTEST open force; alter pluggable database ENMOTEST open force * ERROR at line 1: ORA-00600: internal error code, arguments: [kcffo_online_pdb_check: fno_system], [3], [], [], [], [], [], [], [], [], [], [] |
我们言归正传;回到本文的案例中来;针对第一个600错误的处理方式是,首先通过10046 trace定位到访问的数据块;然后通过bbed修改相关block,提交事务后,再使用event=”21307096 trace name context forever, level 1′ 来 推进scn即可。
当然,有可能在你在恢复过程中还会遇到ora-00600 [4194]等错误,这是跟undo相关的问题,可以通过屏蔽回滚段或者重建undo来解决。
当处理完毕后,强烈建议使用hcheck脚本检查数据字典的一致性,如果检查没有任何问题,那么基本上数据库运行不会有太大问题;如下是本case的检测结果:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 |
SQL> @/home/oracle/hcheck.sql HCheck Version 07MAY18 on 05-DEC-2019 11:28:59 ---------------------------------------------- Catalog Version 12.2.0.1.0 (1202000100) db_name: xxxx Is CDB?: NO Catalog Fixed Procedure Name Version Vs Release Timestamp Result ------------------------------ ... ---------- -- ---------- -------------- ------ .- LobNotInObj ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS .- MissingOIDOnObjCol ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS .- SourceNotInObj ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS .- OversizedFiles ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS .- PoorDefaultStorage ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS .- PoorStorage ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS .- TabPartCountMismatch ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS .- OrphanedTabComPart ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS .- MissingSum$ ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS .- MissingDir$ ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS .- DuplicateDataobj ... 1202000100 <= *All Rel* 12/05 11:28:59 PASS .- ObjSynMissing ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- ObjSeqMissing ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- OrphanedUndo ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- OrphanedIndex ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- OrphanedIndexPartition ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- OrphanedIndexSubPartition ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- OrphanedTable ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- OrphanedTablePartition ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- OrphanedTableSubPartition ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- MissingPartCol ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- OrphanedSeg$ ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- OrphanedIndPartObj# ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- DuplicateBlockUse ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- FetUet ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- Uet0Check ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- SeglessUET ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- BadInd$ ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- BadTab$ ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- BadIcolDepCnt ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- ObjIndDobj ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- TrgAfterUpgrade ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- ObjType0 ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- BadOwner ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- StmtAuditOnCommit ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- BadPublicObjects ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- BadSegFreelist ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- BadDepends ... 1202000100 <= *All Rel* 12/05 11:29:00 PASS .- CheckDual ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- ObjectNames ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- BadCboHiLo ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- ChkIotTs ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- NoSegmentIndex ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- BadNextObject ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- DroppedROTS ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- FilBlkZero ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- DbmsSchemaCopy ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- OrphanedIdnseqObj ... 1202000100 > 1201000000 12/05 11:29:01 PASS .- OrphanedIdnseqSeq ... 1202000100 > 1201000000 12/05 11:29:01 PASS .- OrphanedObjError ... 1202000100 > 1102000000 12/05 11:29:01 PASS .- ObjNotLob ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- MaxControlfSeq ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- SegNotInDeferredStg ... 1202000100 > 1102000000 12/05 11:29:01 PASS .- SystemNotRfile1 ... 1202000100 > 902000000 12/05 11:29:01 PASS .- DictOwnNonDefaultSYSTEM ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- OrphanTrigger ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS .- ObjNotTrigger ... 1202000100 <= *All Rel* 12/05 11:29:01 PASS --------------------------------------- 05-DEC-2019 11:29:01 Elapsed: 2 secs --------------------------------------- Found 0 potential problem(s) and 0 warning(s) PL/SQL procedure successfully completed. |
第一次遇到这个问题;很久没搞Oracle了。友情支持;就记录一下吧。
话说这里我自己在华为云12.2 版本测试所遇到的ora-00600 kcffo_online_pdb_check: fno_system 问题;还没找到很好的解决方法;放到下一篇文章吧。
Leave a Reply
You must be logged in to post a comment.