Oracle Crash due to Controlfile sequence# reached 0xffffffff
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: Oracle Crash due to Controlfile sequence# reached 0xffffffff
近期某客户的一个Oracle数据库的controlfile sequence#居然达到了最大值(42亿 即 0xffffffff);导致数据库直接crash,然后再次启动时发现报错;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
Sun Aug 08 08:21:58 2021 Thread 1 advanced to log sequence 662736 (LGWR switch) Current log# 5 seq# 662736 mem# 0: +DATADG/xxxx/onlinelog/group_5.401.961749619 Sun Aug 08 08:22:20 2021 ************************************************************************* ATTENTION: The controlfile sequence# has now reached 4294967295 (0xffffffff), which is the architectural limit. Further controlfile updates are no longer possible. To resume normal database operation it is necessary to shutdown abort all instances and perform the steps described in Doc ID 20324049.8 at My Oracle Support to reset the controlfile sequence# to 1. ************************************************************************* Errors in file /u03/db/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ora_20578722.trc (incident=843103): ORA-00600: internal error code, arguments: [kccfhb_3], [4294967295], [], [], [], [], [], [], [], [], [], [] Incident details in: /u03/db/diag/rdbms/xxxx/xxxx1/incident/incdir_843103/xxxx1_ora_20578722_i843103.trc Sun Aug 08 08:22:22 2021 Dumping diagnostic data in directory=[cdmp_20210808082222], requested by (instance=1, osid=20578722), summary=[incident=843103]. Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. ...... ORA-1092 : opitsk aborting process Sun Aug 08 08:22:41 2021 ORA-1092 : opitsk aborting process Instance terminated by CKPT, pid = 23068730 Sun Aug 08 08:22:44 2021 ...... Sun Aug 08 08:23:08 2021 ALTER DATABASE MOUNT /* db agent *//* {0:9:46000} */ This instance was first to mount Sun Aug 08 08:23:08 2021 Reconfiguration complete ************************************************************************* ATTENTION: The controlfile sequence# has now reached 4294967295 (0xffffffff), which is the architectural limit. Further controlfile updates are no longer possible. To resume normal database operation it is necessary to shutdown abort all instances and perform the steps described in Doc ID 20324049.8 at My Oracle Support to reset the controlfile sequence# to 1. ************************************************************************* ************************************************************************* ATTENTION: The controlfile sequence# has now reached 4294967295 (0xffffffff), which is the architectural limit. Further controlfile updates are no longer possible. To resume normal database operation it is necessary to shutdown abort all instances and perform the steps described in Doc ID 20324049.8 at My Oracle Support to reset the controlfile sequence# to 1. ************************************************************************* Errors in file /u03/db/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ora_51380254.trc (incident=864827): ORA-00600: internal error code, arguments: [kccfhb_3], [4294967295], [], [], [], [], [], [], [], [], [], [] Incident details in: /u03/db/diag/rdbms/xxxx/xxxx1/incident/incdir_864827/xxxx1_ora_51380254_i864827.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Sun Aug 08 08:23:14 2021 Dumping diagnostic data in directory=[cdmp_20210808082314], requested by (instance=1, osid=51380254), summary=[incident=864827]. ORA-600 signalled during: ALTER DATABASE MOUNT /* db agent *//* {0:9:46000} */... Sun Aug 08 08:23:15 2021 Shutting down instance (abort) License high water mark = 2 USER (ospid: 49610866): terminating the instance Instance terminated by USER, pid = 49610866 |
从上述Oracle alert log来看,Oracle提到了一个文档,查看该文档会发现,Oracle认为该问题是Bug导致,同时也提供了解决方案;其解决方案也不复杂,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
1. verify the controlfile sequence# is at or above 0xFF000000: set numwidth 15 select max(FHCSQ) from x$kcvfh; 2. Generate Trace file to recreate the controlfile: alter database backup controlfile to trace noresetlogs; 3. shutdown 4. startup nomount 5. alter session set events '20324049 trace name context forever, level 1'; 6. execute the commands in the tracefile generated by step#2 7. alter session set events '20324049 trace name context off'; 8. confirm the controlfile sequence# is now low with the same query in 1. |
简单的讲,即先设置event 20324049;然后再重建controlfile,直接open即可;但是在进行重建controlfile时遇到了如下问题:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
but not in the data dictionary. Deleting from controlfile. Errors in file /u03/db/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ora_7930228.trc (incident=936822): ORA-00600: internal error code, arguments: [25016], [193], [52], [], [], [], [], [], [], [], [], [] Incident details in: /u03/db/diag/rdbms/xxxx/xxxx1/incident/incdir_936822/xxxx1_ora_7930228_i936822.trc Sun Aug 08 13:43:50 2021 Dumping diagnostic data in directory=[cdmp_20210808134350], requested by (instance=1, osid=7930228), summary=[incident=936822]. Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /u03/db/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ora_7930228.trc: ORA-00600: internal error code, arguments: [25016], [193], [52], [], [], [], [], [], [], [], [], [] Errors in file /u03/db/diag/rdbms/xxxx/xxxx1/trace/xxxx1_ora_7930228.trc: ORA-00600: internal error code, arguments: [25016], [193], [52], [], [], [], [], [], [], [], [], [] Error 600 happened during db open, shutting down database USER (ospid: 7930228): terminating the instance due to error 600 Instance terminated by USER, pid = 7930228 ORA-1092 signalled during: alter database open resetlogs... opiodr aborting process unknown ospid (7930228) as a result of ORA-1092 Sun Aug 08 13:43:54 2021 |
上述错误提示其实是较为明显的,大概含义是指open数据库时发现有表空间信息丢失。不难看出该数据库在之前应该drop 过一些tablespace。在重建控制文件之后,打开数据库时,表空间的ts#必须是连续的,否则会抛出上述错误。那么怎么解决该问题呢?
我们有个大致的思路,先想办法把丢失的几条记录补充到ts$即可。这里我们通过dbx -a PID来进行操作,设置kokiasg 断点,insert 丢失的5条记录之后,再重启数据库即可。重启之后open数据库发现报ora-00600 [2662];该错误就非常简单了,这里不再多说。oradebug poke 推进scn即可。
最后再重启数据库,去重建一次controlfile,设置event 处理max sequence#的问题;在alert log中会看到如下类似信息:
1 2 3 4 5 6 7 8 9 10 11 12 |
CREATE CONTROLFILE is being performed with event 20324049 set, this will now reset the controlfile sequence number to 1 in the datafiles, the online redo logs, and the new controlfile. datafile 1 header updated: controlfile sequence# reset to 1 NOTE: Loaded library: System SUCCESS: diskgroup DATADG was mounted datafile 2 header updated: controlfile sequence# reset to 1 datafile 3 header updated: controlfile sequence# reset to 1 datafile 4 header updated: controlfile sequence# reset to 1 datafile 5 header updated: controlfile sequence# reset to 1 datafile 6 header updated: controlfile sequence# reset to 1 datafile 7 header updated: controlfile sequence# reset to 1 datafile 8 header updated: controlfile sequence# reset to 1 datafile 9 header updated: controlfile sequence# reset to 1 |
最后打开数据库即可。至此该case也算是完结了。总的来讲,整个恢复过程不算太复杂。
因为比较少见,我本人也是第一次遇到将controlfile sequence#撑满的情况,比较有趣,所以简单记录一下,跟大家分享!
Leave a Reply
You must be logged in to post a comment.