记一次非归档RAC恢复(大量的ORA-600 错误)
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: 记一次非归档RAC恢复(大量的ORA-600 错误)
1 |
帮客户恢复了一个10205 rac(for linux),非归档,掉电(听说频繁掉电几十次),下面是处理过程: |
、
1 2 3 4 5 6 7 8 9 10 |
SQL> startup mount pfile='/home/oracle/pfile.ora'; ORACLE instance started. Total System Global Area 4294967296 bytes Fixed Size 2101736 bytes Variable Size 805309976 bytes Database Buffers 3456106496 bytes Redo Buffers 31449088 bytes ORA-00214: control file '/u01/oradata/e200pro/control02.ctl' version 1483026 inconsistent with file '/u01/oradata/e200pro/control01.ctl' version 1482441 |
1 2 3 4 5 6 7 8 |
通常遇到这个情况,我们采取的方法是分别control01,control02,control03去进行 尝试性mount,不过我这里发现都不行,还会抛出如下错误: ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [1484399], [1483026], [0x000000000], [], [], [], [] 无奈之下只能进行controlfile 的重建,开始头晕晕的,重建controlfile的时候少加了 一些datafile,以至于后面进行了resetlogs等操作以后,想再次进行controlfile的重建 发现就不行了,报如下类似错误: |
1 2 3 4 5 6 |
CREATE CONTROLFILE REUSE DATABASE "E200PRO" RESETLOGS NOARCHIVELOG * ERROR at line 1: ORA-01503: CREATE CONTROLFILE failed ORA-01189: file is from a different RESETLOGS than previous files ORA-01110: data file 21: '/u02/oradata/datafile/archind1.dbf' |
1 2 3 |
我知道可以通过其他手段去修改datafile header然后来进行重建,但是无法确认具体要修改 哪些地方? 当时我参考了一下dbsnake以前的一篇详见9i & 10g datafile header文章, 大致判断如下几个地方需要修改: |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
kcvfhrls --resetlogs change scn值 kcvfhprs --Pre resetlogs change 值(即上次resetlog的scn) kcvfhprc --prev reset logs count kcvfhrlc --resetlogs time FHRBA_SEQ --Redo log sequence number kccfhcsq --controlfile sequence number 当然,这些信息都能从X$KCVFH中查到。 后面经过和熊哥沟通,确认只需要修改如下4个地方即可: FHPRS offset 420 RHPRC offset 416 FHRLS offset 116 FHRLC offset 112 |
然后必须使用resetlogs方式创建,否则会遇到如下类似错误:
1 2 3 4 5 6 |
CREATE CONTROLFILE REUSE DATABASE "E200PRO" NORESETLOGS NOARCHIVELOG * ERROR at line 1: ORA-01503: CREATE CONTROLFILE failed ORA-01189: file is from a different RESETLOGS than previous files ORA-01517: log member: '/u01/oradata/e200pro/redo01.log' |
重建完controlfile以后,直接recover database一把,提示成功,然后直接open抛出如下错误:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
Sat May 12 13:45:23 CST 2012 SMON: enabling cache recovery Sat May 12 13:45:23 CST 2012 ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0938.d2a71797): Sat May 12 13:45:23 CST 2012 select ctime, mtime, stime from obj$ where obj# = :1 Sat May 12 13:45:23 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/udump/e200pro1_ora_2253.trc: ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 1 ORA-01555: snapshot too old: rollback segment number 9 with name "_SYSSMU9$" too small Error 704 happened during db open, shutting down database USER: terminating instance due to error 704 Instance terminated by USER, pid = 2253 ORA-1092 signalled during: alter databas |
开始以为是这个回滚段有问题,使用如下隐含参数以后发现仍然不行:
_corrupted_rollback_segments=(_SYSSMU9$)
_offline_rollback_segments=(_SYSSMU9$)
1 2 3 4 5 6 7 |
通过bbed检查该回滚段状态,发现为3,也就是online,原来如此,这里我本来想 去修改该回滚段状态的,熊哥说修改的话建议直接修改为5,不过我这里没有修改。 最后熊哥建议调整scn,在mount下我手工操作,如下: alter session set events '10015 trace name adjust_scn level 10'; 发现不行,最后 alter session set events '10015 trace name adjust_scn level 9445'; 依然不行,无奈只能用*._minimum_giga_scn=9445加到pfile里面,然后启动。 直接正常alter database open成功。 |
1 2 3 4 5 6 7 8 9 10 11 12 |
不过后面接下来的是一系列的ora-00600错误: Sat May 12 15:17:52 CST 2012 Recovery of Online Redo Log: Thread 2 Group 6 Seq 2 Reading mem 0 Mem# 0: /u02/oradata/flash_recover_area/E200PRO/onlinelog/o1_mf_6_7tvy79tz_.log Block recovery completed at rba 2.65.16, scn 2361.1073847969 Sat May 12 15:17:52 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/udump/e200pro2_ora_11977.trc: ORA-00600: internal error code, arguments: [4194], [52], [23], [], [], [], [], [] Sat May 12 15:17:53 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro2_smon_11886.trc: ORA-00600: internal error code, arguments: [4194], [47], [38], [], [], [], [], [] Sat May 12 15:17:55 CST 2012 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
---该错误很容易处理,就不多说了 Sat May 12 15:47:01 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro1_j001_25397.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-08102: index key not found, obj# 239, file 1, block 458954 (2) ORA-12012: error on auto execute of job 23 ORA-08102: index key not found, obj# 239, file 1, block 458954 (2) Sat May 12 15:56:36 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro1_j001_29476.trc: ORA-12012: error on auto execute of job 186 ORA-08102: index key not found, obj# 239, file 1, block 458954 (2) Sat May 12 15:56:37 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro1_j000_29474.trc: ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], [] Sat May 12 15:56:38 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro1_j003_29482.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-08102: index key not found, obj# 239, file 1, block 458954 (2) ORA-12012: error on auto execute of job 2 ORA-08102: index key not found, obj# 239, file 1, block 458954 (2) Sat May 12 15:56:40 CST 2012 ---这里根据obj定位到为对象i_job_next,发现为obj$上的索引,直接drop重建解决。 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
后面又是更为严重的13013错误,直接导致实例crash: Sat May 12 16:22:22 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro2_smon_11886.trc: ORA-00600: internal error code, arguments: [13013], [5001], [483], [4462439], [151], [4462439], [4], [] Sat May 12 16:22:23 CST 2012 Non-fatal internal error happenned while SMON was doing flushing of monitored table stats. SMON encountered 1 out of maximum 100 non-fatal internal errors. Sat May 12 16:22:23 CST 2012 Trace dumping is performing id=[cdmp_20120512162223] Sat May 12 16:22:24 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro2_smon_11886.trc: ORA-00600: internal error code, arguments: [13013], [5001], [483], [4462439], [151], [4462439], [4], [] Sat May 12 16:22:26 CST 2012 Non-fatal internal error happenned while SMON was doing flushing of monitored table stats. SMON encountered 2 out of maximum 100 non-fatal internal errors. Sat May 12 16:22:27 CST 2012 |
1 2 3 4 5 6 7 8 9 |
---483为data_object_id,可以查到对象,4462439为rdba地址,也可以进行转换 SQL> select dbms_utility.data_block_address_file(4462439) Rfile# ,dbms_utility.data_block_address_block(4462439) "Block#" from dual; RFILE# Block# ---------- ---------- 1 268135 dbv检查该block是正常,至少物理上是ok的。 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
[oracle@dbssvr-a datafile]$ dbv file=/u01/oradata/e200pro/system01.dbf blocksize=8192 start=268134 end=268136 DBVERIFY: Release 10.2.0.5.0 - Production on Sat May 12 16:46:14 2012 Copyright (c) 1982, 2007, Oracle. All rights reserved. DBVERIFY - Verification starting : FILE = /u01/oradata/e200pro/system01.dbf DBVERIFY - Verification complete Total Pages Examined : 3 Total Pages Processed (Data) : 3 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 0 Total Pages Failing (Index): 0 Total Pages Processed (Other): 0 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 0 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Highest block SCN : 1074016503 (2361.1074016503) 经查该对象为索引i_mon_mods$_obj,直接drop index i_mon_mods$_obj; 然后运行如下语句重建即可: create unique index i_mon_mods$_obj on mon_mods$(obj#) storage (maxextents unlimited) / 处理完ora-600 13013错误以后,数据库不再crash。 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
后面还有一些如下错误: Sat May 12 16:42:36 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/udump/e200pro2_ora_13411.trc: ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [] Sat May 12 16:42:39 CST 2012 Trace dumping is performing id=[cdmp_20120512164239] Sat May 12 16:43:27 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/udump/e200pro2_ora_14023.trc: ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [] Sat May 12 16:43:28 CST 2012 Trace dumping is performing id=[cdmp_20120512164328] Sat May 12 19:26:53 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/udump/e200pro2_ora_2148.trc: ORA-00600: internal error code, arguments: [6002], [6], [6], [4], [0], [], [], [] Sat May 12 19:26:54 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/udump/e200pro2_ora_2148.trc: ORA-00600: internal error code, arguments: [6002], [6], [6], [4], [0], [], [], [] Sat May 12 19:26:56 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/udump/e200pro2_ora_2148.trc: ORA-00600: internal error code, arguments: [6002], [6], [6], [4], [0], [], [], [] Sat May 12 19:26:58 CST 2012 Sun May 13 08:00:11 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro2_j000_31295.trc: ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], [] Sun May 13 08:00:15 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro2_j000_31295.trc: ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], [] Sun May 13 08:00:17 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro2_j000_31295.trc: ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], [] Sun May 13 08:00:21 CST 2012 |
如上这几个错误都跟index有关系,也就是存在逻辑错误,可以通过如下方式进行检查:
analyze table owner.table_name validate structure cascade online;
补充一些:对于分区表,要执行validate操作,必须先执行如下sql:
@ ?/rdbms/admin/utlvaite.sql
当然,处理方式也就很简单了,把有问题的index drop重建。
后面还陆续报出如下一系列错误:
1 2 3 4 5 6 7 8 9 10 11 |
Sun May 13 18:09:24 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/udump/e200pro2_ora_13838.trc: ORA-00600: internal error code, arguments: [ktspgsb-1], [], [], [], [], [], [], [] Sun May 13 18:09:50 CST 2012 Trace dumping is performing id=[cdmp_20120513180950] Sun May 13 18:10:04 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/udump/e200pro2_ora_13838.trc: ORA-00600: internal error code, arguments: [ktspgsb-1], [], [], [], [], [], [], [] Sun May 13 18:21:20 CST 2012 ---该错误明显是涉及的表损坏了,重建表即可。 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
Sun May 13 18:42:58 CST 2012 Hex dump of (file 25, block 9819) in trace file /oracle/product/10.2/admin/e200pro/udump/e200pro2_ora_1460.trc Corrupt block relative dba: 0x0640265b (file 25, block 9819) Fractured block found during buffer read Data in bad block: type: 6 format: 2 rdba: 0x0640265b last change scn: 0x0938.f6dc4f53 seq: 0x1 flg: 0x04 spare1: 0x0 spare2: 0x0 spare3: 0x0 consistency value in tail: 0x3f300601 check value in block header: 0xdf55 computed block checksum: 0xe34 Reread of rdba: 0x0640265b (file 25, block 9819) found same corrupted data Sun May 13 18:42:58 CST 2012 Corrupt Block Found TSN = 10, TSNAME = TBLSPA_ARCH_IND RFN = 25, BLK = 9819, RDBA = 104867419 OBJN = 484917, OBJD = 543662, OBJECT = PK_C_IT_RUN, SUBOBJECT = SEGMENT OWNER = CSMDBOWNER, SEGMENT TYPE = Index Segment Sun May 13 18:45:31 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/udump/e200pro2_ora_31407.trc: ORA-00600: internal error code, arguments: [kcbnew_3], [0], [4], [648035], [], [], [], [] Sun May 13 18:45:58 CST 2012 ---该错误比较明显,是index block损坏,是Fractured block,drop重建即可。 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
Mon May 14 10:46:19 CST 2012 Recovery of Online Redo Log: Thread 2 Group 5 Seq 117 Reading mem 0 Mem# 0: /u02/oradata/flash_recover_area/E200PRO/onlinelog/o1_mf_5_7tvy78pk_.log Block recovery stopped at EOT rba 117.93764.16 Block recovery completed at rba 117.93764.16, scn 2361.1083355810 Mon May 14 10:46:19 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro2_smon_11792.trc: ORA-00607: Internal error occurred while making a change to a data block ORA-00600: internal error code, arguments: [kddummy_blkchk], [23], [3], [18018], [], [], [], [] Mon May 14 10:46:19 CST 2012 Trace dumping is performing id=[cdmp_20120514104619] Mon May 14 10:46:22 CST 2012 Errors in file /oracle/product/10.2/admin/e200pro/bdump/e200pro2_smon_11792.trc: ORA-00600: internal error code, arguments: [kddummy_blkchk], [23], [3], [18018], [], [], [], [] Mon May 14 10:46:23 CST 2012 Doing block recovery for file 23 block 3 Resuming block recovery (PMON) for file 23 block 3 Block recovery from logseq 117, block 93650 to scn 10141501141666 Mon May 14 10:46:23 CST 2012 ---关于[kddummy_blkchk]错误,我在mos上搜了一下,相关的bug比较多,总的来说 |
1 2 3 4 5 6 7 8 9 |
这里还是涉及的对象有问题,需要把相关对象重建应该就能解决。 SELECT segment_name FROM dba_extents WHERE file_id = 23 AND 3 BETWEEN block_id AND block_id + blocks - 1; 如果说实例很快就crash了,可以通过event 10513来阻止smon去进行实例recovery,如下: alter system set event='10513 trace name context forever,level 2' scope=spfile; 然后在对表进行处理。 |
3 Responses to “记一次非归档RAC恢复(大量的ORA-600 错误)”
[…] 详见原文博客链接http://www.killdb.com/2012/05/14/%e8%ae%b0%e4%b8%80%e6%ac%a1%e9%9d%9e%e5%bd%92%e6%a1%a3rac%e6%81%a2%… 作者:lovewifelovelife 发表于2012-5-14 14:38:00 原文链接 阅读:21 评论:0 查看评论 […]
强。如果我遇到,那么我就歇菜了。。。。
[…] 上午群里网友讨论了一下关于controlfile重建的问题,曾经的一次恢复(记一次非归档RAC恢复(大量的ORA-600 错误)) 的印象中发现对于resetlogs和noresetlogs是有点不同的,于是这里就做了一个简单的测试: ——归档模式 SQL> conn /AS sysdba Connected. SQL> startup mount ORACLE instance started. Total System Global Area 167772160 bytes Fixed SIZE 1272600 bytes Variable SIZE 109053160 bytes DATABASE Buffers 54525952 bytes Redo Buffers 2920448 bytes DATABASE mounted. SQL> archive log list; DATABASE log mode Archive Mode Automatic archival Enabled Archive destination /home/ora10g/archivelog Oldest online log SEQUENCE 358 NEXT log SEQUENCE TO archive 360 CURRENT log SEQUENCE 360 SQL> SQL> SELECT checkpoint_change# FROM v$database; CHECKPOINT_CHANGE# —————— 4043243 SQL> SELECT file# , CHECKPOINT_CHANGE# FROM v$datafile_header ORDER BY 1; FILE# CHECKPOINT_CHANGE# ———- —————— 1 4043243 2 4043243 3 4043243 4 4043243 5 4043243 6 4043243 7 4043243 8 4043243 9 4043243 9 ROWS selected. SQL> SET LINES 200 SQL> SELECT hxfil,hxsts,fhtyp,fhscn,fhprc,fhprs,fhrls,fhrlc,fhcsq,fhcpc,fhccc FROM x$kcvfh ORDER BY 1; HXFIL HXSTS FHTYP FHSCN FHPRC FHPRS FHRLS FHRLC FHCSQ FHCPC FHCCC ———- —————- ———- —————- ——————– —————- —————- ——————– ———- ———- ———- 1 4043243 3 4043243 10/20/2011 07:40:58 666180 848661 11/03/2011 02:05:04 5037 754 753 2 4043243 3 4043243 10/20/2011 07:40:58 666180 848661 11/03/2011 02:05:04 5037 718 717 3 4043243 3 4043243 10/20/2011 07:40:58 666180 848661 11/03/2011 02:05:04 5037 754 753 4 4043243 3 4043243 10/20/2011 07:40:58 666180 848661 11/03/2011 02:05:04 5037 753 752 5 4043243 3 4043243 10/20/2011 07:40:58 666180 848661 11/03/2011 02:05:04 5037 523 522 6 4043243 3 4043243 10/20/2011 07:40:58 666180 848661 11/03/2011 02:05:04 5037 525 524 7 4043243 3 4043243 10/20/2011 07:40:58 666180 848661 11/03/2011 02:05:04 5037 405 404 8 4043243 3 4043243 10/20/2011 07:40:58 666180 848661 11/03/2011 02:05:04 5037 206 205 9 4043243 3 4043243 10/20/2011 07:40:58 666180 848661 11/03/2011 02:05:04 5037 110 109 9 ROWS selected. […]
Leave a Reply
You must be logged in to post a comment.