One recover case!
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: One recover case!
今天ML的群中女神和travel在纠结一个恢复的问题,11.2.0.3版本,非归档,大概是rm掉current的log,然后重建
controlfille后恢复导致一系列问题,并最终出现ora-600 2662错误,虽然这个错误很常见,但是你发现推进scn
也是无法open,感觉有点怪,远程了女神的电脑,操作不便,最后将文件压缩传过来,我在自己的vmware进行了恢复。
由于环境的差异,所以解压后我先进行rename file操作,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
SQL> select name from v$datafile; NAME ------------------------------------------------ /u01/app/oracle/oradata/travel/system01.dbf ......... /u01/app/oracle/oradata/travel/users01.dbf SQL> select member from v$logfile; MEMBER ------------------------------------------------- /u01/app/oracle/oradata/travel/redo03.log /u01/app/oracle/oradata/travel/redo02.log /u01/app/oracle/oradata/travel/redo01.log SQL> alter database rename file '/u01/app/oracle/oradata/travel/system01.dbf' to '/home/oracle/travel/travel/system01.dbf'; Database altered. SQL> alter database rename file '/u01/app/oracle/oradata/travel/sysaux01.dbf' to '/home/oracle/travel/travel/sysaux01.dbf'; Database altered. SQL> alter database rename file '/u01/app/oracle/oradata/travel/undotbs01.dbf' to '/home/oracle/travel/travel/undotbs01.dbf'; Database altered. SQL> alter database rename file '/u01/app/oracle/oradata/travel/users01.dbf' to '/home/oracle/travel/travel/users01.dbf'; Database altered. SQL> alter database rename file '/u01/app/oracle/oradata/travel/redo01.log' to '/home/oracle/travel/travel/redo01.log'; Database altered. SQL> alter database rename file '/u01/app/oracle/oradata/travel/redo02.log' to '/home/oracle/travel/travel/redo02.log'; Database altered. SQL> alter database rename file '/u01/app/oracle/oradata/travel/redo03.log' to '/home/oracle/travel/travel/redo03.log'; Database altered. SQL> select name from v$datafile; NAME ------------------------------------------------- /home/oracle/travel/travel/system01.dbf /home/oracle/travel/travel/sysaux01.dbf /home/oracle/travel/travel/undotbs01.dbf /home/oracle/travel/travel/users01.dbf SQL> select member from v$Logfile; MEMBER ------------------------------------------------- /home/oracle/travel/travel/redo03.log /home/oracle/travel/travel/redo02.log /home/oracle/travel/travel/redo01.log |
此时,进行recover,会报错ORA-16433,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SQL> recover database; ORA-00283: recovery session canceled due to errors ORA-16433: The database must be opened in read/write mode. SQL> recover database using backup controlfile until cancel; ORA-00283: recovery session canceled due to errors ORA-16433: The database must be opened in read/write mode. SQL> SQL> alter database open; alter database open * ERROR at line 1: ORA-01113: file 1 needs media recovery ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf' |
关于该错误,通过oerr命令可以大概了解一下其含义,如下:
1 2 3 4 5 6 7 8 9 10 11 12 |
[oracle@11gR2_primary ~]$ oerr ora 16433 16433, 00000, "The database must be opened in read/write mode." // *Cause: An attempt was made to open the database in read-only mode after an // operation that requires that the database be opened in read/write // mode. // *Action: Open the database in read/write mode. The database can then be // opened in read-only mode. 通过这个错误,我们可以得出一个信息,数据库可以以read only模式打开,换句话讲,你查询如下几个值都是一样的: select checkpoint_change# from v$database; --来自controlfile select checkpoint_change# from v$datafile; --来自controlfile select checkpoint_change# from v$datafile_orader; --来自datafile header |
其实,通常来讲,只要上述几个值一样,那么我们的db应该都是可以直接open打开的,然而,这里却不行,read only都不行,我试过。
然后下面就开始我的恢复工作。
第一次尝试恢复:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 |
---重建controlfile SQL> startup nomount pfile=/home/oracle/travel/travel/a.ora ORACLE instance started. Total System Global Area 626327552 bytes Fixed Size 2230952 bytes Variable Size 184550744 bytes Database Buffers 436207616 bytes Redo Buffers 3338240 bytes SQL> CREATE CONTROLFILE REUSE DATABASE "TRAVEL" RESETLOGS NOARCHIVELOG 2 MAXLOGFILES 16 3 MAXLOGMEMBERS 3 4 MAXDATAFILES 100 5 MAXINSTANCES 8 6 MAXLOGHISTORY 292 7 LOGFILE 8 GROUP 1 '/home/oracle/travel/travel/redo01.log' SIZE 50M BLOCKSIZE 512, 9 GROUP 2 '/home/oracle/travel/travel/redo02.log' SIZE 50M BLOCKSIZE 512, 10 GROUP 3 '/home/oracle/travel/travel/redo03.log' SIZE 50M BLOCKSIZE 512 11 -- STANDBY LOGFILE 12 DATAFILE 13 '/home/oracle/travel/travel/system01.dbf', 14 '/home/oracle/travel/travel/sysaux01.dbf', 15 '/home/oracle/travel/travel/undotbs01.dbf', 16 '/home/oracle/travel/travel/users01.dbf' 17 CHARACTER SET AL32UTF8 18 ; Control file created. SQL> 开始开始进行recover: SQL> alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf' SQL> recover database until cancel; ORA-00283: recovery session canceled due to errors ORA-01610: recovery using the BACKUP CONTROLFILE option must be done SQL> recover database using backup controlfile until cancel; ORA-00279: change 244977 generated at 01/19/2013 01:56:54 needed for thread 1 ORA-00289: suggestion : /oracle/product/11.2.0/db_1/dbs/arch1_1_805082211.dbf ORA-00280: change 244977 for thread 1 is in sequence #1 Specify log: {=suggested | filename | AUTO | CANCEL} cancel ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf' ORA-01112: media recovery not started |
因为本身是非归档,所以这个不完全恢复的步骤无非是为了后面可以进行open resetlog。然后停库,加入隐含参数进行open:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
SQL> shutdown immediate ORA-01109: database not open Database dismounted. ORACLE instance shut down. SQL> startup nomount pfile=/home/oracle/travel/travel/b.ora ORACLE instance started. Total System Global Area 626327552 bytes Fixed Size 2230952 bytes Variable Size 184550744 bytes Database Buffers 436207616 bytes Redo Buffers 3338240 bytes SQL> alter database mount 2 ; Database altered. SQL> alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01092: ORACLE instance terminated. Disconnection forced ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], [] Process ID: 3641 Session ID: 1 Serial number: 5 |
此时alert log对应的错误如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
Fri Dec 14 21:58:39 2012 SMON: enabling cache recovery Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc (incident=4937): ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], [] Incident details in: /oracle/diag/diag/rdbms/travel/travel/incident/incdir_4937/travel_ora_3641_i4937.trc Fri Dec 14 21:58:43 2012 Dumping diagnostic data in directory=[cdmp_20121214215843], requested by (instance=1, osid=3641), summary=[incident=4937]. Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc: ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], [] Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_3641.trc: ORA-00600: internal error code, arguments: [2662], [0], [244985], [0], [244998], [4194432], [], [], [], [], [], [] Error 600 happened during db open, shutting down database USER (ospid: 3641): terminating the instance due to error 600 Instance terminated by USER, pid = 3641 ORA-1092 signalled during: alter database open resetlogs... opiodr aborting process unknown ospid (3641) as a result of ORA-1092 |
甚至手工执行如下操作都无法open数据库,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
SQL> alter session set events '10015 trace name ADJUST_SCN level 14'; Session altered. SQL> alter database open; alter database open * ERROR at line 1: ORA-01113: file 1 needs media recovery ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf' 最后dunmp controlfile和current redo logfile,发现low cache rba信息居然是最大值: SQL> oradebug setmypid Statement processed. SQL> alter system set events 'immediate trace name controlf level 4' ; System altered. SQL> alter system dump logfile '/home/oracle/travel/travel/redo01.log'; SQL> oradebug tracefile_name /oracle/diag/diag/rdbms/travel/travel/trace/travel_ora_4229.trc SQL> |
++++++++ controlfile dump 片段
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
*************************************************************************** CHECKPOINT PROGRESS RECORDS *************************************************************************** (size = 8180, compat size = 8180, section max = 11, section in-use = 0, last-recid= 0, old-recno = 0, last-recno = 0) (extent = 1, blkno = 2, numrecs = 11) THREAD #1 - status:0x2 flags:0x0 dirty:0 low cache rba:(0xffffffff.ffffffff.ffff) on disk rba:(0x1.3.0) ---low cache rba为无穷大 on disk scn: 0x0000.0003bcf7 12/14/2012 21:58:39 resetlogs scn: 0x0000.0003bcf2 12/14/2012 21:58:36 heartbeat: 802069494 mount id: 2869233386 THREAD #2 - status:0x0 flags:0x0 dirty:0 low cache rba:(0x0.0.0) on disk rba:(0x0.0.0) on disk scn: 0x0000.00000000 01/01/1988 00:00:00 resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00 heartbeat: 0 mount id: 0 THREAD #3 - status:0x0 flags:0x0 dirty:0 low cache rba:(0x0.0.0) on disk rba:(0x0.0.0) on disk scn: 0x0000.00000000 01/01/1988 00:00:00 resetlogs scn: 0x0000.00000000 01/01/1988 00:00:00 heartbeat: 0 mount id: 0 +++++++redo01.log dump 片段 DUMP OF REDO FROM FILE '/home/oracle/travel/travel/redo01.log' Opcodes *.* RBAs: 0x000000.00000000.0000 thru 0xffffffff.ffffffff.ffff SCNs: scn: 0x0000.00000000 thru scn: 0xffff.ffffffff Times: creation thru eternity FILE HEADER: Compatibility Vsn = 186646528=0xb200000 Db ID=2872261344=0xab333ae0, Db Name='TRAVEL' Activation ID=2872292516=0xab33b4a4 Control Seq=233=0xe9, File size=102400=0x19000 File Number=1, Blksiz=512, File Type=2 LOG descrip:"Thread 0001, Seq# 0000000001, SCN 0x00000003bcee-0xffffffffffff" thread: 1 nab: 0xffffffff seq: 0x00000001 hws: 0x3 eot: 1 dis: 0 resetlogs count: 0x2ffc9463 scn: 0x0000.0003bcee (244974) prev resetlogs count: 0x2ffc7a41 scn: 0x0000.0003bcea (244970) Low scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51 Next scn: 0xffff.ffffffff 01/01/1988 00:00:00 Enabled scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51 Thread closed scn: 0x0000.0003bcee (244974) 01/19/2013 01:56:51 Disk cksum: 0x3467 Calc cksum: 0x3467 Terminal recovery stop scn: 0x0000.00000000 Terminal recovery 01/01/1988 00:00:00 Most recent redo scn: 0x0000.00000000 Largest LWN: 0 blocks End-of-redo stream : No Unprotected mode Miscellaneous flags: 0x800000 Thread internal enable indicator: thr: 0, seq: 0 scn: 0x0000.00000000 Zero blocks: 0 Format ID is 2 redo log key is 853c461da2eec7ed4b45ce75b8c27d7 redo log key flag is 5 Enabled redo threads: 1 REDO RECORD - Thread:1 RBA: 0x000001.00000002.0010 LEN: 0x0070 VLD: 0x05 SCN: 0x0000.0003bcf2 SUBSCN: 1 01/19/2013 01:56:54 (LWN RBA: 0x000001.00000002.0010 LEN: 0001 NST: 0001 SCN: 0x0000.0003bcee) , CHANGE #1 MEDIA RECOVERY MARKER SCN:0x0000.00000000 SEQ:0 OP:17.3 ENC:0 END OF REDO DUMP |
可以看到实际上current redo的LWN RBA是对的,为1.2.10,是小于on disk rba (0x1.3.0)的。那么这里为什么会出现这个情况?
猜测可能是写紊乱了。到这里也就比较明白了,仍然是conrolfile有问题。
既然如此,那么我直接将controlfile文件rm掉,然后再次重建。如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 |
SQL> startup nomount pfile='/home/oracle/travel/travel/a.ora ORACLE instance started. Total System Global Area 626327552 bytes Fixed Size 2230952 bytes Variable Size 184550744 bytes Database Buffers 436207616 bytes Redo Buffers 3338240 bytes SQL> CREATE CONTROLFILE REUSE DATABASE "TRAVEL" RESETLOGS NOARCHIVELOG 2 MAXLOGFILES 16 3 MAXLOGMEMBERS 3 4 MAXDATAFILES 100 5 MAXINSTANCES 8 6 MAXLOGHISTORY 292 7 LOGFILE 8 GROUP 1 '/home/oracle/travel/travel/redo01.log' SIZE 50M BLOCKSIZE 512, 9 GROUP 2 '/home/oracle/travel/travel/redo02.log' SIZE 50M BLOCKSIZE 512, 10 GROUP 3 '/home/oracle/travel/travel/redo03.log' SIZE 50M BLOCKSIZE 512 11 -- STANDBY LOGFILE 12 DATAFILE 13 '/home/oracle/travel/travel/system01.dbf', 14 '/home/oracle/travel/travel/sysaux01.dbf', 15 '/home/oracle/travel/travel/undotbs01.dbf', 16 '/home/oracle/travel/travel/users01.dbf' 17 CHARACTER SET AL32UTF8 18 ; Control file created. SQL> recover database until cancel using backup controlfile; ORA-00279: change 244985 generated at 12/14/2012 23:05:06 needed for thread 1 ORA-00289: suggestion : /oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf ORA-00280: change 244985 for thread 1 is in sequence #1 Specify log: {=suggested | filename | AUTO | CANCEL} AUTO ORA-00308: cannot open archived log '/oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf' ORA-27037: unable to obtain file status Linux-x86_64 Error: 2: No such file or directory Additional information: 3 ORA-00308: cannot open archived log '/oracle/product/11.2.0/db_1/dbs/arch1_1_802047904.dbf' ORA-27037: unable to obtain file status Linux-x86_64 Error: 2: No such file or directory Additional information: 3 ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf' SQL> SQL> SQL> select file#,CHECKPOINT_CHANGE#,LAST_CHANGE# from v$datafile order by 1; FILE# CHECKPOINT_CHANGE# LAST_CHANGE# ---------- ------------------ ------------ 1 244985 2 244985 3 244985 4 244985 SQL> select file#,CHECKPOINT_CHANGE# from v$datafile_header order by 1; FILE# CHECKPOINT_CHANGE# ---------- ------------------ 1 244985 2 244985 3 244985 4 244985 SQL> select CHECKPOINT_CHANGE# from v$database; CHECKPOINT_CHANGE# ------------------ 0 |
此时停库,然后在pfile中加入隐含参数:
*._allow_resetlogs_corruption=TRUE
*._allow_error_simulation=TRUE
接着再次进行mount,并进行scn 推进:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 |
SQL> alter database open resetlogs; alter database open resetlogs * ERROR at line 1: ORA-01194: file 1 needs more recovery to be consistent ORA-01110: data file 1: '/home/oracle/travel/travel/system01.dbf' SQL> SQL> shutdown immediate ORA-01109: database not open Database dismounted. ORACLE instance shut down. SQL> startup nomount pfile='/home/oracle/travel/travel/b.ora ORACLE instance started. Total System Global Area 626327552 bytes Fixed Size 2230952 bytes Variable Size 184550744 bytes Database Buffers 436207616 bytes Redo Buffers 3338240 bytes SQL> alter database mount 2 ; Database altered. SQL> alter session set events '10015 trace name ADJUST_SCN level 10'; Session altered. SQL> alter database open; alter database open * ERROR at line 1: ORA-01589: must use RESETLOGS or NORESETLOGS option for database open SQL> alter database open noresetlogs; alter database open noresetlogs * ERROR at line 1: ORA-01588: must use RESETLOGS option for database open SQL> alter database open resetlogs; Database altered. SQL> show parameter name NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_file_name_convert string db_name string travel db_unique_name string travel global_names boolean FALSE instance_name string travel lock_name_space string log_file_name_convert string processor_group_name string service_names string travel SQL> select open_mode from v$database; OPEN_MODE -------------------- READ WRITE SQL> SQL> SQL> alter system switch logfile; System altered. SQL> |
这里说明一下是我的alert log里面还出现了数据字典不一致的问题,因为毕竟是强制open的,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
Sun Jan 20 00:25:34 2013 Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_m000_4681.trc: ORA-25153: Temporary Tablespace is Empty Sun Jan 20 00:25:38 2013 Errors in file /oracle/diag/diag/rdbms/travel/travel/trace/travel_j004_4692.trc (incident=21817): ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], [] Incident details in: /oracle/diag/diag/rdbms/travel/travel/incident/incdir_21817/travel_j004_4692_i21817.trc Sun Jan 20 00:25:43 2013 Dumping diagnostic data in directory=[cdmp_20130120002543], requested by (instance=1, osid=4692 (J004)), summary=[incident=21817]. Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Sun Jan 20 00:25:47 2013 这个ora-600错误很常见了,通过是index信息不一致。 其trace内容如下: * kdsgrp1-1: ************************************************* row 0x0040589e.4b continuation at 0x0040589e.4b file# 1 block# 22686 slot 75 not found KDSTABN_GET: 0 ..... ntab: 1 curSlot: 75 ..... nrows: 75 kdsgrp - dump CR block dba=0x0040589e Block header dump: 0x0040589e Object id on Block? Y seg/obj: 0x12 csc: 0x00.3bca9 itc: 1 flg: O typ: 1 - DATA fsl: 0 fnx: 0x40589f ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0001.01a.000000a1 0x00c05e97.001d.0d --U- 1 fsc 0x0053.0003bcab bdba: 0x0040589e data_block_dump,data header at 0x7a26a044 =============== tsiz: 0x1fb8 hsiz: 0xa8 pbl: 0x7a26a044 76543210 flag=-------- ntab=1 nrow=75 frre=-1 fsbo=0xa8 ........... |
seg/obj: 0x12 这是obj$对象,属于bootstrap$核心对象,涉及到这类的对象处理相对麻烦,针对这类情况,数据库open后都建议
把数据导出,然后重建库,我这里就不在继续描述这个ora-600错误了,我博客也有类似的例子。
最后来个小节:
1. oracle通过系统checkpoint scn,datafile checkpoint scn,start scn三者之间的比较来判断数据文件是否需要进行介质恢复.
2. 在redo 线程打开的情况下,即数据库open的情况下,stop scn会被设置为无穷大,当正常关闭时,stop scn等于datafile scn.
这里需要注意的是,stop scn是存放在controlfile中的,网上部分资料说是存在datafile header中,这个说法是错误的。
3. oracle在open之前是先判断是否进行介质恢复,然后再是判断是否进行instance recovery。
4. 关于4种scn的关系如下:
system checkpoint scn — 存放在controlfile中
datafile checkpoint scn –存放在controlfile中
start scn —存放在datafile header中
stop scn —存放在controlfile中
system scn,datafile checkpoint scn,start scn,这3种scn用于判断数据文件是否需要进行介质恢复。这3个相等这不需要介质恢复。
如何这4个都相等,那么就不需要进行实例恢复。stop scn是用于判断是否进行实例恢复的。
5. 如果stop scn比其他的几个scn要大,那么就需要进行instance recover,需要进行扫描redo,实例恢复的起点是low cache rba,终点
是redo log的最末端。
5 Responses to “One recover case!”
写的真好,赞一个
[…] 详见原文博客链接地址:One recover case! 作者:lovewifelovelife 发表于2013-1-20 11:46:17 原文链接 阅读:44 评论:0 查看评论 […]
牛逼
按照大神的方法,整好了。。真心的好用。。 写的很详细。。赞一个。。
厉害啊。。。。
Leave a Reply
You must be logged in to post a comment.