Deep in ora-00600 [4193]
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: Deep in ora-00600 [4193]
这是一个网友的问题,10201的windows环境,非归档,无备份,数据库open的时候报常见的ORA-00600 4193错误。
这个错误很常见,我们也分析过多次,这里再次来看下网友这里的情况。其中alert log如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
Fri Nov 07 13:58:18 2014 MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set Fri Nov 07 13:58:18 2014 SMON: enabling cache recovery Fri Nov 07 13:58:19 2014 Errors in file e:\oracle\product\10.2.0\admin\orcl\udump\orcl_ora_7420.trc: ORA-00600: 内部错误代码, 参数: [4193], [65], [71], [], [], [], [], [] Fri Nov 07 13:58:21 2014 Doing block recovery for file 1 block 415 Block recovery from logseq 6910, block 3 to scn 262520099 Fri Nov 07 13:58:21 2014 Recovery of Online Redo Log: Thread 1 Group 3 Seq 6910 Reading mem 0 Mem# 0 errs 0: E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO03.LOG Block recovery stopped at EOT rba 6910.5.16 Block recovery completed at rba 6910.5.16, scn 0.262520098 Doing block recovery for file 1 block 9 Block recovery from logseq 6910, block 3 to scn 262520097 Fri Nov 07 13:58:21 2014 Recovery of Online Redo Log: Thread 1 Group 3 Seq 6910 Reading mem 0 Mem# 0 errs 0: E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\REDO03.LOG Block recovery completed at rba 6910.5.16, scn 0.262520098 Fri Nov 07 13:58:21 2014 Errors in file e:\oracle\product\10.2.0\admin\orcl\udump\orcl_ora_7420.trc: ORA-00604: 递归 SQL 级别 1 出现错误 ORA-00607: 当更改数据块时出现内部错误 ORA-00600: 内部错误代码, 参数: [4193], [65], [71], [], [], [], [], [] Error 604 happened during db open, shutting down database USER: terminating instance due to error 604 |
对于ORA-00600 4193错误,Oracle docs是这样解释的:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
ERROR: ORA-600 [4193] [a] [b] VERSIONS: versions 6.0 to 10.1 DESCRIPTION: A mismatch has been detected between Redo records and Rollback (Undo) records. We are validating the Undo block sequence number in the undo block against the Redo block sequence number relating to the change being applied. This error is reported when this validation fails. |
通过查看网友传的alert log信息,发现这哥们进行了大量的操作,几乎把所有的恢复动作都试了一遍。如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
System parameters with non-default values: processes = 300 sessions = 335 event = 10513 trace name context forever,level 2 : 10512 trace name context forever,level 1: 10511 trace name context forever,level 2: 10510 trace name context forever,level 1 sga_max_size = 1619001344 __shared_pool_size = 150994944 __large_pool_size = 8388608 __java_pool_size = 8388608 __streams_pool_size = 0 sga_target = 1619001344 control_files = E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\CONTROL01.CTL, E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\CONTROL02.CTL, E:\ORACLE\PRODUCT\10.2.0\ORADATA\ORCL\CONTROL03.CTL db_block_size = 8192 __db_cache_size = 1442840576 compatible = 10.2.0.1.0 db_file_multiblock_read_count= 16 db_recovery_file_dest = E:\oracle\product\10.2.0/flash_recovery_area db_recovery_file_dest_size= 2147483648 _allow_resetlogs_corruption= TRUE undo_management = MANUAL undo_tablespace = UNDOTBS1 _smu_debug_mode = 4 remote_login_passwordfile= EXCLUSIVE db_domain = dispatchers = (PROTOCOL=TCP) (SERVICE=orclXDB) job_queue_processes = 10 audit_file_dest = E:\ORACLE\PRODUCT\10.2.0\ADMIN\ORCL\ADUMP background_dump_dest = E:\ORACLE\PRODUCT\10.2.0\ADMIN\ORCL\BDUMP user_dump_dest = E:\ORACLE\PRODUCT\10.2.0\ADMIN\ORCL\UDUMP core_dump_dest = E:\ORACLE\PRODUCT\10.2.0\ADMIN\ORCL\CDUMP db_name = orcl open_cursors = 300 pga_aggregate_target = 203423744 |
我们可以看到他使用了event 来屏蔽smon的回滚段,使用了隐含参数强制打开,使用了undo参数来修改表空间。据说还使用了强制
offline 回滚段的一些参数。
实际上,针对这个问题,我们先不管怎么解决,首先我们需要分析为什么Oracle这里会报这个错误?
首先有几个问题:
1) Oracle 在open的过程之中是执行什么sql报错的?
2) 为什么会报错
3)如果强制屏蔽回滚段是否有影响?
对于第一个问题,很简单,我们搜索trace就很容易定位到是这个SQL在执行时报错的:
1 2 3 |
ORA-00600: 内部错误代码, 参数: [4193], [65], [71], [], [], [], [], [] Current SQL statement for this session: update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1 |
进一步搜索,我们还可以定位到Oracle在执行这个递归SQL时,是在操作什么回滚段的时候报错的(这里是US#表示回滚段编号):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
Cursor#5(050D0F84) state=BOUND curiob=60F2AE90 curflg=d fl2=0 par=050D0E84 ses=7AB23D28 sqltxt(7A734404)=update undo$ set name=:2,file#=:3,block#=:4,status$=:5,user#=:6,undosqn=:7,xactsqn=:8,scnbas=:9,scnwrp=:10,inst#=:11,ts#=:12,spare1=:13 where us#=:1 hash=9caba1288112094d5553173dd30cd6c3 parent=0CF83F1C maxchild=01 plk=0EEB3834 ppn=n cursor instantiation=60F2AE90 child#0(7A7342C0) pcs=0CF84120 clk=0EED3A90 ci=0CF83774 pn=0E493BC4 ctx=0C686938 kgsccflg=0 llk[60F2AE94,60F2AE94] idx=0 xscflg=e0100666 fl2=d100400 fl3=4022218c fl4=0 Bind bytecodes Opcode = 5 Bind Rpi Scalar Sql In (not out) Nocopy Offsi = 36, Offsi = 0 ...... Opcode = 1 Unoptimized Offsi = 36, Offsi = 240 kkscoacd Bind#0 oacdty=01 mxl=32(09) mxlc=00 mal=00 scl=00 pre=00 oacflg=18 fl2=0001 frm=01 csi=852 siz=32 off=0 kxsbbbfp=7a7346ba bln=32 avl=09 flg=09 value="_SYSSMU1$" Bind#1 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=60f26c6c bln=24 avl=02 flg=05 value=2 .......... value=1 Bind#12 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=60f26c90 bln=22 avl=02 flg=05 value=1 Frames pfr 60F27024 siz=4832 efr 60F2706C siz=4820 Cursor frame dump enxt: 3.0x00000710 enxt: 2.0x00000104 enxt: 1.0x00000ac0 pnxt: 2.0x00000004 pnxt: 1.0x00000008 kxscphp 05471F28 siz=1000 inu=0 nps=416 kxscehp 05472450 siz=1000 inu=0 nps=744 ---------------------------------------- |
很明显,我们可以看到,本质上Oracle是在对回滚段_SYSSMU1$ 进行update时出现问题了。
现在我们来回答第2个问题,Oracle为什么会报错呢? 我们再来看下这个ORA-00600错误:
ORA-00600: 内部错误代码, 参数: [4193], [65], [71], [], [], [], [], []
根据文档的解释,这个错误的意思是redo record的seq和undo record的seq不匹配导致。
那么这里的65和71到底是什么意思呢 ? 这个错误的格式是这样: ORA-600 [4193] [a] [b]
a 即 65,表示undo record seq
b 即 71,表示redo record seq.
那么Oracle这里为什么会得出一个不一致的结果呢?很明显,65 是不等于71的。
从网友提供的Trace文件,我们可以看到这样一段信息:
1 2 3 4 5 6 7 8 |
ktudb redo: siz: 252 spc: 6372 flg: 0x0012 seq: 0x0047 rec: 0x08 xid: 0x0000.026.00000052 ktubl redo: slt: 38 rci: 0 opc: 11.1 objn: 15 objd: 15 tsn: 0 Undo type: Regular undo Begin trans Last buffer split: No Temp Object: No Tablespace Undo: No 0x00000000 prev ctl uba: 0x0040019f.0047.07 prev ctl max cmt scn: 0x0000.0f92fecd prev tx cmt scn: 0x0000.0f92fecf |
很明显,这是redo相关信息,这里的seq 为0x0047,转换之后即为71. 原来,这就是ora-00600错误的b 值的来源.
那么ora-00600错误的a值,65又是哪儿来的呢 ? 从前面的UBA信息,我们可以知道,这个事务对应的undo block是:0x0040019f
那么我们来看下这个undo block中的内容是什么样的? 搜索block地址,我们发现这是一个system的block,显然是system 回滚段
的block,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 |
Block after image is corrupt: buffer tsn: 0 rdba: 0x0040019f (1/415) scn: 0x0000.0f468845 seq: 0x01 flg: 0x04 tail: 0x88450201 frmt: 0x02 chkval: 0x911f type: 0x02=KTU UNDO BLOCK Hex dump of block: st=0, typ_found=1 Dump of memory from 0x08DB6000 to 0x08DB8000 8DB6000 0000A202 0040019F 0F468845 04010000 [......@.E.F.....] ......... 8DB7FF0 2E3004C1 332E02C1 323202C1 88450201 [..0....3..22..E.] ******************************************************************************** UNDO BLK: xid: 0x0000.043.00000047 seq: 0x41 cnt: 0x22 irb: 0x22 icl: 0x0 flg: 0x0000 Rec Offset Rec Offset Rec Offset Rec Offset Rec Offset --------------------------------------------------------------------------- 0x01 0x1f2c 0x02 0x1e30 0x03 0x1d34 0x04 0x1c38 0x05 0x1b3c 0x06 0x1a40 0x07 0x1944 0x08 0x1848 0x09 0x174c 0x0a 0x1650 0x0b 0x1554 0x0c 0x1458 0x0d 0x135c 0x0e 0x1260 0x0f 0x1164 0x10 0x1068 0x11 0x0f6c 0x12 0x0e70 0x13 0x0d74 0x14 0x0c78 0x15 0x0b7c 0x16 0x0a80 0x17 0x09c4 0x18 0x0908 0x19 0x084c 0x1a 0x0790 0x1b 0x06d4 0x1c 0x0618 0x1d 0x055c 0x1e 0x04a0 0x1f 0x03e4 0x20 0x02e8 0x21 0x01ec 0x22 0x00f0 ...... ....... *----------------------------- * Rec #0x7 slt: 0x22 objn: 15(0x0000000f) objd: 15 tblspc: 0(0x00000000) * Layer: 11 (Row) opc: 1 rci 0x00 Undo type: Regular undo Begin trans Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000 *----------------------------- uba: 0x0040019f.0041.06 ctl max scn: 0x0000.0f3d6a95 prv tx scn: 0x0000.0f3d6a97 txn start scn: scn: 0x0000.0f44c775 logon user: 0 prev brb: 4194714 prev bcl: 0 KDO undo record: KTB Redo op: 0x04 ver: 0x01 op: L itl: xid: 0x0000.031.00000047 uba: 0x0040019f.0041.06 flg: C--- lkc: 0 scn: 0x0000.0f44c774 KDO Op code: URP row dependencies Disabled xtype: XA flags: 0x00000000 bdba: 0x0040006a hdba: 0x00400069 itli: 1 ispac: 0 maxfr: 4863 tabn: 0 slot: 3(0x3) flag: 0x2c lock: 0 ckix: 93 ncol: 17 nnew: 12 size: 0 col 1: [ 9] 5f 53 59 53 53 4d 55 33 24 col 2: [ 2] c1 02 col 3: [ 2] c1 03 col 4: [ 2] c1 2a col 5: [ 6] c5 03 39 11 3a 05 col 6: [ 1] 80 col 7: [ 4] c3 13 4c 39 col 8: [ 4] c3 04 52 5b col 9: [ 1] 80 col 10: [ 2] c1 03 col 11: [ 2] c1 02 col 16: [ 2] c1 02 |
我们根据前面的UBA:uba: 0x0040019f.0047.07 定位到该事务的信息应该是undo block的第7个record中,当我们定位到第7个record时,
我们可以看到,这里的seq其实却是41,转换为10进制就是65. 这也就是ORA-00600这个错误的由来.
最后我们来回答第3个问题,那么就如果强制屏蔽回滚段是否有影响呢?很明显,网友这里的对象是obj 15,这是一个核心对象。
强制的屏蔽回滚段肯定是不妥的。其实处理方法有很多种,针对类似的问题我已经在道森Oracle培训的特殊恢复课程中讲过多次了。
备注:这里只是简单分析一下问题,如果你遇到类似的问题,请联系我!
Leave a Reply
You must be logged in to post a comment.