数据库open遭遇ora-01555错误
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: 数据库open遭遇ora-01555错误
前几天我们的一位准客户的其中一套较为重要的数据库出现了故障。我们这里先姑且不去分析原因,来将数据库打开提供业务恢复再说。首先我们来看下一线工程师现场发回的报道:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0e0a.938dbd1d): select ctime, mtime, stime from obj$ where obj# = :1 Errors in file /data/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_42419.trc: ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 1 ORA-01555: snapshot too old: rollback segment number 37 with name “_SYSSMU37_383198352$” too small Errors in file /data/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_42419.trc: ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 1 ORA-01555: snapshot too old: rollback segment number 37 with name “_SYSSMU37_383198352$” too small Error 704 happened during db open, shutting down database USER (ospid: 42419): terminating the instance due to error 704 Instance terminated by USER, pid = 42419 ORA-1092 signalled during: alter database open resetlogs… opiodr aborting process unknown ospid (42419) as a result of ORA-1092 Mon Jul 03 10:47:32 2017 ORA-1092 : opitsk aborting process |
从上述的错误来看。数据库在open时,其中一个递归SQL语句执行失败,该递归SQL执行失败的原因是出现了ora-01555错误,即大家所熟知的快照过旧;同时日志中也明确提到了需要访问的回滚段编号,即第37号回滚段。
根据我们一般的处理思路,需要进行10046 trace跟踪,确认这里的递归SQL是不是访问了一些存在活动事务的Block。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1499247077576035 hv=429618617 ad=’37bf8fe40′ sqlid=’4krwuz0ctqxdt’ select ctime, mtime, stime from obj$ where obj# = :1 END OF STMT PARSE #5:c=0,e=346,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=1499247077576034 BINDS #5: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=7ff28b33fff8 bln=22 avl=02 flg=05 value=20 EXEC #5:c=999,e=724,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,plh=1218588913,tim=1499247077576860 WAIT #5: nam=’db file sequential read’ ela= 31 file#=1 block#=337 blocks=1 obj#=36 tim=1499247077577083 ===================== PARSING IN CURSOR #6 len=142 dep=2 uid=0 oct=3 lid=0 tim=1499247077577673 hv=361892850 ad=’377f8c780′ sqlid=’7bd391hat42zk’ select /*+ rule */ name,file#,block#,status$,user#,undosqn,xactsqn,scnbas,scnwrp,DECODE(inst#,0,NULL,inst#),ts#,spare1 from undo$ where us#=:1 END OF STMT PARSE #6:c=1000,e=472,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,plh=0,tim=1499247077577672 BINDS #6: Bind#0 oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00 oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0 kxsbbbfp=7ff28b33e398 bln=22 avl=02 flg=05 value=37 EXEC #6:c=0,e=705,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=3,plh=906473769,tim=1499247077578470 WAIT #6: nam=’db file sequential read’ ela= 19 file#=1 block#=321 blocks=1 obj#=34 tim=1499247077578588 WAIT #6: nam=’db file sequential read’ ela= 18 file#=1 block#=225 blocks=1 obj#=15 tim=1499247077578756 FETCH #6:c=1000,e=313,p=2,cr=2,cu=0,mis=0,r=1,dep=2,og=3,plh=906473769,tim=1499247077578814 STAT #6 id=1 cnt=1 pid=0 pos=1 obj=15 op=’TABLE ACCESS BY INDEX ROWID UNDO$ (cr=2 pr=2 pw=0 time=0 us)’ STAT #6 id=2 cnt=1 pid=1 pos=1 obj=34 op=’INDEX UNIQUE SCAN I_UNDO1 (cr=1 pr=1 pw=0 time=0 us)’ CLOSE #6:c=0,e=9,dep=2,type=0,tim=1499247077578910 WAIT #5: nam=’db file sequential read’ ela= 39 file#=157 block#=164013 blocks=1 obj#=0 tim=1499247077579013 FETCH #5:c=2000,e=2852,p=4,cr=4,cu=0,mis=0,r=0,dep=1,og=4,plh=1218588913,tim=1499247077579747 STAT #5 id=1 cnt=0 pid=0 pos=1 obj=18 op=’TABLE ACCESS BY INDEX ROWID OBJ$ (cr=0 pr=0 pw=0 time=0 us)’ STAT #5 id=2 cnt=0 pid=1 pos=1 obj=36 op=’INDEX RANGE SCAN I_OBJ1 (cr=0 pr=0 pw=0 time=0 us)’ ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 1 ORA-01555: snapshot too old: rollback segment number 37 with name “_SYSSMU37_383198352$” too small ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 1 ORA-01555: snapshot too old: rollback segment number 37 with name “_SYSSMU37_383198352$” too small |
从10046 跟踪来看,报错的SQL 语句访问了2个block;分别是file 1 block 337和file 157 block 164013. 很明显第一个数据块应该是数据字典的block,而157号文件的这个block应该是undo block,因为这里的obj#=0.
接着我们来看看file 1 block 337 的blockdump情况:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
Dump all the blocks in range: buffer tsn: 0 rdba: 0x00400151 (1/337) scn: 0x0e15.00a27c49 seq: 0x01 flg: 0x04 tail: 0x7c490601 frmt: 0x02 chkval: 0xfa69 type: 0x06=trans data Hex dump of block: st=0, typ_found=1 …… Block header dump: 0x00400151 Object id on Block? Y seg/obj: 0x24 csc: 0xe15a27c49 itc: 1 flg: – typ: 2 – INDEX fsl: 0 fnx: 0x0 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0025.020.0002c57b 0x274280ad.1eae.02 C— 0 scn 0x0e15.00a26a0c Branch block dump ================= header address 139957081160260=0x7f4a4c1b3a44 kdxcolev 1 KDXCOLEV Flags = – – – kdxcolok 0 kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y kdxconco 3 kdxcosdc 1 kdxconro 235 kdxcofbo 498=0x1f2 kdxcofeo 5520=0x1590 kdxcoavs 5239 kdxbrlmc 4194642=0x400152 kdxbrsno 234 kdxbrbksz 8056 kdxbr2urrc 0 row#0[8047] dba: 4194643=0x400153 |
从block dump来看,这是一个Index Block。从ITL的信息来看,这个Index Block没有任何活动事务。同时,根据前面的10046 trace来看,报错的递归SQL访问的obj#=20,换成为16进制为c1 15,然而这个Index block 中并没有这个键值;同时我们dump了下一个index block 找到了对应的键值。
我们可以看出,这个index是一个复合索引,其中col 0的c1 15就是表示20. 该行数据对应的数据块地址是004000f100,转换为10进制是:4194545。
我们回到前面的这个问题,为什么递归SQL访问file block 337 然后接着需要去访问undo block呢? 而且从10046 trace来看fetch r=0,表明并没有获取到数据。说明问题仍然出在这个block的访问上。
这里我们进一步该block的dump来看,发现其scn如下:
1 2 3 4 5 6 7 8 9 |
SQL> select to_number('e1500a26a0c','xxxxxxxxxxxxxxxxx') CTM from dual; CTM --------------------- 15483367746060 |
当通过dump控制文件的scn来看,明显要小的多,如下:
1 2 3 4 |
Redo Version: compatible=0xb200000 #Data files = 85, #Online files = 85 Database checkpoint: Thread=1 scn: 0x0e0a.9399087a Threads: #Enabled=1, #Open=1, Head=1, Tail=1 |
我们将上述的database checkpoint进行转换:
1 2 3 4 5 |
www.killdb.com@select to_number('e0a9399087a','xxxxxxxxxxxxxxxxxxxxx') db_scn from dual; DB_SCN --------------------------- 15438588741754 |
很明显数据库的checkpoint 明显要比这个Index Block的scn小的多,也就势必导致数据库在启动的时候需要去访问Undo Block。所以这里经过单次的修改undo$ 将对应的37号回滚段标记为offline都无法解决这个问题。这里我们首先尝试清除了file 1 block 377的ITL信息之后,启动数据库发现错误发生了改变,如下:
1 2 3 4 5 6 7 8 9 10 11 12 |
Errors in file /data/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_46250.trc (incident=90909): ORA-00600: internal error code, arguments: [2662], [3605], [10648650], [3605], [48684112], [4194545], [], [], [], [], [], [] Incident details in: /data/oracle/app/diag/rdbms/orcl/orcl/incident/incdir_90909/orcl_ora_46250_i90909.trc Errors in file /data/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_46250.trc: ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [2662], [3605], [10648650], [3605], [48684112], [4194545], [], [], [], [], [], [] Errors in file /data/oracle/app/diag/rdbms/orcl/orcl/trace/orcl_ora_46250.trc: ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00600: internal error code, arguments: [2662], [3605], [10648650], [3605], [48684112], [4194545], [], [], [], [], [], [] Error 704 happened during db open, shutting down database |
这个错误就非常明白了,就是block scn的问题。而报错的数据块地址为:4194545,这就是我们前面提到的4000f1这个数块,即file 1 block 241 这个数据块。
看起来这个错误本质上来说,可以直接推进scn解决问题。这里我们通过设置*._minimum_giga_scn参数来解决问题。通过设置了该参数之后,成功打开了数据库。
虽然数据库alert log后续还有一些ora-00600 [4097],ora-08102等错误,但是处理都相对简单了。通过重建undo、重建Index即可解决。
Leave a Reply
You must be logged in to post a comment.