Instance immediate crash after open
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
一个朋友公司的数据库出现异常,没有备份。 数据库open之后很快就crash掉,如下的alert log的信息:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 |
Tue Jul 08 22:53:03 2014 SMON: enabling cache recovery [13803] Successfully onlined Undo Tablespace 2. Undo initialization finished serial:0 start:7751194 end:7751284 diff:90 (0 seconds) Verifying file header compatibility for 11g tablespace encryption.. Verifying 11g file header compatibility for tablespace encryption completed SMON: enabling tx recovery Database Characterset is ZHS16GBK Errors in file /home/oracle/diag/rdbms/xxxx/xxxx/trace/xxxx_smon_13788.trc (incident=56461): ORA-00600: internal error code, arguments: [17182], [0x2B4E2EB8D7A0], [], [], [], [], [], [], [], [], [], [] Incident details in: /home/oracle/diag/rdbms/xxxx/xxxx/incident/incdir_56461/xxxx_smon_13788_i56461.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. No Resource Manager plan active replication_dependency_tracking turned off (no async multimaster replication found) Starting background process QMNC Tue Jul 08 22:53:04 2014 QMNC started with pid=70, OS id=13956 Completed: alter database open Tue Jul 08 22:53:05 2014 db_recovery_file_dest_size of 4977 MB is 0.00% used. This is a user-specified limit on the amount of space that will be used by this database for recovery-related files, and does not reflect the amount of space available in the underlying filesystem or ASM diskgroup. Tue Jul 08 22:53:05 2014 Dumping diagnostic data in directory=[cdmp_20140708225305], requested by (instance=1, osid=13788 (SMON)), summary=[incident=56461]. Block recovery from logseq 2855, block 100 to scn 104606896 Recovery of Online Redo Log: Thread 1 Group 5 Seq 2855 Reading mem 0 Mem# 0: /backup/oradata/xxxx/redo05.log Block recovery completed at rba 2855.131.16, scn 0.104606897 ORACLE Instance xxxx (pid = 32) - Error 600 encountered while recovering transaction (14, 2) on object 15113. Errors in file /home/oracle/diag/rdbms/xxxx/xxxx/trace/xxxx_smon_13788.trc: ORA-00600: internal error code, arguments: [17182], [0x2B4E2EB8D7A0], [], [], [], [], [], [], [], [], [], [] Tue Jul 08 22:53:05 2014 Starting background process CJQ0 Tue Jul 08 22:53:05 2014 CJQ0 started with pid=73, OS id=13984 Dumping diagnostic data in directory=[cdmp_20140708225306], requested by (instance=1, osid=13788 (SMON)), summary=[abnormal process termination]. Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x97EE385, kghrst()+1835] [flags: 0x0, count: 1] Errors in file /home/oracle/diag/rdbms/xxxx/xxxx/trace/xxxx_smon_13788.trc (incident=56462): ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97EE385] [SI_KERNEL(general_protection)] [] ORA-00600: internal error code, arguments: [17182], [0x2B4E2EB8D7A0], [], [], [], [], [], [], [], [], [], [] Incident details in: /home/oracle/diag/rdbms/xxxx/xxxx/incident/incdir_56462/xxxx_smon_13788_i56462.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /home/oracle/diag/rdbms/xxxx/xxxx/trace/xxxx_smon_13788.trc (incident=56463): ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x2B4E2EB8D790], [], [], [], [], [], [], [], [], [], [] ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97EE385] [SI_KERNEL(general_protection)] [] ORA-00600: internal error code, arguments: [17182], [0x2B4E2EB8D7A0], [], [], [], [], [], [], [], [], [], [] Incident details in: /home/oracle/diag/rdbms/xxxx/xxxx/incident/incdir_56463/xxxx_smon_13788_i56463.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Errors in file /home/oracle/diag/rdbms/xxxx/xxxx/trace/xxxx_smon_13788.trc (incident=56464): ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x2B4E2EB8D790], [], [], [], [], [], [], [], [], [], [] ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97EE385] [SI_KERNEL(general_protection)] [] ORA-00600: internal error code, arguments: [17182], [0x2B4E2EB8D7A0], [], [], [], [], [], [], [], [], [], [] Incident details in: /home/oracle/diag/rdbms/xxxx/xxxx/incident/incdir_56464/xxxx_smon_13788_i56464.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Setting Resource Manager plan SCHEDULER[0x32DA]:DEFAULT_MAINTENANCE_PLAN via scheduler window Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter Tue Jul 08 22:53:09 2014 Starting background process VKRM Tue Jul 08 22:53:09 2014 VKRM started with pid=69, OS id=13988 Dumping diagnostic data in directory=[cdmp_20140708225309], requested by (instance=1, osid=13788 (SMON)), summary=[incident=56463]. Errors in file /home/oracle/diag/rdbms/xxxx/xxxx/trace/xxxx_smon_13788.trc (incident=56465): ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x2B4E2EB8D790], [], [], [], [], [], [], [], [], [], [] ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97EE385] [SI_KERNEL(general_protection)] [] ORA-00600: internal error code, arguments: [17182], [0x2B4E2EB8D7A0], [], [], [], [], [], [], [], [], [], [] Incident details in: /home/oracle/diag/rdbms/xxxx/xxxx/incident/incdir_56465/xxxx_smon_13788_i56465.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Dumping diagnostic data in directory=[cdmp_20140708225310], requested by (instance=1, osid=13788 (SMON)), summary=[incident=56465]. Errors in file /home/oracle/diag/rdbms/xxxx/xxxx/trace/xxxx_smon_13788.trc (incident=56466): ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x2B4E2EB8D790], [], [], [], [], [], [], [], [], [], [] ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97EE385] [SI_KERNEL(general_protection)] [] ORA-00600: internal error code, arguments: [17182], [0x2B4E2EB8D7A0], [], [], [], [], [], [], [], [], [], [] Incident details in: /home/oracle/diag/rdbms/xxxx/xxxx/incident/incdir_56466/xxxx_smon_13788_i56466.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Dumping diagnostic data in directory=[cdmp_20140708225311], requested by (instance=1, osid=13788 (SMON)), summary=[incident=56466]. Errors in file /home/oracle/diag/rdbms/xxxx/xxxx/trace/xxxx_smon_13788.trc (incident=56467): ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x2B4E2EB8D790], [], [], [], [], [], [], [], [], [], [] ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97EE385] [SI_KERNEL(general_protection)] [] ORA-00600: internal error code, arguments: [17182], [0x2B4E2EB8D7A0], [], [], [], [], [], [], [], [], [], [] Incident details in: /home/oracle/diag/rdbms/xxxx/xxxx/incident/incdir_56467/xxxx_smon_13788_i56467.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Tue Jul 08 22:53:12 2014 Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Tue Jul 08 22:53:13 2014 Errors in file /home/oracle/diag/rdbms/xxxx/xxxx/trace/xxxx_smon_13788.trc (incident=56468): ORA-00600: internal error code, arguments: [kghfrempty:ds], [0x2B4E2EB8D790], [], [], [], [], [], [], [], [], [], [] ORA-07445: exception encountered: core dump [kghrst()+1835] [SIGSEGV] [ADDR:0x0] [PC:0x97EE385] [SI_KERNEL(general_protection)] [] ORA-00600: internal error code, arguments: [17182], [0x2B4E2EB8D7A0], [], [], [], [], [], [], [], [], [], [] Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. PMON (ospid: 13666): terminating the instance due to error 474 System state dump requested by (instance=1, osid=13666 (PMON)), summary=[abnormal instance termination]. System State dumped to trace file /home/oracle/diag/rdbms/xxxx/xxxx/trace/xxxx_diag_13684_20140708225315.trc Tue Jul 08 22:53:16 2014 Dumping diagnostic data in directory=[cdmp_20140708225315], requested by (instance=1, osid=13666 (PMON)), summary=[abnormal instance termination]. Tue Jul 08 22:53:17 2014 Instance terminated by PMON, pid = 13666 |
可以看出,是smon进程无法对某个事务进行恢复导致,最后Pmon进程强行终止了Instance。
具体来讲,是无法恢复这个事务:transaction (14, 2) on object 15113
从信息来看,是smon 回滚事务存在异常,那么我们来分析下smon 进程的trace文件,确认下为什么无法恢复(14,2)这个事务。
根据事务(14,2),直接搜索:000e.002 定位到如下信息:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 |
<pre class="brush:plain"> ************ dump undo block ********** Dump of buffer cache at level 3 for tsn=2 rdba=12630748 BH (0x1d9ef41a48) file#: 3 rdba: 0x00c0badc (3/47836) class: 44 ba: 0x1d84096000 set: 322 pool: 3 bsz: 8192 bsi: 0 sflg: 2 pwc: 0,0 dbwrid: 1 obj: -1 objn: 0 tsn: 2 afn: 3 hint: f hash: [0x2127e3c4e0,0x2127e3c4e0] lru: [0x1d9ef41c70,0x1d9ef41a00] ckptq: [NULL] fileq: [NULL] objq: [0x1d9ef43b10,0x2110560590] objaq: [0x1d9ef41f18,0x2110560580] st: XCURRENT md: NULL fpin: 'ktuwh23: ktubko' tch: 1 flags: LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535] buffer tsn: 2 rdba: 0x00c0badc (3/47836) scn: 0x0000.06382fde seq: 0x05 flg: 0x04 tail: 0x2fde0205 frmt: 0x02 chkval: 0x5fc6 type: 0x02=KTU UNDO BLOCK Hex dump of block: st=0, typ_found=1 Dump of memory from 0x0000001D84096000 to 0x0000001D84098000 ......省略部分内容 ******************************************************************************** UNDO BLK: xid: 0x000e.002.00034175 seq: 0x35dc cnt: 0x5 irb: 0x1 icl: 0x0 flg: 0x0000 Rec Offset Rec Offset Rec Offset Rec Offset Rec Offset --------------------------------------------------------------------------- 0x01 0x1f28 0x02 0x1ec8 0x03 0x1e38 0x04 0x1db8 0x05 0x1d1c *----------------------------- * Rec #0x1 slt: 0x02 objn: 15113(0x00003b09) objd: 15555 tblspc: 56(0x00000038) * Layer: 11 (Row) opc: 1 rci 0x00 Undo type: Regular undo Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00c0badb *----------------------------- KDO undo record: KTB Redo op: 0x02 ver: 0x01 compat bit: 4 (post-11) padding: 1 op: C uba: 0x00c0badb.35dc.2f KDO Op code: IRP row dependencies Disabled xtype: XA flags: 0x00000000 bdba: 0x468069a2 hdba: 0x46800c91 itli: 2 ispac: 0 maxfr: 4858 tabn: 0 slot: 108(0x6c) size/delt: 64 fb: --H-FL-- lb: 0x0 cc: 16 null: 01234567890123456789012345678901234567890123456789012345678901234567890123456789 ------NNNNN--NN- col 0: [12] 33 35 38 30 34 34 36 33 32 36 30 36 col 1: [ 1] 20 col 2: [ 9] 64 65 6c 69 76 65 72 65 64 col 3: [ 5] 30 32 31 30 30 col 4: [ 4] c9 cf ba a3 col 5: [ 1] 30 col 6: *NULL* col 7: *NULL* col 8: *NULL* col 9: *NULL* col 10: *NULL* col 11: [ 7] 78 72 04 04 11 23 02 col 12: [ 7] 78 72 04 04 11 23 02 col 13: *NULL* col 14: *NULL* col 15: [ 0] *----------------------------- * Rec #0x2 slt: 0x02 objn: 15113(0x00003b09) objd: 15555 tblspc: 56(0x00000038) * Layer: 11 (Row) opc: 1 rci 0x01 Undo type: Regular undo User Undo Applied Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000 *----------------------------- KDO undo record: KTB Redo op: 0x02 ver: 0x01 compat bit: 4 (post-11) padding: 1 op: C uba: 0x00c0badc.35dc.01 KDO Op code: IRP row dependencies Disabled xtype: XA flags: 0x00000000 bdba: 0x468069a2 hdba: 0x46800c91 itli: 2 ispac: 0 maxfr: 4858 tabn: 0 slot: 107(0x6b) size/delt: 9 fb: --H----- lb: 0x0 cc: 0 nrid: 0x468069b9.7 null: *----------------------------- * Rec #0x3 slt: 0x02 objn: 15404(0x00003c2c) objd: 15556 tblspc: 4(0x00000004) * Layer: 10 (Index) opc: 22 rci 0x02 Undo type: Regular undo User Undo Applied Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000 *----------------------------- index undo for leaf key operations KTB Redo op: 0x04 ver: 0x01 compat bit: 4 (post-11) padding: 1 op: L itl: xid: 0x000e.00f.00034021 uba: 0x00c07ed9.35d5.18 flg: C--- lkc: 0 scn: 0x0000.0635da5a Dump kdilk : itl=2, kdxlkflg=0x25 sdc=2 indexid=0x100008a block=0x01002052 (kdxlre): restore leaf row (clear leaf delete flags) number of keys: 2 key sizes: 18 18 key :(36): 05 30 32 31 30 30 ff 0a 00 00 3c c3 46 80 69 a2 00 6c 05 30 32 31 30 30 ff 0a 00 00 3c c3 46 80 69 a2 00 6e selflock: (1): 00 bitmap: (1): fc *----------------------------- * Rec #0x4 slt: 0x02 objn: 15139(0x00003b23) objd: 15559 tblspc: 57(0x00000039) * Layer: 10 (Index) opc: 22 rci 0x03 Undo type: Regular undo User Undo Applied Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000 *----------------------------- index undo for leaf key operations KTB Redo op: 0x04 ver: 0x01 compat bit: 4 (post-11) padding: 1 op: L itl: xid: 0x000e.00e.0001da74 uba: 0x00c144ff.2156.16 flg: C--- lkc: 0 scn: 0x0000.04d6e213 Dump kdilk : itl=2, kdxlkflg=0x25 sdc=2 indexid=0x474000a2 block=0x47400ffe (kdxlre): restore leaf row (clear leaf delete flags) number of keys: 2 key sizes: 9 9 key :(18): 01 20 06 46 80 69 a2 00 6c 01 20 06 46 80 69 a2 00 6e selflock: (1): 00 bitmap: (1): fc *----------------------------- * Rec #0x5 slt: 0x02 objn: 15134(0x00003b1e) objd: 15560 tblspc: 57(0x00000039) * Layer: 10 (Index) opc: 22 rci 0x04 Undo type: Regular undo User Undo Applied Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000 *----------------------------- index undo for leaf key operations KTB Redo op: 0x04 ver: 0x01 compat bit: 4 (post-11) padding: 1 op: L itl: xid: 0x000d.002.000287af uba: 0x00c1b5ce.37e8.1f flg: C--- lkc: 0 scn: 0x0000.05e39111 Dump kdilk : itl=2, kdxlkflg=0x25 sdc=1 indexid=0x47400082 block=0x47800db3 (kdxlre): restore leaf row (clear leaf delete flags) number of keys: 2 key sizes: 13 13 key :(26): 0c 33 35 38 30 34 34 36 33 32 35 38 38 0c 33 35 38 30 34 34 36 33 32 36 30 36 keydata/bitmap: (20): 00 00 3c c3 46 80 69 a2 00 6e 00 00 3c c3 46 80 69 a2 00 6c selflock: (1): 00 bitmap: (1): fc kcra_dump_redo_internal: skipped for critical process ORACLE Instance xxxx (pid = 32) - Error 600 encountered while recovering transaction (14, 2) on object 15113. |
可以看到,(14,2)这个事务包含了5个undo record,也就是需要进行5个操作,然后到最后record 1的时候无法进行了。
真的是这样吗?
我们首先来看下该undo block的dump信息:
xid: 0x000e.002.00034175 seq: 0x35dc cnt: 0x5 irb: 0x1
从这里我们可以看出,这个undo block包含了5个undo record,然而针对该事务而言,起点是irb:0x1,也就是第一个undo record。
这里明显是有问题的,我们可以看到如下的几个undo recor的rci值分别是:0x04 0x03 0x02 0x01 0x00
所以,从undo record的rci值来判断,这个事务应该是从undo record 5开始rollback到undo record 1结束。
但是undo block中提到的irb值居然是0x1 ? 这里irb:0x1 是第一个undo record,其rci值本身就为0了,怎么回滚?
因此,很明显这里是存在问题。
所以如果想要临时解决该数据库open之后不crash的问题,那么可以直接修改下这个undo block.修改下irb或rci的值。
备注:朋友这里的库,其他还有其他的文件存在坏块,这里暂不讨论。因为没有其他信息了。
Leave a Reply
You must be logged in to post a comment.