Oracle 19c RAC instance crash due to ORA-00600 [kghuclientasp_03] and ORA-00600 [17112]
Unless marked as a repost, all articles on this site are original. Reposted from: love wife love life, Roger's Oracle/MySQL/PostgreSQL data recovery blog
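Over the past few days a customer has been overhauling its core business system, migrating data from other systems into a zdata all-in-one appliance environment and processing it there. The database is Oracle RAC 19.14 with 4 compute nodes and 5 zdata storage nodes (all flash), so overall performance is quite strong.
However, nearly all of the business logic in this migration is implemented in PL/SQL, every node runs dozens of jobs at the same time, and the code makes heavy use of nologging operations. In the end the instance on one node crashed, as shown below: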
2022-05-25T18:41:31.513484+08:00
Thread 1 advanced to log sequence 33773 (LGWR switch), current SCN: 17721853016325
Current log# 2 seq# 33773 mem# 0: +DG_DATA01/SCSBGJB/ONLINELOG/group_2.275.1098292431
2022-05-25T18:41:56.795929+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_j00c_45595.trc (incident=2727144) (PDBNAME=PDBSCSB04):
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
PDBSCSB04(7):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_j00c_45595.trc (incident=2727145) (PDBNAME=PDBSCSB04):
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
PDBSCSB04(7):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_j00c_45595.trc (incident=2727146) (PDBNAME=PDBSCSB04):
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
PDBSCSB04(7):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2727146/xxxxx1_j00c_45595_i2727146.trc
2022-05-25T18:41:58.696027+08:00
Thread 1 advanced to log sequence 33774 (LGWR switch), current SCN: 17721853454395
Current log# 3 seq# 33774 mem# 0: +DG_DATA01/xxxxx/ONLINELOG/group_3.273.1098292435
2022-05-25T18:42:01.188468+08:00
。。。。。
2022-05-25T18:44:02.245773+08:00
PDBSCSB00(3):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_ora_89170.trc (incident=2732186) (PDBNAME=PDBSCSB00):
ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrmrg:nxt], [0x0FB6CEBE0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x0DB4CCBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB00(3):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2732186/xxxxx1_ora_89170_i2732186.trc
2022-05-25T18:44:06.010875+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl04_82221.trc (incident=2726200) (PDBNAME=PDBSCSB00):
ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], [], [], [], [], [], [], [], [], []
PDBSCSB00(3):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2726200/xxxxx1_cl04_82221_i2726200.trc
PDBSCSB00(3):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:06.510726+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_p01v_60845.trc (incident=2735720) (PDBNAME=PDBSCSB02):
ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x0DB4CDBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2735720/xxxxx1_p01v_60845_i2735720.trc
PDBSCSB02(5):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:08.899072+08:00
opidrv aborting process CL04 ospid (82221) as a result of ORA-600
2022-05-25T18:44:08.899220+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl04_82221.trc:
ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], [], [], [], [], [], [], [], [], []
2022-05-25T18:44:12.455063+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_p01v_60845.trc (incident=2735721) (PDBNAME=PDBSCSB02):
ORA-00600: 内部错误代码, 参数: [kghfrmrg:nxt], [0x0FB6CFBE0], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x0DB4CDBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2735721/xxxxx1_p01v_60845_i2735721.trc
2022-05-25T18:44:15.338840+08:00
PDBSCSB02(5):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_p01v_60845.trc (incident=2735722) (PDBNAME=PDBSCSB02):
ORA-00700: 软内部错误, 参数: [ksepop:1 ksepop recursion ], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghfrmrg:nxt], [0x0FB6CFBE0], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x0DB4CDBD8], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2735722/xxxxx1_p01v_60845_i2735722.trc
2022-05-25T18:44:18.354945+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl02_82217.trc (incident=2726160) (PDBNAME=PDBSCSB02):
ORA-00600: internal error code, arguments: [17112], [0x0DB4CDBC0], [], [], [], [], [], [], [], [], [], []
PDBSCSB02(5):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2726160/xxxxx1_cl02_82217_i2726160.trc
PDBSCSB02(5):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:18.414790+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_ora_90359.trc (incident=2727648) (PDBNAME=PDBSJQY):
ORA-00600: internal error code, arguments: [17147], [0x0DB4CABC0], [], [], [], [], [], [], [], [], [], []
PDBSJQY(8):Incident details in: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/incident/incdir_2727648/xxxxx1_ora_90359_i2727648.trc
PDBSJQY(8):Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2022-05-25T18:44:21.695996+08:00
opidrv aborting process CL02 ospid (82217) as a result of ORA-600
2022-05-25T18:44:21.696114+08:00
Errors in file /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl02_82217.trc:
ORA-00600: internal error code, arguments: [17112], [0x0DB4CDBC0], [], [], [], [], [], [], [], [], [], []
2022-05-25T18:44:21.702634+08:00
Dumping diagnostic data in directory=[cdmp_20220525184421], requested by (instance=1, osid=82217 (CL02)), summary=[incident=2726160].
2022-05-25T18:44:21.787634+08:00
PMON (ospid: 72253): terminating the instance due to ORA error 12752
2022-05-25T18:44:21.787800+08:00
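Judging from the node's alert log above, there were a large number of ORA-00600 and ORA-07445 errors. Of these, kghuclientasp_03 is relatively rare, whereas ksepop:1 ksepop recursion, kghfrmrg:nxt and [kghfrh:ds] are ones I have seen many times. The last two are both related to Oracle memory management.
In fact, for kghuclientasp_03, the kgh prefix alone suggests that it must be memory related as well.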
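With this many incidents it can help to tally the errors by their first argument before opening individual trace files. The following is a minimal Python sketch of that idea, not a tool used in the original analysis; the function name `tally_internal_errors` and the inline sample (a few lines copied from the alert log above) are illustrative assumptions.

```python
import re
from collections import Counter

# Matches the English and the Chinese-locale form of the message, e.g.
#   ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], ...
#   ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x0DB4CDBD8], ...
# and captures the error number plus the first bracketed argument.
ORA_PATTERN = re.compile(r"ORA-(00600|00700|07445):[^\[]*\[([^\]]*)\]")

def tally_internal_errors(alert_text):
    """Count (error number, first argument) pairs found in alert log text."""
    counts = Counter()
    for code, first_arg in ORA_PATTERN.findall(alert_text):
        counts[(code, first_arg.strip())] += 1
    return counts

# A few lines copied from the alert log excerpt above.
sample = """\
ORA-00600: 内部错误代码, 参数: [kghuclientasp_03], [0x7F0913B03728], [0], [0], [0], [], [], [], [], [], [], []
ORA-00700: soft internal error, arguments: [ksepop:1 ksepop recursion ], [], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrmrg:nxt], [0x0FB6CEBE0], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghfrh:ds], [0x0DB4CCBD8], [], [], [], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], [], [], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kghfrh:ds], [0x0DB4CDBD8], [], [], [], [], [], [], [], [], [], []
"""

counts = tally_internal_errors(sample)
for (code, arg), n in counts.most_common():
    print(f"ORA-{code} [{arg}] x{n}")
```

On a real system the text would come from the instance's alert log file rather than an inline string; the grouping key could also include the PDB name if per-PDB counts are of interest.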
One of the trace files contains the following:
Chunk 7fa8d2191340 sz= 1600 alloc "pmuccst: adt/re"
Chunk 7fa8d2191980 sz= 1600 alloc "pmuccst: adt/re"
ERROR, BATCH-HEAP MISMATCH for batch 68 [7fa8d1e90000][7fa8d1e9a060]
BATCH HEADER 68 addr=7fa8d1ea4aa8 (prv=7fa8d218e020 nxt=7fa8d2316688)
Chunk 7fa8d1ea4ad0 sz= 104 alloc "pl/sql vc2 "
ERROR, ZERO-SIZED CHUNK addr=7fa8d1ea4b38
BATCH HEADER 69 addr=7fa8d2316670 (prv=7fa8d1ea4ac0 nxt=7fa8d23176c8)
Chunk 7fa8d2316698 sz= 72 alloc "pl/sql vc2 "
Chunk 7fa8d23166e0 sz= 64 alloc "pl/sql vc2 "
It is easy to see from the output above that the dump does report ERROR, BATCH-HEAP MISMATCH, and that the chunks involved all belong to PL/SQL operations.
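Heap dumps in real trace files run to many thousands of lines, so picking out the ERROR lines and the allocation comments by hand is tedious. The sketch below shows one way this could be done in Python; it assumes the chunk lines look exactly like the excerpt above, and the names `scan_heap_dump` and `CHUNK_RE` are hypothetical helpers, not part of any Oracle tooling.

```python
import re

# One chunk line looks like: Chunk 7fa8d1ea4ad0 sz= 104 alloc "pl/sql vc2 "
CHUNK_RE = re.compile(r'^Chunk\s+([0-9a-f]+)\s+sz=\s*(\d+)\s+\w+\s+"([^"]*)"')

def scan_heap_dump(lines):
    """Collect ERROR lines and the total chunk size per allocation comment."""
    errors, size_by_comment = [], {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("ERROR,"):
            errors.append(line)
            continue
        match = CHUNK_RE.match(line)
        if match:
            comment = match.group(3).strip()
            size_by_comment[comment] = size_by_comment.get(comment, 0) + int(match.group(2))
    return errors, size_by_comment

# The nine lines of the trace excerpt above.
sample = """\
Chunk 7fa8d2191340 sz= 1600 alloc "pmuccst: adt/re"
Chunk 7fa8d2191980 sz= 1600 alloc "pmuccst: adt/re"
ERROR, BATCH-HEAP MISMATCH for batch 68 [7fa8d1e90000][7fa8d1e9a060]
BATCH HEADER 68 addr=7fa8d1ea4aa8 (prv=7fa8d218e020 nxt=7fa8d2316688)
Chunk 7fa8d1ea4ad0 sz= 104 alloc "pl/sql vc2 "
ERROR, ZERO-SIZED CHUNK addr=7fa8d1ea4b38
BATCH HEADER 69 addr=7fa8d2316670 (prv=7fa8d1ea4ac0 nxt=7fa8d23176c8)
Chunk 7fa8d2316698 sz= 72 alloc "pl/sql vc2 "
Chunk 7fa8d23166e0 sz= 64 alloc "pl/sql vc2 "
"""

errors, size_by_comment = scan_heap_dump(sample.splitlines())
for err in errors:
    print(err)
for comment, total in size_by_comment.items():
    print(f"{comment}: {total} bytes")
```

Grouping by allocation comment is only a convenience for spotting which component owns the memory near the reported errors; it does not by itself identify the code that corrupted the heap.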
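Searching further, nearly all of the operations of the affected processes turned out to be INSERT /*+ nologging */ INTO statements. Because they are invoked concurrently, with dozens of jobs running at the same time, we have to suspect that nologging combined with excessive concurrency can lead to heap errors.
As for the instance crash itself, further analysis shows that it was ultimately the failure of the CL02 and CL04 processes that brought the instance down.
Starting with Oracle 12c, Oracle introduced CLMN (the Cleanup Main Process) and CLnn (the Cleanup Helper Processes). These two kinds of process greatly relieve the pressure on PMON.
The CL processes are essentially core processes of Oracle RAC as well. Oracle's core processes (that is, processes that must not be killed) can be listed with the following script: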
SQL> SELECT indx,ksuprpnm,TO_CHAR(ksuprflg,'XXXXXXXXXXXXXXXX'),KSUPROSID
  2  FROM x$ksupr
  3  WHERE BITAND(ksuprflg,4) = 4 ORDER BY indx
  4  /
INDX KSUPRPNM TO_CHAR(KSUPRFLG, KSUPROSID
---------- ------------------------------------------------ ----------------- ------------------------
2 oracle@dbser11 (PMON) E 92141
3 oracle@dbser11 (CLMN) E 92145
4 oracle@dbser11 (PSP0) 6 92149
5 oracle@dbser11 (IPC0) 6 92156
6 oracle@dbser11 (VKTM) 6 92161
7 oracle@dbser11 (GEN0) 6 92167
8 oracle@dbser11 (MMAN) 6 92171
15 oracle@dbser11 (DBRM) 6 92186
19 oracle@dbser11 (ACMS) 6 92196
20 oracle@dbser11 (PMAN) 6 92200
22 oracle@dbser11 (LMON) 6 92206
23 oracle@dbser11 (LMD0) 6 92210
24 oracle@dbser11 (LMS0) 6 92212_92222
26 oracle@dbser11 (LMS1) 6 92214_92224
28 oracle@dbser11 (LMS2) 6 92216_92227
30 oracle@dbser11 (LMS3) 6 92219_92239
32 oracle@dbser11 (LMS4) 6 92223_92238
34 oracle@dbser11 (LMS5) 6 92226_92242
36 oracle@dbser11 (LMS6) 6 92231_92247
38 oracle@dbser11 (LMS7) 6 92236_92254
40 oracle@dbser11 (LMS8) 6 92241_92264
42 oracle@dbser11 (LMS9) 6 92245_92266
44 oracle@dbser11 (LMSA) 6 92251_92270
46 oracle@dbser11 (LMSB) 6 92258_92283
48 oracle@dbser11 (LMSC) 6 92262_92285
50 oracle@dbser11 (LMSD) 6 92267_92293
52 oracle@dbser11 (LMSE) 6 92271_92297
54 oracle@dbser11 (LMSF) 6 92275_92298
56 oracle@dbser11 (LMSG) 6 92279_92306
58 oracle@dbser11 (LMSH) 6 92282_92312
60 oracle@dbser11 (LMSI) 6 92286_92316
62 oracle@dbser11 (LMSJ) 6 92288_92335
64 oracle@dbser11 (LMSK) 6 92295_92327
66 oracle@dbser11 (LMSL) 6 92300_92358
68 oracle@dbser11 (LMSM) 6 92304_92344
70 oracle@dbser11 (LMSN) 6 92310_92351
72 oracle@dbser11 (LMSO) 6 92314_92362
74 oracle@dbser11 (LMSP) 6 92317_92374
76 oracle@dbser11 (LMSQ) 6 92321_92373
78 oracle@dbser11 (LMSR) 6 92325_92380
80 oracle@dbser11 (LMSS) 6 92329_92359
82 oracle@dbser11 (LMST) 6 92332_92391
84 oracle@dbser11 (LMSU) 6 92334_92387
86 oracle@dbser11 (LMSV) 6 92337_92360
88 oracle@dbser11 (LMSW) 6 92342_92372
90 oracle@dbser11 (LMSX) 6 92346_92365
92 oracle@dbser11 (LMSY) 6 92348_92361
94 oracle@dbser11 (LMSZ) 6 92353_92397
96 oracle@dbser11 (LM10) 6 92355_92401
98 oracle@dbser11 (LMD1) 6 92357
99 oracle@dbser11 (LMD2) 6 92390
100 oracle@dbser11 (LMD3) 6 92409
101 oracle@dbser11 (LMD4) 6 92414
102 oracle@dbser11 (RMS0) 6 92418
104 oracle@dbser11 (LCK1) 6 92424
105 oracle@dbser11 (DBW0) 6 92429
106 oracle@dbser11 (DBW1) 6 92433
107 oracle@dbser11 (DBW2) 6 92437
108 oracle@dbser11 (DBW3) 6 92441
109 oracle@dbser11 (DBW4) 6 92445
110 oracle@dbser11 (DBW5) 6 92449
111 oracle@dbser11 (DBW6) 6 92453
112 oracle@dbser11 (DBW7) 6 92457
113 oracle@dbser11 (DBW8) 6 92461
114 oracle@dbser11 (DBW9) 6 92465
115 oracle@dbser11 (DBWA) 6 92469
116 oracle@dbser11 (DBWB) 6 92474
117 oracle@dbser11 (DBWC) 6 92478
118 oracle@dbser11 (DBWD) 6 92482
119 oracle@dbser11 (DBWE) 6 92486
120 oracle@dbser11 (DBWF) 6 92490
121 oracle@dbser11 (DBWG) 6 92498
122 oracle@dbser11 (DBWH) 6 92502
123 oracle@dbser11 (DBWI) 6 92506
124 oracle@dbser11 (DBWJ) 6 92512
125 oracle@dbser11 (DBWK) 6 92516
126 oracle@dbser11 (DBWL) 6 92520
127 oracle@dbser11 (DBWM) 6 92524
128 oracle@dbser11 (DBWN) 6 92530
129 oracle@dbser11 (DBWO) 6 92534
130 oracle@dbser11 (CR00) 6 92214_92535
131 oracle@dbser11 (DBWP) 6 92540
133 oracle@dbser11 (DBWQ) 6 92546
134 oracle@dbser11 (RS01) 6 92214_92551
135 oracle@dbser11 (DBWR) 6 92550
136 oracle@dbser11 (LGWR) 6 92555
137 oracle@dbser11 (CKPT) 6 92559
138 oracle@dbser11 (CR00) 6 92216_92560
139 oracle@dbser11 (SMON) 16 92564
140 oracle@dbser11 (CR00) 6 92219_92565
143 oracle@dbser11 (CR00) 6 92226_92571
144 oracle@dbser11 (CR00) 6 92212_92574
145 oracle@dbser11 (LREG) 6 92576
146 oracle@dbser11 (CR00) 6 92231_92577
147 oracle@dbser11 (CR00) 6 92223_92578
148 oracle@dbser11 (RS02) 6 92216_92599
150 oracle@dbser11 (RBAL) 6 92584
151 oracle@dbser11 (ASMB) 6 92588
152 oracle@dbser11 (FENC) 6 92592
155 oracle@dbser11 (CR00) 6 92241_92602
156 oracle@dbser11 (CR00) 6 92245_92603
158 oracle@dbser11 (CR00) 6 92251_92604
159 oracle@dbser11 (RS03) 6 92219_92607
160 oracle@dbser11 (CR00) 6 92236_92608
161 oracle@dbser11 (CR00) 6 92262_92609
162 oracle@dbser11 (CR00) 6 92288_92610
163 oracle@dbser11 (CR00) 6 92275_92611
165 oracle@dbser11 (RS05) 6 92226_92614
166 oracle@dbser11 (CR00) 6 92271_92615
167 oracle@dbser11 (CR00) 6 92337_92616
168 oracle@dbser11 (CR00) 6 92286_92617
169 oracle@dbser11 (CR00) 6 92258_92620
170 oracle@dbser11 (CR00) 6 92332_92621
171 oracle@dbser11 (RS00) 6 92212_92623
172 oracle@dbser11 (CR00) 6 92282_92624
173 oracle@dbser11 (CR00) 6 92329_92625
174 oracle@dbser11 (RS06) 6 92231_92626
175 oracle@dbser11 (CR00) 6 92300_92627
176 oracle@dbser11 (CR00) 6 92353_92628
177 oracle@dbser11 (CR00) 6 92334_92629
178 oracle@dbser11 (RS04) 6 92223_92630
179 oracle@dbser11 (CR00) 6 92325_92631
180 oracle@dbser11 (CR00) 6 92295_92632
181 oracle@dbser11 (CR00) 6 92346_92633
182 oracle@dbser11 (CR00) 6 92310_92634
183 oracle@dbser11 (CR00) 6 92267_92635
184 oracle@dbser11 (CR00) 6 92279_92636
185 oracle@dbser11 (CR00) 6 92317_92637
186 oracle@dbser11 (CR00) 6 92304_92638
187 oracle@dbser11 (CR00) 6 92342_92639
188 oracle@dbser11 (CR00) 6 92355_92640
189 oracle@dbser11 (CR00) 6 92314_92641
190 oracle@dbser11 (CR00) 6 92321_92642
191 oracle@dbser11 (CR00) 6 92348_92643
192 oracle@dbser11 (RS08) 6 92241_92644
193 oracle@dbser11 (RS09) 6 92245_92645
194 oracle@dbser11 (RS0A) 6 92251_92646
195 oracle@dbser11 (RS07) 6 92236_92647
196 oracle@dbser11 (RS0C) 6 92262_92648
197 oracle@dbser11 (RS0J) 6 92288_92650
198 oracle@dbser11 (RS0F) 6 92275_92651
200 oracle@dbser11 (RS0E) 6 92271_92654
201 oracle@dbser11 (RS0V) 6 92337_92655
202 oracle@dbser11 (RS0I) 6 92286_92656
203 oracle@dbser11 (RS0B) 6 92258_92657
204 oracle@dbser11 (RS0T) 6 92332_92658
205 oracle@dbser11 (RS0H) 6 92282_92661
206 oracle@dbser11 (RS0S) 6 92329_92662
207 oracle@dbser11 (RS0L) 6 92300_92665
208 oracle@dbser11 (RS0Z) 6 92353_92666
209 oracle@dbser11 (RS0U) 6 92334_92667
210 oracle@dbser11 (RS0R) 6 92325_92668
211 oracle@dbser11 (RS0K) 6 92295_92669
212 oracle@dbser11 (RS0X) 6 92346_92670
213 oracle@dbser11 (RS0N) 6 92310_92673
214 oracle@dbser11 (RS0D) 6 92267_92674
215 oracle@dbser11 (RS0G) 6 92279_92675
216 oracle@dbser11 (RS0P) 6 92317_92676
217 oracle@dbser11 (RS0M) 6 92304_92678
218 oracle@dbser11 (RS0W) 6 92342_92679
219 oracle@dbser11 (RS10) 6 92355_92680
220 oracle@dbser11 (RS0O) 6 92314_92687
221 oracle@dbser11 (RS0Q) 6 92321_92703
222 oracle@dbser11 (RS0Y) 6 92348_92706
224 oracle@dbser11 (IMR0) 6 92712
226 oracle@dbser11 (LCK0) 6 92855
229 oracle@dbser11 (CL00) E 124232
275 oracle@dbser11 (CL01) E 124256
277 oracle@dbser11 (CL02) E 124268
278 oracle@dbser11 (CL03) E 124280
281 oracle@dbser11 (CL04) E 124294
171 rows selected.
SQL>
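The filter in the query above is plain bit arithmetic: a row is returned when bit 0x4 of ksuprflg is set. The Python sketch below replays that test on a few of the hexadecimal flag values from the output (E, 6 and 16). The helper name `has_core_bit` is hypothetical, and the meaning of the other bits is not stated in this article, so the sketch only prints them without interpreting them.

```python
CORE_BIT = 0x4  # the bit tested by BITAND(ksuprflg, 4) = 4 in the query above

def has_core_bit(flag_hex):
    """Return True when the hex flag string has bit 0x4 set."""
    return (int(flag_hex, 16) & CORE_BIT) == CORE_BIT

# Flag values as printed by TO_CHAR(ksuprflg, 'XXXXXXXXXXXXXXXX') in the output above.
flags = {"PMON": "E", "CLMN": "E", "LGWR": "6", "SMON": "16", "CL04": "E"}

for name, flag_hex in flags.items():
    value = int(flag_hex, 16)
    # Show the value in binary so the position of the 0x4 bit is visible.
    print(f"{name}: 0x{value:X} = {value:#07b} core_bit={has_core_bit(flag_hex)}")
```

All three values seen in the output pass the test, which is why PMON, CLMN and the CLnn helpers (flag E) appear in the same list as LGWR (flag 6) and SMON (flag 16).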
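Once the core processes CL02 and CL04 fail, the instance will inevitably crash; there is no doubt about that. The CL04 process trace shows that it, too, ran into a heap error: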
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.14.0.0.0
Build label: RDBMS_19.14.0.0.0DBRU_LINUX.X64_211224.3
ORACLE_HOME: /u01/app/oracle/product/19.0.0/dbhome_1
System name: Linux
Node name: xxxxx
Release: 3.10.0-1160.el7.x86_64
Version: #1 SMP Tue Aug 18 14:50:17 EDT 2020
Machine: x86_64
Instance name: xxxx
Redo thread mounted by this instance: 1
Oracle process number: 281
Unix process pid: 82221, image: oracle@xxxxx (CL04)
......
......
[TOC00000]
Jump to table of contents
Dump continued from file: /u01/app/oracle/diag/rdbms/xxxxx/xxxxx1/trace/xxxxx1_cl04_82221.trc
[TOC00001]
ORA-00600: internal error code, arguments: [17112], [0x0DB4CCBC0], [], [], [], [], [], [], [], [], [], []
[TOC00001-END]
[TOC00002]
========= Dump for incident 2726200 (ORA 600 [17112]) ========
[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
********** Internal heap ERROR 17112 addr=0xdb4ccbc0 *********
***** Dump of memory around addr 0xdb4ccbc0:
0DB4CBBC0 20202020 20202020 20202020 20202020 [                ]
Repeat 511 times
Decoding of possible comments in or near previous range
[0xdb4caa68] = 0x14edadd8 ==> kkqcscpfro:kglhd2
[0xdb4d2bf8] = 0x13f5fff0 ==> audRegFro:audtab
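The memory dump in this trace can be sanity-checked with a little arithmetic. The Python sketch below, which is my own illustration rather than part of the original analysis, works out how large the dumped window is, where the failing address sits inside it, what the repeated word 0x20202020 decodes to, and how far the other addresses reported in the alert log lie from the CL04 address. It assumes each printed row holds four 32-bit words and that "Repeat 511 times" applies to that single row.

```python
# Values copied from the CL04 trace and the alert log above.
ERROR_ADDR = 0xDB4CCBC0   # addr= in "Internal heap ERROR 17112"
DUMP_START = 0xDB4CBBC0   # first address printed in the memory dump
BYTES_PER_ROW = 4 * 4     # assumption: four 32-bit words per printed row
ROWS = 1 + 511            # the printed row plus "Repeat 511 times"

bytes_dumped = ROWS * BYTES_PER_ROW
dump_end = DUMP_START + bytes_dumped   # exclusive end of the dumped window
fill = bytes.fromhex("20202020").decode("ascii")

print(f"window: {DUMP_START:#x}..{dump_end:#x} ({bytes_dumped} bytes)")
print(f"failing address is {ERROR_ADDR - DUMP_START:#x} bytes into the window")
print(f"each dumped word decodes to {fill!r}")

# Other addresses reported in the alert log, relative to the CL04 address.
others = {
    "kghfrh:ds (PDBSCSB00)": 0x0DB4CCBD8,
    "17112 (CL02)": 0x0DB4CDBC0,
    "kghfrh:ds (PDBSCSB02)": 0x0DB4CDBD8,
    "17147 (PDBSJQY)": 0x0DB4CABC0,
}
offsets = {name: addr - ERROR_ADDR for name, addr in others.items()}
for name, off in offsets.items():
    print(f"{name}: {off:+#x}")
```

Under those assumptions the whole window around the failing address is filled with the ASCII space character, and the addresses reported by processes in different PDBs lie within a few KB of one another. That would be consistent with one contiguous region of shared memory having been overwritten, but the trace excerpt alone does not prove it.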
Looking further at the call stack, we can see the following:
Error Descriptor: ORA-600 [17112] [0x0DB4CCBC0] [] [] [] [] [] [] [] [] [] []
Error class: 0
Problem Key # of args: 1
Number of actions: 18
----- Incident Context Dump -----
Address: 0x7f8997316de0
Incident ID: 2726200
Problem Key: ORA 600 [17112]
Error: ORA-600 [17112] [0x0DB4CCBC0] [] [] [] [] [] [] [] [] [] []
[00]: dbgexExplicitEndInc [diag_dde]
[01]: dbgeEndDDEInvocationImpl [diag_dde]
[02]: kgherror_flag [KGH]<-- Signaling
[03]: kgherror_quar_chk [KGH]
[04]: kghfre [KGH]
[05]: kghfrh_internal [KGH]
[06]: kksFreeHeap [cursor]
[07]: kksLockRecovery [cursor]
[08]: kgxCleanup []
[09]: kksClearMutexSessionState [cursor]
[10]: kksCleanSessionState [cursor]
[11]: ksudlp_int [ksu]
[12]: ksudlp [ksu]
[13]: kss_del_cb [state_object]
[14]: kssxdl [state_object]
[15]: kssdel [state_object]
[16]: ksuxdl [ksu]
[17]: ksucln_dpc_cleanup [ksu]
[18]: ksucln_dpc_dfs [ksu]
[19]: ksucln_dpc_main [ksu]
[20]: ksucln_slave_main [ksu]
[21]: ksbdispatch [background_proc]
[22]: opirip [OPI]
[23]: opidrv [OPI]
[24]: sou2o []
[25]: opimai_real [OPI]
[26]: ssthrdmain []
[27]: main []
[28]: __libc_start_main []
MD [00]: 'SID'='6899.5088' (0x2)
MD [01]: 'ProcId'='281.2' (0x2)
MD [02]: 'Service'='SYS$BACKGROUND' (0x200)
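A stack like the one above can also be read mechanically: find the frame marked as signaling, then walk down to the first caller that is outside the heap layer. The following Python sketch does that on the first few frames of the dump; `parse_frames` and `first_caller_outside` are hypothetical helper names, and the frame format is assumed to match the excerpt exactly.

```python
import re

# A frame looks like "[02]: kgherror_flag [KGH]<-- Signaling" or "[08]: kgxCleanup []".
FRAME_RE = re.compile(r"\[(\d+)\]:\s+(\S+)\s+\[([^\]]*)\]\s*(<-- Signaling)?")

def parse_frames(text):
    """Turn an incident context dump excerpt into a list of frame dicts."""
    frames = []
    for idx, func, component, signaling in FRAME_RE.findall(text):
        frames.append({
            "idx": int(idx),
            "func": func,
            "component": component,
            "signaling": bool(signaling),
        })
    return frames

def first_caller_outside(frames, component):
    """Return the first frame below the signaling one that is not in `component`."""
    seen_signal = False
    for frame in frames:
        if frame["signaling"]:
            seen_signal = True
            continue
        if seen_signal and frame["component"] != component:
            return frame
    return None

# The first nine frames of the stack shown above.
sample = """\
[00]: dbgexExplicitEndInc [diag_dde]
[01]: dbgeEndDDEInvocationImpl [diag_dde]
[02]: kgherror_flag [KGH]<-- Signaling
[03]: kgherror_quar_chk [KGH]
[04]: kghfre [KGH]
[05]: kghfrh_internal [KGH]
[06]: kksFreeHeap [cursor]
[07]: kksLockRecovery [cursor]
[08]: kgxCleanup []
"""

frames = parse_frames(sample)
caller = first_caller_outside(frames, "KGH")
print("signaling:", [f["func"] for f in frames if f["signaling"]])
print("first caller outside KGH:", caller["func"], f"[{caller['component']}]")
```

On this excerpt the helper lands on the cursor-layer frame that asked the heap manager to free memory, which matches the reading of the stack given in the text.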
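We can see that Oracle detected the heap error while calling kghfre to free memory, which ultimately triggered this problem. According to a colleague, similar problems, including outages with the same errors, had already occurred during earlier testing, and they were traced to the same piece of business logic. In the end I recommended that the application team remove every nologging hint from that business logic. Checking again afterwards, the errors appear to have dropped off considerably; at least the alert log no longer shows similar errors.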
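When the affected PL/SQL is spread over many source files, locating every nologging hint by hand is error prone. The sketch below is one hedged way to do it in Python; the function name `strip_nologging` is hypothetical, it only looks at `/*+ ... */` hint comments, and any rewritten source would still need review and testing before deployment.

```python
import re

# A hint comment such as /*+ nologging */ or /*+ append nologging */,
# plus at most one trailing blank so no double space is left behind.
HINT_RE = re.compile(r"/\*\+(.*?)\*/([ \t]?)", re.DOTALL)

def strip_nologging(sql):
    """Remove the word NOLOGGING from hint comments; drop hints left empty."""
    def fix(match):
        words = [w for w in match.group(1).split() if w.lower() != "nologging"]
        if not words:
            return ""  # the hint held nothing else: drop it with its trailing blank
        return "/*+ " + " ".join(words) + " */" + match.group(2)
    return HINT_RE.sub(fix, sql)

before = "INSERT /*+ nologging */ INTO t SELECT * FROM s"
after = strip_nologging(before)
print(after)
print(strip_nologging("INSERT /*+ append NOLOGGING */ INTO t SELECT * FROM s"))
```

The table names `t` and `s` are placeholders. Statements without a hint comment pass through unchanged, so the function can be applied to whole source files.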
Searching Oracle MOS for one of the ORA-00600 errors above also turns up a similar bug description, for reference:
Bug 28276054 – Various ORA-600 / ORA-7445 Internal Errors Raised When Using PLSQL Reset Session (Doc ID 28276054.8)
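However, that bug note offers no workaround, and it appears to say the issue was fixed after 19.7, whereas we are on 19.14. Evidently a similar problem still exists.
Finally, as to whether this crash was caused by an Oracle bug, I think it is very likely.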