绑定变量超过65535 导致数据库集群频繁crash
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: 绑定变量超过65535 导致数据库集群频繁crash
今天凌晨5点半,被客户的电话声惊喜。对于DBA来讲,最紧张的时刻莫过于半夜接到突然突然。若非紧急故障,客户通常不会大半夜打电话。从客户提供的信息来看,数据库不断crash:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
Thu Aug 25 00:05:59 2022 Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x186D3B7, opiaba()+639] [flags: 0x0, count: 1] Errors in file /u02/app/oracle/diag/rdbms/gdfw/gdfw1/trace/gdfw1_ora_93981.trc (incident=590479): ORA-07445: exception encountered: core dump [opiaba()+639] [SIGSEGV] [ADDR:0x0] [PC:0x186D3B7] [SI_KERNEL(general_protection)] [] Incident details in: /u02/app/oracle/diag/rdbms/xxxx/xxxx1/incident/incdir_590479/xxxx1_ora_93981_i590479.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Thu Aug 25 00:06:17 2022 Dumping diagnostic data in directory=[cdmp_20220825000617], requested by (instance=1, osid=93981), summary=[incident=590479]. Thu Aug 25 00:06:19 2022 Sweep [inc][590479]: completed Sweep [inc2][590479]: completed Thu Aug 25 00:06:33 2022 Errors in file /u02/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_pmon_94310.trc (incident=587614): ORA-00600: internal error code, arguments: [17147], [0x2223138B60], [], [], [], [], [], [], [], [], [], [] Incident details in: /u02/app/oracle/diag/rdbms/xxxx/xxxx1/incident/incdir_587614/xxxx1_pmon_94310_i587614.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Thu Aug 25 00:06:36 2022 Dumping diagnostic data in directory=[cdmp_20220825000636], requested by (instance=1, osid=94310 (PMON)), summary=[incident=587614]. Errors in file /u02/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_pmon_94310.trc: ORA-00600: internal error code, arguments: [17147], [0x2223138B60], [], [], [], [], [], [], [], [], [], [] PMON (ospid: 94310): terminating the instance due to error 472 Thu Aug 25 00:06:37 2022 ORA-1092 : opitsk aborting process System state dump requested by (instance=1, osid=94310 (PMON)), summary=[abnormal instance termination]. System State dumped to trace file /u02/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_diag_94335_20220825000641.trc Instance terminated by PMON, pid = 94310 Thu Aug 25 00:06:48 2022 Adjusting the default value of parameter parallel_max_servers from 3600 to 1970 due to the value of parameter processes (2000) Starting ORACLE instance (normal) ************************ Large Pages Information ******************* Per process system memlock (soft) limit = UNLIMITED |
可以看到实例最终被pmon进程异常终止了。而ora-00600 17147 也是较为常见的。
进一步分析客户提供的trace 发现了如下内容,其中报错的存储过程传参超过7万个:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 |
*** 2022-08-25 00:05:59.450 dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x3, level=3, mask=0x0) ----- Current SQL Statement for this session (sql_id=dz3895jnbahxj) ----- BEGIN INSERT INTO PSSC.T_H_SG186FAULT_EVENT ( ID, CITY_UNIT_NAME, CITY_UNIT_ID, ........ SYS_GUID(), :1 , :2 , :3 , :4 , :5 , :6 , :7 , :78693 , :78694 , TO_DATE(:78695 ,'YYYY-MM-DD HH24:MI:SS'), :78696 , :78697 , TO_DATE(:78698 ,'YYYY-MM-DD HH24:MI:SS'), :78699 , :78700 , :78701 , :78702 , :78703 , :78704 , :78705 ); COMMIT; END; ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- skdstdst()+41 call kgdsdst() 000000000 ? 000000000 ? 7F046159BC40 ? 7F046159BD18 ? 7F04615A07C0 ? 000000003 ? ksedst1()+103 call skdstdst() 000000000 ? 000000000 ? 7F046159BC40 ? 7F046159BD18 ? 7F04615A07C0 ? 000000003 ? ksedst()+39 call ksedst1() 000000001 ? 000000001 ? 7F046159BC40 ? 7F046159BD18 ? 7F04615A07C0 ? 000000003 ? dbkedDefDump()+2746 call ksedst() 000000001 ? 000000001 ? 7F046159BC40 ? 7F046159BD18 ? 7F04615A07C0 ? 000000003 ? ksedmp()+41 call dbkedDefDump() 000000003 ? 000000003 ? 7F046159BC40 ? 7F046159BD18 ? 7F04615A07C0 ? 000000003 ? ssexhd()+2462 call ksedmp() 000000003 ? 000000003 ? 7F046159BC40 ? 7F046159BD18 ? 7F04615A07C0 ? 000000003 ? __sighandler() call ssexhd() 00000000B ? 7F04615A1BF0 ? 7F04615A1AE8 ? 7F046159BD18 ? 7F04615A07C0 ? 000000003 ? opiaba()+639 signal __sighandler() 000003380 ? 000003380 ? 000000000 ? 000003200 ? 222311EFD8 ? 0000000A1 ? opiprs()+307 call opiaba() 000003380 ? 2000000226B1F0F ? 000000000 ? 000003200 ? 222311EFD8 ? 0000000A1 ? kksParseChildCursor call opiprs() 7FFC39C77780 ? 7F045E3BC3E8 ? ()+572 00042860A ? 21CC3DE210 ? 000000004 ? 0000000A1 ? rpiswu2()+1776 call kksParseChildCursor 7FFC39C77960 ? 7F045E3BC3E8 ? () 00042860A ? 21CC3DE210 ? 000000004 ? 0000000A1 ? kksLoadChild()+5255 call rpiswu2() 2602464080 ? 000000027 ? 2263B6CC34 ? 000000017 ? 000000004 ? 0000000A1 ? kxsGetRuntimeLock() call kksLoadChild() 00C113DA0 ? 2736366FA8 ? +2209 7FFC39C78EB0 ? 000000017 ? 000000004 ? 0000000A1 ? kksfbc()+15108 call kxsGetRuntimeLock() 00C113DA0 ? 7F0461397608 ? 7FFC39C78EB0 ? 000000017 ? 000000004 ? 0000000A1 ? kkspsc0()+1171 call kksfbc() 7F0461397608 ? 000000003 ? 000000108 ? 7F045E3BC3E8 ? 000000004 ? 0000000A1 ? kksParseCursor()+11 call kkspsc0() 7F04613826D8 ? 7F045E3BC3E8 ? 6 00042860A ? 000000003 ? 000000006 ? 0000000A4 ? opiosq0()+2019 call kksParseCursor() 7FFC39C79A20 ? 7F045E3BC3E8 ? 00042860A ? 000000003 ? 000000006 ? 0000000A4 ? kpooprx()+274 call opiosq0() 000000003 ? 00000000E ? 7FFC39C79BF0 ? 0000000A4 ? 000000006 ? 0000000A4 ? kpoal8()+842 call kpooprx() 7FFC39C7D3D4 ? 7F045E3BC3E8 ? 000428609 ? 000000001 ? 000000000 ? 0000000A4 ? opiodr()+917 call kpoal8() 00000005E ? 7F045E3BC3E8 ? 000428609 ? 000000001 ? 000000000 ? 0000000A4 ? ttcpip()+2183 call opiodr() 00000005E ? 00000001C ? 7FFC39C7D3D0 ? 000000001 ? 000000000 ? 0000000A4 ? opitsk()+1710 call ttcpip() 00C1319F0 ? 0099D5D10 ? 7FFC39C7D3D0 ? 000000000 ? 7FFC39C7CE28 ? 7FFC39C7D3CC ? opiino()+969 call opitsk() 00C1319F8 ? 000000000 ? 7FFC39C7D3D0 ? 000000000 ? 7FFC39C7CE28 ? 7FFC39C7D3CC ? opiodr()+917 call opiino() 00000003C ? 000000004 ? 7FFC39C7EBC8 ? 000000000 ? 7FFC39C7CE28 ? 7FFC39C7D3CC ? opidrv()+570 call opiodr() 00000003C ? 000000004 ? 7FFC39C7EBC8 ? 000000000 ? 7FFC39C7CE28 ? 7FFC39C7D3CC ? sou2o()+103 call opidrv() 00000003C ? 000000004 ? 7FFC39C7EBC8 ? 000000000 ? 7FFC39C7CE28 ? 7FFC39C7D3CC ? opimai_real()+133 call sou2o() 7FFC39C7EBA0 ? 00000003C ? 000000004 ? 7FFC39C7EBC8 ? 7FFC39C7CE28 ? 7FFC39C7D3CC ? ssthrdmain()+265 call opimai_real() 000000002 ? 7FFC39C7ED90 ? 000000004 ? 7FFC39C7EBC8 ? 7FFC39C7CE28 ? 7FFC39C7D3CC ? main()+201 call ssthrdmain() 000000002 ? 7FFC39C7ED90 ? 000000001 ? 000000000 ? 7FFC39C7CE28 ? 7FFC39C7D3CC ? __libc_start_main() call main() 000000002 ? 7FFC39C7EF38 ? +253 000000001 ? 000000000 ? 7FFC39C7CE28 ? 7FFC39C7D3CC ? _start()+41 call __libc_start_main() 000A2A354 ? 000000002 ? 7FFC39C7EF28 ? 000000000 ? 7FFC39C7CE28 ? 7FFC39C7D3CC ? |
对比上述相关错误以及call stack堆栈信息,发现为Oracle bug导致。可参考如下文档:
摘取其中一段描述:
CAUSE
Instance terminated due to ora-7445 [opiaba] which leads to ora-600 [17147]. ora-7445 [opiaba] error is reported due to the use of more than 65535 binds in the same sql / plsql statement.
You may find some or all of the following function codes in the ‘Call Stack’ portion of the trace file:
opiaba opiprs rpiswu2 kksLoadChild kxsGetRuntimeLock kksfbc
This scenario is reported in bug 13973845 which is closed of duplicated bug 12578873.
紧急下载该patch发客户,打上补丁之后,数据库实例不再频繁crash。从描述来看,该patch并不能彻底解决问题,只是让实例不crash而已。最根本的解决方法还是调整应用,将存储过程拆分,不要一次性传入数万个参数,超过oracle 65535的临界值。
这个问题很简答, 我们在其他客户之前都遇到过了多次,在我司内部mes平台也能看到相关的案例分享。简单记录一下吧!
Leave a Reply
You must be logged in to post a comment.