lmon terminating the instance due to error 481
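Unless marked as a repost, all articles on this site are original. Reposted from: love wife love life, Roger's Oracle/MySQL/PostgreSQL data recovery blog
Today a customer reported that the database instance of one of their business systems had crashed and restarted. Analysis of the alert log showed the following errors: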
Tue Oct 18 21:34:34 2022
Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_lmon_2491932.trc (incident=512156):
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x1108E6388], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xxxx/xxxx1/incident/incdir_512156/xxxx1_lmon_2491932_i512156.trc
Tue Oct 18 21:34:44 2022
Dumping diagnostic data in directory=[cdmp_20221018213444], requested by (instance=1, osid=2491932 (LMON)), summary=[incident=512156].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_lmon_2491932.trc:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x1108E6388], [], [], [], [], [], [], [], [], [], []
LMON (ospid: 2491932): terminating the instance due to error 481
Tue Oct 18 21:34:44 2022
opiodr aborting process unknown ospid (7210544) as a result of ORA-1092
Tue Oct 18 21:34:44 2022
opiodr aborting process unknown ospid (5047352) as a result of ORA-1092
Tue Oct 18 21:34:44 2022
opiodr aborting process unknown ospid (10618076) as a result of ORA-1092
Tue Oct 18 21:34:44 2022
ORA-1092 : opitsk aborting process
Tue Oct 18 21:34:44 2022
ORA-1092 : opitsk aborting process
Tue Oct 18 21:34:44 2022
opiodr aborting process unknown ospid (5441064) as a result of ORA-1092
Tue Oct 18 21:34:44 2022
ORA-1092 : opitsk aborting process
Tue Oct 18 21:34:49 2022
Instance terminated by LMON, pid = 2491932
Tue Oct 18 21:34:53 2022
Starting ORACLE instance (normal)
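When an alert log is long, the line that names the terminating process can be picked out mechanically. The following is only a minimal sketch in Python, assuming the log is read one entry per line as Oracle normally writes it. The function name find_terminations and the regular expression are illustrative assumptions, not part of any Oracle tool; the sample lines are copied from the log above.

```python
import re

# Assumed pattern for the alert-log line written when a background process
# kills the instance, e.g.
#   "LMON (ospid: 2491932): terminating the instance due to error 481"
TERMINATION_RE = re.compile(
    r"^(?P<process>\w+) \(ospid: (?P<ospid>\d+)\): "
    r"terminating the instance due to error (?P<error>\d+)"
)


def find_terminations(lines):
    """Return (process, ospid, error) for every termination line in `lines`."""
    hits = []
    for line in lines:
        match = TERMINATION_RE.match(line.strip())
        if match:
            hits.append(
                (
                    match.group("process"),
                    int(match.group("ospid")),
                    int(match.group("error")),
                )
            )
    return hits


# A few lines copied from the alert log in this post.
sample = [
    "Tue Oct 18 21:34:44 2022",
    "LMON (ospid: 2491932): terminating the instance due to error 481",
    "Tue Oct 18 21:34:49 2022",
    "Instance terminated by LMON, pid = 2491932",
]

print(find_terminations(sample))  # [('LMON', 2491932, 481)]
```

In practice the list of lines would come from reading the alert log file; the process name and ospid then point to the trace file to open next.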
As shown, the instance was abnormally terminated by the LMON process. For the details we need to look further at the LMON trace:
*** SERVICE NAME:(SYS$BACKGROUND) 2022-10-18 21:34:34.668
*** MODULE NAME:() 2022-10-18 21:34:34.668
*** ACTION NAME:() 2022-10-18 21:34:34.668

Dump continued from file: /u01/app/oracle/diag/rdbms/xxxx/xxxx1/trace/xxxx1_lmon_2491932.trc
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x1108E6388], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 512156 (ORA 600 [kghstack_underflow_internal_2]) ========

*** 2022-10-18 21:34:34.691
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+40        bl       0000000109B4CD24     000000000 ? 000000001 ?
                                                   000000003 ? 000000000 ?
                                                   000000000 ? 000000001 ?
                                                   000000003 ? 000000000 ?
ksedst1()+112        call     skdstdst()           171F2D30C8558AB1 ? 4844284100000000 ?
                                                   FFFFFFFFFFF6500 ? 28E4DEBE4CBF3 ?
                                                   10A81AD8C ? 000000000 ?
                                                   11072A8C0 ? 2050033FFFF6508 ?
ksedst()+40          call     ksedst1()            000000000 ? 00000000A ?
                                                   000003000 ? 10A5BFFA8 ?
                                                   000000000 ? 000000000 ?
                                                   000002004 ? 000000001 ?
dbkedDefDump()+1516  call     ksedst()             000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 300000003 ?
ksedmp()+72          call     dbkedDefDump()       31072A8C0 ? 110000A60 ?
                                                   FFFFFFFFFFF6D10 ? 1106AC1B8 ?
                                                   100125838 ? FFFFFFFFFFF7730 ?
                                                   1000F0D94 ? 1106AC1B8 ?
ksfdmp()+100         call     ksedmp()             000000002 ? 000000000 ?
                                                   000000002 ? 10AAE5CB0 ?
                                                   10A07CFD0 ? 000000000 ?
                                                   1109D3E30 ? 11072A8C0 ?
dbgexPhaseII()+1904  call     ksfdmp()             000000000 ? 00000000A ?
                                                   000000002 ? 000000000 ?
                                                   000000002 ? 10A07CFC8 ?
                                                   000000000 ? 001050005 ?
dbgexProcessError()  call     dbgexPhaseII()       11072A8C0 ? 1109D2040 ?
+1556                                              00007D09C ? 200000000 ?
                                                   FFFFFFFFFFF7C28 ? 000000082 ?
                                                   000000000 ? 000000000 ?
dbgeExecuteForError  call     dbgexProcessError()  11072A8C0 ? 1109D3E30 ?
()+72                                              1FFFFB6A0 ? 000000001 ?
                                                   000000703 ? 000000011 ?
                                                   000000006 ? 1109D5B78 ?
dbgePostErrorKGE()+  call     dbgeExecuteForError  000000000 ? 00A4D1050 ?
2044                          ()                   FFFFFFFFFFFFB210 ? 00A4D1050 ?
                                                   000000000 ? 90000000D6969D8 ?
                                                   000000000 ? 110000C58 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000003000 ? 10A5BFFA8 ?
68                                                 25800000002 ? 109E85570 ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFFBEE0 ? 11113A600 ?
kgeadse()+380        call     dbkePostKGE_kgsf()   102DA1484 ? 100000000 ?
                                                   FFFFFFFFFFFC0D8 ? 000000000 ?
                                                   110AED1A0 ? 1108EA610 ?
                                                   000000002 ? 700000000013680 ?
kgerinv_internal()+  call     kgeadse()            000000000 ? 000000000 ?
48                                                 000000000 ? 1700000010 ?
                                                   100000000 ? 000003000 ?
                                                   110D33350 ? 1108EA610 ?
kgerinv()+48         call     kgerinv_internal()   8311AABF3BAF ? 8311AABF3FD4 ?
                                                   8311AABF3BAF ? 8311AABF3BAF ?
                                                   000000000 ? 10A5A3090 ?
                                                   000000000 ? 000000000 ?
kgeasnmierr()+72     call     kgerinv()            000000000 ? 000000023 ?
                                                   000000001 ? 000000004 ?
                                                   000000000 ? 000000001 ?
                                                   110D33350 ? 110AED398 ?
kghstack_underflow_  call     kgeasnmierr()        000000000 ? FFFFFFFFFFFC100 ?
internal()+280                                     00000001E ? 100000001 ?
                                                   000000002 ? 1108E6388 ?
                                                   000000000 ? 000000000 ?
kghstack_free()+716  call     kghstack_underflow_  000000001 ? 08DBD1E85 ?
                              internal()           700011351BB7B48 ? 0000F4240 ?
                                                   000000000 ? 00000000A ?
                                                   000003000 ? 10A5BFFA8 ?
kccgrd()+264         call     kghstack_free()      FFFFFFFFFFFC0C0 ? 4224282B00000000 ?
                                                   103D2C888 ? 000004000 ?
                                                   500000005 ? C0000000C ?
                                                   400003000 ? 10A5BFFA8 ?
kjxgrf_rr_read()+66  call     kccgrd()             1FFFD02FAFF35E5 ? 110A5BD70 ?
0                                                  FFFFFFFFFFFC180 ? 000000000 ?
                                                   110A5BD70 ? 110FBCF48 ?
                                                   0037D6E50 ? 1106AC1B8 ?
kjxgrDD_rr_read()+1  call     kjxgrf_rr_read()     110A032D0 ? 700011342677E98 ?
04                                                 000000000 ? 000000001 ?
                                                   FFFFFFFFFFFC6A4 ? 110A03B38 ?
                                                   FFFFFFFFFFFC630 ? 42245280FFFFC790 ?
kjxgrimember()+124   call     kjxgrDD_rr_read()    000003000 ? 10A5BFFA8 ?
                                                   000000002 ? 700000000013680 ?
                                                   11011EAD0 ? FFFFFFFFFFFCD80 ?
                                                   000000001 ? 218DBD1E85 ?
kjxggpoll()+804      call     kjxgrimember()       FFFFFFFFFFFC6D0 ? 0000186A0 ?
                                                   101FECE90 ? 8311AABD8A40 ?
                                                   70000000000C0D0 ? 000000000 ?
                                                   000001568 ? 100000000 ?
kjfmact()+508        call     kjxggpoll()          000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFFC7A0 ? 000000000 ?
                                                   1037BB124 ? 000000000 ?
kjfdact()+32         call     kjfmact()            11011EAD0 ? FFFFFFFFFFFCD80 ?
                                                   000000001 ? 000000000 ?
                                                   002050000 ? 001160000 ?
                                                   10896F91E ? 14616E27FFFFC930 ?
kjfcln()+2240        call     kjfdact()            000000000 ? 10A5B3014 ?
                                                   700011351BB7B48 ? 000000002 ?
                                                   700011351BB7B54 ? 000000004 ?
                                                   2FFFFF570 ? 200000002 ?
ksbrdp()+2216        call     kjfcln()             700000000013198 ? 7000000000131B4 ?
                                                   048245028 ? 000000E00 ?
                                                   1108B2310 ? 100638128 ?
                                                   000000001 ? 700000007 ?
opirip()+1620        call     ksbrdp()             FFFFFFFFFFFFEA7 ? 10B2ADCF0 ?
                                                   FFFFFFFFFFFDE50 ? 000000000 ?
                                                   000000001 ? 000000000 ?
                                                   01099067F ? 000000001 ?
opidrv()+608         call     opirip()             10AE1FAC0 ? 410134198 ?
                                                   FFFFFFFFFFFEFC0 ? 2F7530312F ?
                                                   108354684 ? 1106AC1B8 ?
                                                   7264626D732F6462 ? 1106AC1B8 ?
sou2o()+136          call     opidrv()             32067E1DB0 ? 4FFFFF388 ?
                                                   FFFFFFFFFFFEFC0 ? 25001D022C0000 ?
                                                   000000010 ? 1106AC1B8 ?
                                                   000000000 ? 000000000 ?
opimai_real()+188    call     sou2o()              FFFFFFFFFFFF030 ? 5524445B00000001 ?
                                                   9000000000DC64C ? BADC0FFEE0DDF00D ?
                                                   000000003 ? 9001000A008DB98 ?
                                                   A0000000A000000 ? 10B6B6F40 ?
ssthrdmain()+276     call     opimai_real()        FFFFFFFFFFFF110 ? 9001000A0092DC0 ?
                                                   FFFFFFFFFFFF130 ? 10B6F72B8 ?
                                                   90000000008AB0C ? 9001000A008DB98 ?
                                                   FFFFFFFFFFFF110 ? 9001000A008DB98 ?
main()+204           call     ssthrdmain()         3F0003720 ? FFFFFFFFFFFF478 ?
                                                   FFFFFFFFFFFF4E0 ? 9FFFFFFF000D6F0 ?
                                                   9FFFFFFF00009E0 ? 000000000 ?
                                                   000000000 ? 9FFFFFFF000D6F0 ?
__start()+112        call     main()               000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
                                                   000000000 ? 000000000 ?
Following up on the call stack above, it is easy to locate the bug below. For details, refer to the MOS article:
SYMPTOMS
- The LMON or LMS process crash the instance with an error like:
ORA-00600: internal error code, arguments: [kghstack_underflow_internal_2], [0x110A10838], [], [], [], [], [], [], [], [], [], []
ORA-1092 : opitsk aborting process
Instance terminated by LMS1, pid = 14024818
- Review of the generated tracefiles reveals a call stack similar to:
… kghstack_underflow_internal kghstack_free kccgrd kjxgrf_rr_read kjxgrDD_rr_read kjxgrimember kjxggpoll kjfmact kjfdact kjfcln ksbrdp …
– OR –
… kghstack_underflow_internal kghstack_free ktundo kturcrbackoutonechg ktrgcm ktrget3 ktrget2 kclgcr …
CHANGES
CAUSE
The cause of this problem has been identified in a.o.:
Bug 18687067 – ORA-600 [KGHSTACK_UNDERFLOW_INTERNAL_2]
closed as duplicate of Bug 20675347 – ORA-07445 [KGHSTACK_OVERFLOW_INTERNAL()+644]
The bug is caused by an AIX compiler issue causing volatile variables in the Oracle kernel not to be handled properly.
The bug is a regression introduced in 11.2.0.4.
The issue does not reproduce in later versions, i.e. 12.1.
SOLUTION
To solve the issue, use any of below alternatives:
- Upgrade to 12.1
– OR –
- Apply interim patch 20675347, if available for your platform and Oracle version.
To check for conflicting patches, please use the MOS Patch Planner Tool.
Please refer to Note 1317012.1 – How To Use MOS Patch Planner To Check And Request The Conflict Patches?
If no patch exists for your version, please contact Oracle Support for a backport request.
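The comparison between the LMON trace and the SYMPTOMS section of the note can also be done mechanically. The sketch below is a rough illustration only: it checks whether the function names from a call stack contain one of the two frame sequences quoted in the note as a contiguous run. The helper names contains_run and matching_signatures are hypothetical, and the frame list was typed in by hand from the trace in this post, with wrapped names rejoined and offsets dropped.

```python
# Frame sequences quoted in the SYMPTOMS section of the MOS note.
SIGNATURES = [
    [
        "kghstack_underflow_internal", "kghstack_free", "kccgrd",
        "kjxgrf_rr_read", "kjxgrDD_rr_read", "kjxgrimember", "kjxggpoll",
        "kjfmact", "kjfdact", "kjfcln", "ksbrdp",
    ],
    [
        "kghstack_underflow_internal", "kghstack_free", "ktundo",
        "kturcrbackoutonechg", "ktrgcm", "ktrget3", "ktrget2", "kclgcr",
    ],
]


def contains_run(frames, signature):
    """True if `signature` appears in `frames` as a contiguous run."""
    n = len(signature)
    return any(frames[i:i + n] == signature for i in range(len(frames) - n + 1))


def matching_signatures(frames):
    """Return the indexes of all known signatures found in `frames`."""
    return [i for i, sig in enumerate(SIGNATURES) if contains_run(frames, sig)]


# Part of the LMON call stack from this post (function names only).
lmon_frames = [
    "kgeasnmierr", "kghstack_underflow_internal", "kghstack_free", "kccgrd",
    "kjxgrf_rr_read", "kjxgrDD_rr_read", "kjxgrimember", "kjxggpoll",
    "kjfmact", "kjfdact", "kjfcln", "ksbrdp", "opirip",
]

print(matching_signatures(lmon_frames))  # [0]
```

Index 0 is the first sequence in the note, the kccgrd / kjxgrf_rr_read path, which is the one seen in this LMON trace.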
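According to the note, this problem is fairly common on 11.2.0.4; the customer hit it mainly because the corresponding PSU had not been installed. The problem is relatively simple, so this is just a brief record for future reference.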