PMON crash the instance of RAC
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: PMON crash the instance of RAC
这是某个网友的问题,其一套rac的某个节点被重启了,通过分析日志我们可以看到其中一个rac节点crash了,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc (incident=584434): ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/diag/rdbms/cspora/cspora2/incident/incdir_584434/cspora2_pmon_57410148_i584434.trc Thu Feb 05 10:24:04 2015 Trace dumping is performing id=[cdmp_20150205102404] Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc (incident=584435): ORA-00600: internal error code, arguments: [kjuscv], [6], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], [] Incident details in: /opt/oracle/diag/rdbms/cspora/cspora2/incident/incdir_584435/cspora2_pmon_57410148_i584435.trc Thu Feb 05 10:24:06 2015 Sweep Incident[584434]: completed Errors in file /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc: ORA-00600: internal error code, arguments: [kjuscv], [6], [], [], [], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], [] PMON (ospid: 57410148): terminating the instance due to error 472 Thu Feb 05 10:24:18 2015 Termination issued to instance processes. Waiting for the processes to exit Instance termination failed to kill one or more processes Instance terminated by PMON, pid = 57410148 Thu Feb 05 10:24:58 2015 Starting ORACLE instance (normal) sskgpgetexecname failed to get name LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Picked latch-free SCN scheme 3 Autotune of undo retention is turned on. LICENSE_MAX_USERS = 0 SYS auditing is disabled Starting up ORACLE RDBMS Version: 11.1.0.7.0. Using parameter settings in server-side spfile /dev/rlv_spfile ....... ....... ORA-1154 signalled during: alter database open... Thu Feb 05 10:31:11 2015 Reconfiguration started (old inc 102, new inc 104) List of nodes: 1 Global Resource Directory frozen * dead instance detected - domain 0 invalid = TRUE Communication channels reestablished Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned out Thu Feb 05 10:31:12 2015 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived Thu Feb 05 10:31:12 2015 |
从上述alert log可以看出,数据库实例是被pmon进程强行终止的,之所以pmon进程会强行终止instance,
很明显之前出现了ORA-00600错误。很明显上述ORA-00600错误是pmon进程出现的。我们来分析pmon进程的trace内容:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 |
*** ACTION NAME:() 2015-02-05 10:24:01.185 Dump continued from file: /opt/oracle/diag/rdbms/cspora/cspora2/trace/cspora2_pmon_57410148.trc ORA-00600: internal error code, arguments: [kjucvl:!busy], [8], [], [], [], [], [], [], [], [], [], [] ========= Dump for incident 584434 (ORA 600 [kjucvl:!busy]) ======== ----- Beginning of Customized Incident Dump(s) ----- ----------enqueue 7000001cbbcdaf0------------------------ lock version : 311947 Owner node : 1 grant_level : KJUSEREX req_level : KJUSEREX bast_level : KJUSERNL notify_func : 0 resp : 70000018f124ac0 procp : 7000001cb0e3960 pid : 46924312 proc version : 922 oprocp : 0 opid : 0 group lock owner : 0 xid : 0000-0000-00000000 dd_time : 0.0 secs dd_count : 0 timeout : 0.0 secs On_timer_q? : N On_dd_q? : N lock_state : CONVERTING Open Options : KJUSERPROCESS_OWNED Convert options : KJUSERGETVALUE KJUSERNOQUEUE KJUSEREXPRESS KJUSERNODEADLOCKWAIT KJUSERNODEADLOCKBLOCK History : 0x955252b5 Msg_Seq : 0x0 res_seq : 41 valblk : 0x514e5844970d5d33d156c8d80004ac1e QNXD]3V ----------resource 70000018f124ac0---------------------- resname : [0x970d5d33][0x0],[DX] Local node : 1 dir_node : 0 master_node : 0 hv idx : 18 hv last r.inc : 98 current inc : 98 hv status : 0 hv master : 0 open options : Held mode : KJUSEREX Cvt mode : KJUSERNL Next Cvt mode : KJUSERNL msg_seq : 1 res_seq : 41 grant_bits : KJUSERNL KJUSEREX grant mode : KJUSERNL KJUSERCR KJUSERCW KJUSERPR KJUSERPW KJUSEREX count : 1 0 0 0 0 1 val_state : KJUSERVS_VALUE valblk : 0x514e5844970d5d33d156c8d80004ac1e QNXD]3V access_node : 1 vbreq_state : 0 state : x8 resp : 70000018f124ac0 On Scan_q? : N Total accesses: 5475 Imm. accesses: 5057 Granted_locks : 2 Cvting_locks : 0 value_block: 51 4e 58 44 97 0d 5d 33 d1 56 c8 d8 00 04 ac 1e GRANTED_Q : lp 7000001cbbcdaf0 gl KJUSEREX rp 70000018f124ac0 [0x970d5d33][0x0],[DX] master 0 pid 46924312 bast 0 rseq 41 mseq 0 history 0x955252b5 open opt KJUSERPROCESS_OWNED CONVERT_Q: ----- End of Customized Incident Dump(s) ----- *** 2015-02-05 10:24:01.286 ----- SQL Statement (None) ----- Current SQL information unavailable - no cursor. ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- skdstdst()+002c bl 105e6a628 ksedst1()+0064 bl 101f9c14c ksedst()+0028 bl ksedst1() FFFFFFFFFFF4880 ? 000000000 ? dbkedDefDump()+0874 bl ksedst() 110EA2180 ? ksedmp()+0048 bl dbkedDefDump() 3FFFF51D0 ? 110000A90 ? ksfdmp()+0058 bl ksedmp() 000000020 ? dbgexPhaseII()+0130 bl _ptrgl() dbgexExplicitEndInc bl dbgexPhaseII() 000000000 ? 000000000 ? ()+0210 000000000 ? dbgeEndDDEInvocatio bl dbgexExplicitEndInc 110378410 ? 11037F370 ? nImpl()+0224 () dbgeEndDDEInvocatio bl dbgeEndDDEInvocatio 110378410 ? 11037F370 ? n()+0038 nImpl() kjucvl()+0590 bl 101f9d824 kjuscv()+0c90 bl kjucvl() 000000002 ? FFFFFFFFFFF9700 ? FFFFFFFFFFF9570 ? 24244228CB0E3960 ? 102196C80 ? FFFFFFFFFFF9220 ? 000000000 ? 110323C28 ? ksicon_v()+06cc bl kjuscv() 7000001CBBCDAF0 ? 500000000000000 ? 79FFFF9D00 ? 0105358E0 ? 7000001BDF32FB4 ? 000000000 ? 000000000 ? 000000000 ? k2qsod()+0250 bl ksicon_v() 1071DED20 ? 000000000 ? 000000000 ? 084242240 ? 70000017FEF01A0 ? 000000046 ? 70000017FEF01C0 ? kssxdl()+076c bl _ptrgl() kssdel()+0048 bl kssxdl() 7000001BDF32F38 ? 382242228 ? kssdch()+0d30 bl kssdel() FFFFFFFFFFFA550 ? 4224822800000003 ? ksuxds()+08b8 bl kssdch() 3CD1FEAE8 ? 7000001CD3ACCA0 ? kssxdl()+076c bl _ptrgl() kssdel()+0048 bl kssxdl() 7000001CD3ACCA0 ? 3CE4C7B78 ? kssdch()+0d30 bl kssdel() FFFFFFFFFFFF5D0 ? FFFFFFFFFFFF5E0 ? ksudlp()+0180 bl kssdch() 000000000 ? 000000000 ? kssxdl()+076c bl _ptrgl() kssdel()+0048 bl kssxdl() 7000001CF9B5E48 ? 300000000 ? ksuxdl()+029c bl kssdel() 033339530 ? 093339530 ? ksuxda()+02ac bl ksuxdl() 7000001CF9B5E48 ? 0FFFFC1D0 ? ksucln()+0938 bl ksuxda() ksbrdp()+075c bl _ptrgl() opirip()+0444 bl 105e65300 opidrv()+0418 bl opirip() 732F6373706F7261 ? 410323C28 ? FFFFFFFFFFFF420 ? sou2o()+0090 bl opidrv() 3204C0756C ? 4A0071E60 ? FFFFFFFFFFFF420 ? opimai_real()+0148 bl 101f9be30 main()+0090 bl opimai_real() 000000000 ? 000000000 ? __start()+0070 bl main() 000000000 ? 000000000 ? |
从上述信息来看,出现了DX锁,对于DX锁,这是一种分布式事务锁。通过分析call stack,我们可以定位到
该问题其实是一个Oracle bug:11868640。
通过对比Bug 11868640 – ORA-600 [kjucvl:!busy] possible in PMON in RAC (文档 ID 11868640.8) ,
我们可以发现pmon trace中的call stack与上述bug描述完全一致。实际上该问题,可以通过安装patch或psu来解决。
很简单的一个问题,供大家参考!
Leave a Reply
You must be logged in to post a comment.