The cause of system hung is cursor: pin X ?

本站文章除注明转载外，均为本站原创： 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

本文链接地址: The cause of system hung is cursor: pin X ?

这是一个朋友单位的库，据说该库hung了一段时间，从alert log来看，中间有2小时是没有任何信息，难道hung了2个小时吗？ alert log 如下：

Tue Oct 28 15:23:21 2014
kkjcre1p: unable to spawn jobq slave process 
Tue Oct 28 15:23:21 2014
Errors in file /u01/app/oracle/admin/xxxx/bdump/xxxx_cjq0_9568296.trc:
Tue Oct 28 17:22:04 2014
System State dumped to trace file 
Tue Oct 28 17:25:21 2014
LGWR: terminating instance due to error 472
Instance terminated by LGWR, pid = 10748094
Tue Oct 28 17:26:41 2014
Starting ORACLE instance (normal)
sskgpgetexecname failed to get name
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.

Tue Oct 28 15:23:21 2014

kkjcre1p: unable to spawn jobq slave process

Tue Oct 28 15:23:21 2014

Errors in file /u01/app/oracle/admin/xxxx/bdump/xxxx_cjq0_9568296.trc:

Tue Oct 28 17:22:04 2014

System State dumped to trace file

Tue Oct 28 17:25:21 2014

LGWR: terminating instance due to error 472

Instance terminated by LGWR, pid = 10748094

Tue Oct 28 17:26:41 2014

Starting ORACLE instance (normal)

sskgpgetexecname failed to get name

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Picked latch-free SCN scheme 3

Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST

Autotune of undo retention is turned on.

上述的信息是很简单的，提到了system state dump，下面我们来看下朋友提供的dump. 首先利用脚本先格式化一下：

Starting Systemstate 1
......................................................
Ass.Awk Version 1.0.38
~~~~~~~~~~~~~~~~~~~~~~
Source file : xxxx_ora_33030248.trc
System State 1  (2014-10-28 17:22:04.803)
~~~~~~~~~~~~~~   ~~~~~~~~~~~~~~~~~~~~~~~
1:                                      [DEAD] 
2:  waiting for 'cursor: pin X'        [Mutex ad39e34] seq=37912 
3:  waiting for 'rdbms ipc message'     seq=14319 
4:  waiting for 'rdbms ipc message'     seq=1742 
5:  waiting for 'rdbms ipc message'     seq=36300 
6:  waiting for 'rdbms ipc message'     seq=11154 
7:  waiting for 'rdbms ipc message'     seq=40518 
8:  waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=26959 
9:  waiting for 'rdbms ipc message'     seq=1897 
10: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=46586 
11: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=42779 
12: waiting for 'rdbms ipc message'     seq=1704 
13:                                     
14:                                     
15: waiting for 'Streams AQ: qmn slave idle wait' seq=60 
16: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=23668 
17: waiting for 'SQL*Net message from client' seq=33 
18: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=25574 
Cmd: Insert
19: last wait for 'latch: library cache' [DEAD] 
20: waiting for 'Streams AQ: qmn coordinator idle wait' seq=7 
........
24: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=20357 
25: waiting for 'Streams AQ: waiting for time management or cleanup tasks' seq=2929 
26: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=23830 
Cmd: Select
27: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=18517 
.......
31: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=16015 
32: last wait for 'latch: library cache'[Latch 70000006b9d8008] 
........
54: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=408 
Blockers
~~~~~~~~
Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of '???' implies that the holder was not found in the
systemstate. (The holder may have released the resource before we
dumped the state object tree of the blocking process).
o Lines with 'Enqueue conversion' below can be ignored *unless* 
other sessions are waiting on that resource too. For more, see 
http://dlsunuk11.uk.oracle.com/Public/TOOLS/Ass.html#enqcnv)
Resource Holder State
Mutex ad39e34    19: last wait for 'latch: library cache'
Latch 70000006b9d8008    19: last wait for 'latch: library cache'
Latch 70000006b9d8008    32: 32: is waiting for 19: 32:
Some of the above latches may be child latches. Please check the section
named 'Child Latch Report' below for further notes.
Blockers According to Tracefile Wait Info:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. This may not work for 64bit platforms. See bug 2902997 for details.
2. If the blocking process is shown as 0 then that session may no longer be
present.
3. If resources are held across code layers then sometimes the tracefile wait
info will not recognise the problem.
No blockers seen.
Object Names
~~~~~~~~~~~~
Mutex ad39e34                                                 
Latch 70000006b9d8008           Child library cache           
Child Latch Report
~~~~~~~~~~~~~~~~~~
Some processes are being blocked waiting for child latches.
At the moment this script does not detect the blocker because the
child latch address differs to the parent latch address. To manually
detect the blocker please take the following steps :
1. Determine the TYPE of latch (Eg library cache) that is involved.
2. Search the source trace file for a target of :
holding.*Parent.*library cache
(Assuming we have a child library cache and have vi-like regular expressions)
If this shows nothing then the blocker may have released the resource
before we got to dump the state object tree of the blocked process.
A list of processes that hold parent latches is given below :
No processes found.
Latch Wait List Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Pid 8 is blocked waiting for latch 'Child library cache' with 38 waiters.
Possible holder is Pid 10027060.
.......
Pid 54 is blocked waiting for latch Child library cache with 38 waiters.
Possible holder is Pid 10027060.
39 entries seen
Summary of Wait Events Seen (count&gt;10)
~~~~~~~~~~~~~~~~~~~~~~~~~~~
37 : 'latch: library cache'
------------------ooOoo------------------
For the LATEST version of this utility see
http://dlsunuk11.uk.oracle.com/Public/Utils.html#ass
For additional documentation see
http://dlsunuk11.uk.oracle.com/Public/TOOLS/Ass.html
Suggested improvements, bugs etc. should be sent to kevin.p.quinn@oracle.com
End of report. 18605 Lines Processed.

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

Starting Systemstate 1

......................................................

Ass.Awk Version 1.0.38

~~~~~~~~~~~~~~~~~~~~~~

Source file : xxxx_ora_33030248.trc

System State 1 (2014-10-28 17:22:04.803)

~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~

1: [DEAD]

2: waiting for 'cursor: pin X' [Mutex ad39e34] seq=37912

3: waiting for 'rdbms ipc message' seq=14319

4: waiting for 'rdbms ipc message' seq=1742

5: waiting for 'rdbms ipc message' seq=36300

6: waiting for 'rdbms ipc message' seq=11154

7: waiting for 'rdbms ipc message' seq=40518

8: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=26959

9: waiting for 'rdbms ipc message' seq=1897

10: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=46586

11: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=42779

12: waiting for 'rdbms ipc message' seq=1704

13:

14:

15: waiting for 'Streams AQ: qmn slave idle wait' seq=60

16: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=23668

17: waiting for 'SQL*Net message from client' seq=33

18: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=25574

Cmd: Insert

19: last wait for 'latch: library cache' [DEAD]

20: waiting for 'Streams AQ: qmn coordinator idle wait' seq=7

........

24: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=20357

25: waiting for 'Streams AQ: waiting for time management or cleanup tasks' seq=2929

26: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=23830

Cmd: Select

27: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=18517

.......

31: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=16015

32: last wait for 'latch: library cache'[Latch 70000006b9d8008]

........

54: waiting for 'latch: library cache' [Latch 70000006b9d8008] seq=408

Blockers

~~~~~~~~

Above is a list of all the processes. If they are waiting for a resource

then it will be given in square brackets. Below is a summary of the

waited upon resources, together with the holder of that resource.

Notes:

~~~~~

o A process id of '???' implies that the holder was not found in the

systemstate. (The holder may have released the resource before we

dumped the state object tree of the blocking process).

o Lines with 'Enqueue conversion' below can be ignored *unless*

other sessions are waiting on that resource too. For more, see

http://dlsunuk11.uk.oracle.com/Public/TOOLS/Ass.html#enqcnv)

Resource Holder State

Mutex ad39e34 19: last wait for 'latch: library cache'

Latch 70000006b9d8008 19: last wait for 'latch: library cache'

Latch 70000006b9d8008 32: 32: is waiting for 19: 32:

Some of the above latches may be child latches. Please check the section

named 'Child Latch Report' below for further notes.

Blockers According to Tracefile Wait Info:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. This may not work for 64bit platforms. See bug 2902997 for details.

2. If the blocking process is shown as 0 then that session may no longer be

present.

3. If resources are held across code layers then sometimes the tracefile wait

info will not recognise the problem.

No blockers seen.

Object Names

~~~~~~~~~~~~

Mutex ad39e34

Latch 70000006b9d8008 Child library cache

Child Latch Report

~~~~~~~~~~~~~~~~~~

Some processes are being blocked waiting for child latches.

At the moment this script does not detect the blocker because the

child latch address differs to the parent latch address. To manually

detect the blocker please take the following steps :

1. Determine the TYPE of latch (Eg library cache) that is involved.

2. Search the source trace file for a target of :

holding.*Parent.*library cache

(Assuming we have a child library cache and have vi-like regular expressions)

If this shows nothing then the blocker may have released the resource

before we got to dump the state object tree of the blocked process.

A list of processes that hold parent latches is given below :

No processes found.

Latch Wait List Information

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pid 8 is blocked waiting for latch 'Child library cache' with 38 waiters.

Possible holder is Pid 10027060.

.......

Pid 54 is blocked waiting for latch Child library cache with 38 waiters.

Possible holder is Pid 10027060.

39 entries seen

Summary of Wait Events Seen (count>10)

~~~~~~~~~~~~~~~~~~~~~~~~~~~

37 : 'latch: library cache'

------------------ooOoo------------------

For the LATEST version of this utility see

http://dlsunuk11.uk.oracle.com/Public/Utils.html#ass

For additional documentation see

http://dlsunuk11.uk.oracle.com/Public/TOOLS/Ass.html

Suggested improvements, bugs etc. should be sent to kevin.p.quinn@oracle.com

End of report. 18605 Lines Processed.

很显然，从脚本格式化之后的信息来看，这部分信息是我们要分析的关键：

Resource Holder State
Mutex ad39e34    19: last wait for 'latch: library cache'
Latch 70000006b9d8008    19: last wait for 'latch: library cache'
Latch 70000006b9d8008    32: 32: is waiting for 19: 32:

Resource Holder State

Mutex ad39e34 19: last wait for 'latch: library cache'

Latch 70000006b9d8008 19: last wait for 'latch: library cache'

Latch 70000006b9d8008 32: 32: is waiting for 19: 32:

19和32都是holder，而32 正在等待19. 注意，这里的19,32其实都是值的orapid。
我们可以看到，orapid 32正在正待Latch 70000006b9d8008，而orapid 32在等待19，那么orapid 19可能就是我们需要分析的关键了。从这里来看，orapid 19似乎同时持有了一个mutex和latch，下面我们来分别看下orapid 19和32 的详细信息：

PROCESS 32:
----------------------------------------
SO: 70000006f69ac40, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=32, calls cur/top: 0/7000000697bf6d0, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kghfrunp: clatch: wait: 
waiting for 70000006b9d8008 Child library cache level=5 child#=4 
Location from where latch is held: kghfrunp: clatch: nowait: 
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
18 (7124, 1414488125, 2)
16 (7121, 1414488125, 2)
26 (7085, 1414488125, 2)
10 (7040, 1414488125, 2)
21 (6878, 1414488125, 2)
22 (6764, 1414488125, 2)
11 (6749, 1414488125, 2)
23 (6725, 1414488125, 2)
8 (6497, 1414488125, 2)
24 (6125, 1414488125, 2)
27 (5570, 1414488125, 2)
28 (5525, 1414488125, 2)
29 (5414, 1414488125, 2)
30 (4925, 1414488125, 2)
31 (4817, 1414488125, 2)
32 (4571, 1414488125, 2)
.......
53 (518, 1414488125, 2)
54 (125, 1414488125, 2)
waiter count=38
gotten 8804924 times wait, failed first 526 sleeps 874
gotten 81617 times nowait, failed: 3348
possible holder pid = 19 ospid=10027060
on wait list for 70000006b9d8008
acquiring 70000006b9d8008
Process Group: DEFAULT, pseudo proc: 70000006f783298
O/S info: user: oracle, term: UNKNOWN, ospid: 26542208 
OSD pid info: Unix process pid: 26542208, image: oracle@fbtest
.........       
----------------------------------------
SO: 70000006fa49ad8, type: 4, owner: 70000006f69ac40, flag: INIT/-/-/0x00
(session) sid: 525 trans: 0, creator: 70000006f69ac40, flag: (1) USR/- -/-/-/-/-/-
DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 70000006341a4e8, user: 0/SYS
O/S info: user: , term: , ospid: , machine: 
program: 
last wait for 'latch: library cache' blocking sess=0x0 seq=15189 wait_time=292981 seconds since wait started=4569
address=70000006b9d8008, number=d7, tries=3b54
Dumping Session Wait History
for 'latch: library cache' count=1 wait_time=292981
address=70000006b9d8008, number=d7, tries=3b54
。。。。。。
for 'latch: library cache' count=1 wait_time=292990
address=70000006b9d8008, number=d7, tries=3b4b

PROCESS 32:

----------------------------------------

SO: 70000006f69ac40, type: 2, owner: 0, flag: INIT/-/-/0x00

(process) Oracle pid=32, calls cur/top: 0/7000000697bf6d0, flag: (0) -

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 0 0 0

last post received-location: No post

last process to post me: none

last post sent: 0 0 0

last post sent-location: No post

last process posted by me: none

(latch info) wait_event=0 bits=0

Location from where call was made: kghfrunp: clatch: wait:

waiting for 70000006b9d8008 Child library cache level=5 child#=4

Location from where latch is held: kghfrunp: clatch: nowait:

Context saved from call: 0

state=busy, wlstate=free

waiters [orapid (seconds since: put on list, posted, alive check)]:

18 (7124, 1414488125, 2)

16 (7121, 1414488125, 2)

26 (7085, 1414488125, 2)

10 (7040, 1414488125, 2)

21 (6878, 1414488125, 2)

22 (6764, 1414488125, 2)

11 (6749, 1414488125, 2)

23 (6725, 1414488125, 2)

8 (6497, 1414488125, 2)

24 (6125, 1414488125, 2)

27 (5570, 1414488125, 2)

28 (5525, 1414488125, 2)

29 (5414, 1414488125, 2)

30 (4925, 1414488125, 2)

31 (4817, 1414488125, 2)

32 (4571, 1414488125, 2)

.......

53 (518, 1414488125, 2)

54 (125, 1414488125, 2)

waiter count=38

gotten 8804924 times wait, failed first 526 sleeps 874

gotten 81617 times nowait, failed: 3348

possible holder pid = 19 ospid=10027060

on wait list for 70000006b9d8008

acquiring 70000006b9d8008

Process Group: DEFAULT, pseudo proc: 70000006f783298

O/S info: user: oracle, term: UNKNOWN, ospid: 26542208

OSD pid info: Unix process pid: 26542208, image: oracle@fbtest

.........

----------------------------------------

SO: 70000006fa49ad8, type: 4, owner: 70000006f69ac40, flag: INIT/-/-/0x00

(session) sid: 525 trans: 0, creator: 70000006f69ac40, flag: (1) USR/- -/-/-/-/-/-

DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000

txn branch: 0

oct: 0, prv: 0, sql: 0, psql: 70000006341a4e8, user: 0/SYS

O/S info: user: , term: , ospid: , machine:

program:

last wait for 'latch: library cache' blocking sess=0x0 seq=15189 wait_time=292981 seconds since wait started=4569

address=70000006b9d8008, number=d7, tries=3b54

Dumping Session Wait History

for 'latch: library cache' count=1 wait_time=292981

address=70000006b9d8008, number=d7, tries=3b54

。。。。。。

for 'latch: library cache' count=1 wait_time=292990

address=70000006b9d8008, number=d7, tries=3b4b

我们可以看到orapid 32是一个用户进程，正在等待library cache latch，latch地址是：70000006b9d8008
从possible holder pid = 19 ospid=10027060 可以看出，这个latch正在被orapid 19所持有，下面我们搜索关键字:process 19

PROCESS 19:
----------------------------------------
SO: 70000006f6945e0, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=19, calls cur/top: 0/7000000692e8ad0, flag: (1) DEAD
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=200
holding    (efd=5) 70000006b9d8008 Child library cache level=5 child#=4 
Location from where latch is held: kghfrunp: clatch: nowait: 
Context saved from call: 0
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
18 (7124, 1414488125, 2)
16 (7121, 1414488125, 2)
26 (7085, 1414488125, 2)
10 (7040, 1414488125, 2)
21 (6878, 1414488125, 2)
22 (6764, 1414488125, 2)
11 (6749, 1414488125, 2)
23 (6725, 1414488125, 2)
8 (6497, 1414488125, 2)
24 (6125, 1414488125, 2)
27 (5570, 1414488125, 2)
28 (5525, 1414488125, 2)
29 (5414, 1414488125, 2)
30 (4925, 1414488125, 2)
31 (4817, 1414488125, 2)
32 (4571, 1414488125, 2)
.......
54 (125, 1414488125, 2)
waiter count=38
Process Group: DEFAULT, pseudo proc: 70000006f783298
O/S info: user: oracle, term: UNKNOWN, ospid: 10027060 (DEAD)
OSD pid info: Unix process pid: 10027060, image: oracle@fbtest (J001)
..........
Repeat 27 times
----------------------------------------
SO: 70000006fa55b80, type: 4, owner: 70000006f6945e0, flag: INIT/-/-/0x00
(session) sid: 534 trans: 0, creator: 70000006f6945e0, flag: (1) USR/- -/-/-/-/-/-
DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
O/S info: user: , term: , ospid: , machine: 
program: 
last wait for 'latch: library cache' blocking sess=0x0 seq=14 wait_time=507 seconds since wait started=7167
address=70000006b9d7f68, number=d7, tries=0
Dumping Session Wait History
for 'latch: library cache' count=1 wait_time=507
address=70000006b9d7f68, number=d7, tries=0
for 'latch: library cache' count=1 wait_time=328
address=70000006b9d7f68, number=d7, tries=0
for 'latch: library cache' count=1 wait_time=340
address=70000006b9d7f68, number=d7, tries=0
for 'latch: library cache' count=1 wait_time=1644
address=70000006b9d8008, number=d7, tries=0
for 'latch: library cache' count=1 wait_time=1245
address=70000006b9d8008, number=d7, tries=0
for 'latch: library cache' count=1 wait_time=1633
address=70000006b9d8008, number=d7, tries=0
for 'latch: library cache' count=1 wait_time=315
address=70000006b9d80a8, number=d7, tries=0
for 'latch: library cache' count=1 wait_time=655
address=70000006b9d80a8, number=d7, tries=0
for 'latch: library cache' count=1 wait_time=4888
address=70000006b9d80a8, number=d7, tries=0
for 'latch: library cache' count=1 wait_time=176
address=70000006b9d80a8, number=d7, tries=0
........
temporary object counter: 0
----------------------------------------
UOL used : 1 locks(used=0, free=0)
KGX Atomic Operation Log 70000006e404f90
Mutex 70000003eec4be0(534, 0) idn ad39e34 oper EXCL
Cursor Pin uid 534 efd 5 whr 11 slp 0
KGX Atomic Operation Log 70000006e404fd8
Mutex 0(0, 0) idn 0 oper NONE
Library Cache uid 534 efd 0 whr 0 slp 0
KGX Atomic Operation Log 70000006e405020
Mutex 0(0, 0) idn 0 oper NONE
Library Cache uid 534 efd 0 whr 0 slp 0

PROCESS 19:

----------------------------------------

SO: 70000006f6945e0, type: 2, owner: 0, flag: INIT/-/-/0x00

(process) Oracle pid=19, calls cur/top: 0/7000000692e8ad0, flag: (1) DEAD

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 0 0 0

last post received-location: No post

last process to post me: none

last post sent: 0 0 0

last post sent-location: No post

last process posted by me: none

(latch info) wait_event=0 bits=200

holding (efd=5) 70000006b9d8008 Child library cache level=5 child#=4

Location from where latch is held: kghfrunp: clatch: nowait:

Context saved from call: 0

state=busy, wlstate=free

waiters [orapid (seconds since: put on list, posted, alive check)]:

18 (7124, 1414488125, 2)

16 (7121, 1414488125, 2)

26 (7085, 1414488125, 2)

10 (7040, 1414488125, 2)

21 (6878, 1414488125, 2)

22 (6764, 1414488125, 2)

11 (6749, 1414488125, 2)

23 (6725, 1414488125, 2)

8 (6497, 1414488125, 2)

24 (6125, 1414488125, 2)

27 (5570, 1414488125, 2)

28 (5525, 1414488125, 2)

29 (5414, 1414488125, 2)

30 (4925, 1414488125, 2)

31 (4817, 1414488125, 2)

32 (4571, 1414488125, 2)

.......

54 (125, 1414488125, 2)

waiter count=38

Process Group: DEFAULT, pseudo proc: 70000006f783298

O/S info: user: oracle, term: UNKNOWN, ospid: 10027060 (DEAD)

OSD pid info: Unix process pid: 10027060, image: oracle@fbtest (J001)

..........

Repeat 27 times

----------------------------------------

SO: 70000006fa55b80, type: 4, owner: 70000006f6945e0, flag: INIT/-/-/0x00

(session) sid: 534 trans: 0, creator: 70000006f6945e0, flag: (1) USR/- -/-/-/-/-/-

DID: 0000-0000-00000000, short-term DID: 0000-0000-00000000

txn branch: 0

oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

O/S info: user: , term: , ospid: , machine:

program:

last wait for 'latch: library cache' blocking sess=0x0 seq=14 wait_time=507 seconds since wait started=7167

address=70000006b9d7f68, number=d7, tries=0

Dumping Session Wait History

for 'latch: library cache' count=1 wait_time=507

address=70000006b9d7f68, number=d7, tries=0

for 'latch: library cache' count=1 wait_time=328

address=70000006b9d7f68, number=d7, tries=0

for 'latch: library cache' count=1 wait_time=340

address=70000006b9d7f68, number=d7, tries=0

for 'latch: library cache' count=1 wait_time=1644

address=70000006b9d8008, number=d7, tries=0

for 'latch: library cache' count=1 wait_time=1245

address=70000006b9d8008, number=d7, tries=0

for 'latch: library cache' count=1 wait_time=1633

address=70000006b9d8008, number=d7, tries=0

for 'latch: library cache' count=1 wait_time=315

address=70000006b9d80a8, number=d7, tries=0

for 'latch: library cache' count=1 wait_time=655

address=70000006b9d80a8, number=d7, tries=0

for 'latch: library cache' count=1 wait_time=4888

address=70000006b9d80a8, number=d7, tries=0

for 'latch: library cache' count=1 wait_time=176

address=70000006b9d80a8, number=d7, tries=0

........

temporary object counter: 0

----------------------------------------

UOL used : 1 locks(used=0, free=0)

KGX Atomic Operation Log 70000006e404f90

Mutex 70000003eec4be0(534, 0) idn ad39e34 oper EXCL

Cursor Pin uid 534 efd 5 whr 11 slp 0

KGX Atomic Operation Log 70000006e404fd8

Mutex 0(0, 0) idn 0 oper NONE

Library Cache uid 534 efd 0 whr 0 slp 0

KGX Atomic Operation Log 70000006e405020

Mutex 0(0, 0) idn 0 oper NONE

Library Cache uid 534 efd 0 whr 0 slp 0

我们可以看到orapid 19这个进程正在持有llibrary cache latch 70000006b9d8008没有释放，到前面的orapid 32一直等待。
然而我们可以看到,orapid 19虽然持有了70000006b9d8008这个latch,然而同时它也在等待一个library cache latch 70000006b9d7f68
我们可以看到这个orapid 19的进程其实是J001，其状态已经为dead了。同时我们分析下面的mutex信息，发现该进程同时还持有了mutex： Mutex 70000003eec4be0(534, 0) idn ad39e34 oper EXCL
从这句话我们可以看出，orapid 19还持有了mutex ad39e34，而且mode为exel，即排他模式. 根据这个mutex 地址，我们进行搜索，发现 Oracle Pmon进程正在申请这个Mutex，如下：

PROCESS 2:
----------------------------------------
SO: 70000006f68c000, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=2, calls cur/top: 70000006fab6310/70000006fab6310, flag: (e) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 33
last post received-location: ksrpublish
last process to post me: 70000006f692660 3 2
last post sent: 0 0 48
last post sent-location: ksoreq_reply
last process posted by me: 70000006f7809e0 1 0
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 70000006f783298
O/S info: user: oracle, term: UNKNOWN, ospid: 9895986 
OSD pid info: Unix process pid: 9895986, image: oracle@fbtest (PMON)
........     
----------------------------------------
SO: 70000006fd51908, type: 11, owner: 70000006f68c000, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 70000006f68c000,
event: 1, last message event: 13,
last message waited event: 13, messages read: 1
channel: (70000006fd77710) scumnt mount lock
scope: 1, event: 13, last mesage event: 13,
publishers/subscribers: 0/11,
messages published: 1
----------------------------------------
SO: 70000006fa71d08, type: 4, owner: 70000006f68c000, flag: INIT/-/-/0x00
(session) sid: 555 trans: 0, creator: 70000006f68c000, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0000-0002-00000002, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS
service name: SYS$BACKGROUND
waiting for 'cursor: pin X' blocking sess=0x0 seq=37912 wait_time=0 seconds since wait started=0
idn=ad39e34, value=21600000000, where|sleeps=b000a93dd
Dumping Session Wait History
for 'cursor: pin X' count=1 wait_time=9782
idn=ad39e34, value=21600000000, where|sleeps=b000a93dc
..........
idn=ad39e34, value=21600000000, where|sleeps=b000a93d4
for 'cursor: pin X' count=1 wait_time=9782
idn=ad39e34, value=21600000000, where|sleeps=b000a93d3
temporary object counter: 0
----------------------------------------
UOL used : 1 locks(used=0, free=0)
KGX Atomic Operation Log 70000006eb507f8
Mutex 70000003eec4be0(534, 0) idn ad39e34 oper GET_EXCL
Cursor Pin uid 555 efd 5 whr 11 slp 5127
KGX Atomic Operation Log 70000006eb50840
Mutex 0(0, 0) idn 0 oper NONE
Library Cache uid 555 efd 0 whr 0 slp 0
KGX Atomic Operation Log 70000006eb50888
Mutex 0(0, 0) idn 0 oper NONE
Library Cache uid 555 efd 0 whr 0 slp 0
----------------------------------------

PROCESS 2:

----------------------------------------

SO: 70000006f68c000, type: 2, owner: 0, flag: INIT/-/-/0x00

(process) Oracle pid=2, calls cur/top: 70000006fab6310/70000006fab6310, flag: (e) SYSTEM

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 0 0 33

last post received-location: ksrpublish

last process to post me: 70000006f692660 3 2

last post sent: 0 0 48

last post sent-location: ksoreq_reply

last process posted by me: 70000006f7809e0 1 0

(latch info) wait_event=0 bits=0

Process Group: DEFAULT, pseudo proc: 70000006f783298

O/S info: user: oracle, term: UNKNOWN, ospid: 9895986

OSD pid info: Unix process pid: 9895986, image: oracle@fbtest (PMON)

........

----------------------------------------

SO: 70000006fd51908, type: 11, owner: 70000006f68c000, flag: INIT/-/-/0x00

(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: 70000006f68c000,

event: 1, last message event: 13,

last message waited event: 13, messages read: 1

channel: (70000006fd77710) scumnt mount lock

scope: 1, event: 13, last mesage event: 13,

publishers/subscribers: 0/11,

messages published: 1

----------------------------------------

SO: 70000006fa71d08, type: 4, owner: 70000006f68c000, flag: INIT/-/-/0x00

(session) sid: 555 trans: 0, creator: 70000006f68c000, flag: (51) USR/- BSY/-/-/-/-/-

DID: 0000-0002-00000002, short-term DID: 0000-0000-00000000

txn branch: 0

oct: 0, prv: 0, sql: 0, psql: 0, user: 0/SYS

service name: SYS$BACKGROUND

waiting for 'cursor: pin X' blocking sess=0x0 seq=37912 wait_time=0 seconds since wait started=0

idn=ad39e34, value=21600000000, where|sleeps=b000a93dd

Dumping Session Wait History

for 'cursor: pin X' count=1 wait_time=9782

idn=ad39e34, value=21600000000, where|sleeps=b000a93dc

..........

idn=ad39e34, value=21600000000, where|sleeps=b000a93d4

for 'cursor: pin X' count=1 wait_time=9782

idn=ad39e34, value=21600000000, where|sleeps=b000a93d3

temporary object counter: 0

----------------------------------------

UOL used : 1 locks(used=0, free=0)

KGX Atomic Operation Log 70000006eb507f8

Mutex 70000003eec4be0(534, 0) idn ad39e34 oper GET_EXCL

Cursor Pin uid 555 efd 5 whr 11 slp 5127

KGX Atomic Operation Log 70000006eb50840

Mutex 0(0, 0) idn 0 oper NONE

Library Cache uid 555 efd 0 whr 0 slp 0

KGX Atomic Operation Log 70000006eb50888

Mutex 0(0, 0) idn 0 oper NONE

Library Cache uid 555 efd 0 whr 0 slp 0

----------------------------------------

可以看到pmon进程一直在等待这个mutex，也需要申请获得排他模式的mutex（类似传统的排他模式的library cache pin).
而由于这个mutex被前面的orapid 19（j001）锁持有，没有释放，导致pmon进程挂起。
最后我们简单的总结下：
1、大量的进程在等待library cache latch，而该latch被orapid 19（J001）进程所持有.

2、J001进程持有library cache latch不释放，同时还申请持有mutex，该进程状态是dead

3、Pmon进程申请排它模式的Mutex,而该mutex被J001进程持有，且状态也是排它模式.

4、按理说J001进程死掉之后，其持有的latch信息应该都会被pmon进程所清理释放掉，因为这本身就是pmon进程的分内之事，然而由于mutex的原因，导致pmon进程挂起。

5、最后由于pmon进程也无法获得mutex，导致无法清理j0001进程，最后导致j0001阻塞了大量会话，出现大面积library cache latch等待，最后整个系统hung.
6、个人认为问题的根据还是mutex的问题，library cache latch只是表象.

Tags: 10.2.0.4, cursor: pin X, mutex, oracle hung
Posted in troubleshooting on 10月 31, 2014

You must be logged in to post a comment.

love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

Categories

Archives

最新评论

国内圈子

oracle security

The cause of system hung is cursor: pin X ?

Leave a Reply