gc freelist问题再现

本站文章除注明转载外，均为本站原创： 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

今天一大早某客户在vx群反馈其中一套系统早上上班时间出现了一段时间的hung，后面自动恢复了。从客户提供的ASH数据来看，问题相对清晰：

Slot Time (Duration)	Slot Count	Event	Event Count	% Event
09:26:01 (2.0 min)	56,948	gc freelist	42,070	9.28
09:26:01 (2.0 min)	56948	read by other session	9,985	2.20
09:26:01 (2.0 min)	56948	enq: HW – contention	3,332	0.73
09:28:00 (2.0 min)	59,045	gc freelist	43,374	9.56
09:28:00 (2.0 min)	59045	read by other session	10,188	2.25
09:28:00 (2.0 min)	59045	enq: HW – contention	3,773	0.83
09:30:00 (2.0 min)	59,432	gc freelist	43,156	9.52
09:30:00 (2.0 min)	59432	read by other session	9,253	2.04
09:30:00 (2.0 min)	59432	enq: HW – contention	5,238	1.16
09:32:00 (2.0 min)	59,812	gc freelist	43,037	9.49
09:32:00 (2.0 min)	59812	read by other session	9,296	2.05
09:32:00 (2.0 min)	59812	enq: HW – contention	5,449	1.20
09:34:00 (2.0 min)	60,775	gc freelist	43,675	9.63
09:34:00 (2.0 min)	60775	read by other session	9,619	2.12
09:34:00 (2.0 min)	60775	enq: HW – contention	5,428	1.20
09:36:00 (2.0 min)	63,090	gc freelist	44,403	9.79
09:36:00 (2.0 min)	63090	read by other session	11,003	2.43
09:36:00 (2.0 min)	63090	enq: HW – contention	5,520	1.22
09:38:00 (2.0 min)	63,576	gc freelist	44,636	9.84
09:38:00 (2.0 min)	63576	read by other session	11,286	2.49
09:38:00 (2.0 min)	63576	enq: HW – contention	5,474	1.21
09:40:00 (1.0 min)	30,798	gc freelist	20,947	4.62
09:40:00 (1.0 min)	30798	read by other session	5,530	1.22
09:40:00 (1.0 min)	30798	enq: HW – contention	2,688	0.59

之前写过一篇关于gc freelist等待的文章。实际上该等待事件并不多见，如果当你发现top event中出现该等待，那么多半是遇到了Bug。

我们回到这个案例本身，实际上从ASH数据来看，无非就是gc freelist,enq:HW – contention,read by other session. 从直观感觉我认为可能大概率出问题在gc freelist，从而导致整个集群都非常缓慢，近似于hung的状态。当然，当其中一个节点都出现了大量gc freelist，导致整个系统很慢的情况下，那么另外一个节点ash 势必也是大量的锁等待。

Slot Time (Duration)	Slot Count	Event	Event Count	% Event
09:22:36 (1.4 min)	6,506	enq: HW – contention	2,859	3.02
09:22:36 (1.4 min)	6506	enq: TX – row lock contention	1,380	1.46
09:22:36 (1.4 min)	6506	gc buffer busy acquire	1,128	1.19
09:24:00 (2.0 min)	10,361	enq: HW – contention	4,266	4.51
09:24:00 (2.0 min)	10361	enq: TX – index contention	2,679	2.83
09:24:00 (2.0 min)	10361	enq: TX – row lock contention	2,452	2.59
09:26:00 (2.0 min)	11,352	enq: HW – contention	4,550	4.81
09:26:00 (2.0 min)	11352	enq: TX – index contention	3,030	3.20
09:26:00 (2.0 min)	11352	enq: TX – row lock contention	2,802	2.96
09:28:00 (2.0 min)	12,415	enq: HW – contention	5,000	5.29
09:28:00 (2.0 min)	12415	enq: TX – index contention	3,411	3.61
09:28:00 (2.0 min)	12415	enq: TX – row lock contention	3,078	3.25
09:30:00 (2.0 min)	13,300	enq: HW – contention	5,271	5.57
09:30:00 (2.0 min)	13300	enq: TX – index contention	3,184	3.37
09:30:00 (2.0 min)	13300	enq: TX – row lock contention	3,079	3.25
09:32:00 (2.0 min)	13,638	enq: HW – contention	5,423	5.73
09:32:00 (2.0 min)	13638	enq: TX – row lock contention	3,296	3.48
09:32:00 (2.0 min)	13638	gc buffer busy acquire	3,216	3.40
09:34:00 (2.0 min)	14,283	enq: HW – contention	5,647	5.97
09:34:00 (2.0 min)	14283	enq: TX – row lock contention	3,678	3.89
09:34:00 (2.0 min)	14283	gc buffer busy acquire	3,307	3.50
09:36:00 (1.7 min)	12,751	enq: HW – contention	5,083	5.37
09:36:00 (1.7 min)	12751	enq: TX – row lock contention	3,327	3.52
09:36:00 (1.7 min)	12751	gc buffer busy acquire	3,108	3.29

因此这个时候看ash已经没有什么用了。从目前判断问题主要在节点2上，因此我来看下客户提供的节点2故障时间段内的一个diag trace：

SO: 0x3b02491fd8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3b02491fd8, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:903, ser:142, calls cur/top: 0x27a2ded4f8/0x27a874fa70
flags : (0x0) -
flags2: (0x0),  flags3: (0x10) 
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 184 0 2
last post received-location: ksl2.h LINE:2374 ID:kslpsr
last process to post me: 0x39c2424aa0 1 6
last post sent: 0 0 183
last post sent-location: kcl2.h LINE:4436 ID:kclnfndnewm
last process posted by me: 0x3b023aa048 1 6
(latch info) wait_event=184 bits=0x0
Process Group: DEFAULT, pseudo proc: 0x39c262fe68
O/S info: user: grid, term: UNKNOWN, ospid: 52182 
OSD pid info: Unix process pid: 52182, image: oracle@xxxxxdb2
----------------------------------------
SO: 0x3a0ca21218, type: 10, owner: 0x3b02491fd8, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
proc=0x3b02491fd8, name=FileOpenBlock, file=ksfd.h LINE:6688, pg=0
(FOB) 0x3a0ca21218 flags=2560 fib=0x39cc6c67a8 incno=8 pending i/o cnt=0
fname=+DATA/ythdb/datafile/system.256.1048801851
fno=1 lblksz=8192 fsiz=651520
----------------------------------------
SO: 0x39e29be138, type: 4, owner: 0x3b02491fd8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x3b02491fd8, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 1254 ser: 26125 trans: 0x387e16b0b0, creator: 0x3b02491fd8
flags: (0x100041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
DID: , short-term DID: 
txn branch: (nil)
edition#: 0              oct: 2, prv: 0, sql: 0x3a9dc80b10, psql: 0x3a9dc80b10, user: 0/SYS
ksuxds FALSE at location: 0
service name: ythdb
client details:
O/S info: user: root, term: unknown, ospid: 1234
machine: gfmis-pay-sendglsaccount-85b456d76c-zzctl program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
Current Wait Stack:
0: waiting for 'gc freelist'
=0x0, =0x0, =0x0
wait_id=4839 seq_num=4840 snap_id=1
wait times: snap=0.001568 sec, exc=0.001568 sec, total=0.001568 sec
wait times: max=0.200000 sec, heur=16 min 8 sec
wait counts: calls=1 os=1
in_wait=1 iflags=0x5a8
There are 64 sessions blocked by this session.
Dumping one waiter:
inst: 2, sid: 1289, ser: 25931
wait event: 'enq: HW - contention'
p1: 'name|mode'=0x48570006
p2: 'table space #'=0x0
p3: 'block'=0x400978
row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0
min_blocked_time: 943 secs, waiter_cache_ver: 38771
Wait State:
fixed_waits=0 flags=0x22 boundary=(nil)/-1
Session Wait History:
elapsed time of 0.000100 sec since current wait
0: waited for 'gc freelist'
=0x0, =0x0, =0x0
wait_id=4838 seq_num=4839 snap_id=1
wait times: snap=0.017854 sec, exc=0.017854 sec, total=0.017854 sec
wait times: max=0.200000 sec
wait counts: calls=1 os=1
occurred after 0.000090 sec of elapsed time
1: waited for 'gc freelist'
=0x0, =0x0, =0x0
wait_id=4837 seq_num=4838 snap_id=1
wait times: snap=0.200893 sec, exc=0.200893 sec, total=0.200893 sec
wait times: max=0.200000 sec
wait counts: calls=1 os=1
occurred after 0.000105 sec of elapsed time
2: waited for 'gc freelist'
=0x0, =0x0, =0x0
wait_id=4836 seq_num=4837 snap_id=1
wait times: snap=0.200923 sec, exc=0.200923 sec, total=0.200923 sec
wait times: max=0.200000 sec
wait counts: calls=1 os=1
occurred after 0.000078 sec of elapsed time
3: waited for 'gc freelist'
=0x0, =0x0, =0x0
wait_id=4835 seq_num=4836 snap_id=1
wait times: snap=0.200908 sec, exc=0.200908 sec, total=0.200908 sec
wait times: max=0.200000 sec
wait counts: calls=1 os=1
occurred after 0.000092 sec of elapsed time
4: waited for 'gc freelist'
=0x0, =0x0, =0x0
wait_id=4834 seq_num=4835 snap_id=1
wait times: snap=0.200906 sec, exc=0.200906 sec, total=0.200906 sec
wait times: max=0.200000 sec
wait counts: calls=1 os=1
occurred after 0.000094 sec of elapsed time
5: waited for 'gc freelist'
=0x0, =0x0, =0x0
wait_id=4833 seq_num=4834 snap_id=1
wait times: snap=0.200900 sec, exc=0.200900 sec, total=0.200900 sec
wait times: max=0.200000 sec
wait counts: calls=1 os=1
occurred after 0.000092 sec of elapsed time
6: waited for 'gc freelist'
=0x0, =0x0, =0x0
wait_id=4832 seq_num=4833 snap_id=1
wait times: snap=0.200918 sec, exc=0.200918 sec, total=0.200918 sec
wait times: max=0.200000 sec
wait counts: calls=1 os=1
occurred after 0.000088 sec of elapsed time
7: waited for 'gc freelist'
=0x0, =0x0, =0x0
wait_id=4831 seq_num=4832 snap_id=1
wait times: snap=0.200907 sec, exc=0.200907 sec, total=0.200907 sec
wait times: max=0.200000 sec
wait counts: calls=1 os=1
occurred after 0.000094 sec of elapsed time
8: waited for 'gc freelist'
=0x0, =0x0, =0x0
wait_id=4830 seq_num=4831 snap_id=1
wait times: snap=0.200883 sec, exc=0.200883 sec, total=0.200883 sec
wait times: max=0.200000 sec
wait counts: calls=1 os=1
occurred after 0.000113 sec of elapsed time
9: waited for 'gc freelist'
=0x0, =0x0, =0x0
wait_id=4829 seq_num=4830 snap_id=1
wait times: snap=0.200897 sec, exc=0.200897 sec, total=0.200897 sec
wait times: max=0.200000 sec
wait counts: calls=1 os=1
occurred after 0.000103 sec of elapsed time

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

SO: 0x3b02491fd8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3

proc=0x3b02491fd8, name=process, file=ksu.h LINE:12721, pg=0

(process) Oracle pid:903, ser:142, calls cur/top: 0x27a2ded4f8/0x27a874fa70

flags : (0x0) -

flags2: (0x0), flags3: (0x10)

intr error: 0, call error: 0, sess error: 0, txn error 0

intr queue: empty

ksudlp FALSE at location: 0

(post info) last post received: 184 0 2

last post received-location: ksl2.h LINE:2374 ID:kslpsr

last process to post me: 0x39c2424aa0 1 6

last post sent: 0 0 183

last post sent-location: kcl2.h LINE:4436 ID:kclnfndnewm

last process posted by me: 0x3b023aa048 1 6

(latch info) wait_event=184 bits=0x0

Process Group: DEFAULT, pseudo proc: 0x39c262fe68

O/S info: user: grid, term: UNKNOWN, ospid: 52182

OSD pid info: Unix process pid: 52182, image: oracle@xxxxxdb2

----------------------------------------

SO: 0x3a0ca21218, type: 10, owner: 0x3b02491fd8, flag: INIT/-/-/0x00 if: 0x1 c: 0x1

proc=0x3b02491fd8, name=FileOpenBlock, file=ksfd.h LINE:6688, pg=0

(FOB) 0x3a0ca21218 flags=2560 fib=0x39cc6c67a8 incno=8 pending i/o cnt=0

fname=+DATA/ythdb/datafile/system.256.1048801851

fno=1 lblksz=8192 fsiz=651520

----------------------------------------

SO: 0x39e29be138, type: 4, owner: 0x3b02491fd8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3

proc=0x3b02491fd8, name=session, file=ksu.h LINE:12729, pg=0

(session) sid: 1254 ser: 26125 trans: 0x387e16b0b0, creator: 0x3b02491fd8

flags: (0x100041) USR/- flags_idl: (0x1) BSY/-/-/-/-/-

flags2: (0x40009) -/-/INC

DID: , short-term DID:

txn branch: (nil)

edition#: 0 oct: 2, prv: 0, sql: 0x3a9dc80b10, psql: 0x3a9dc80b10, user: 0/SYS

ksuxds FALSE at location: 0

service name: ythdb

client details:

O/S info: user: root, term: unknown, ospid: 1234

machine: gfmis-pay-sendglsaccount-85b456d76c-zzctl program: JDBC Thin Client

application name: JDBC Thin Client, hash value=2546894660

Current Wait Stack:

0: waiting for 'gc freelist'

=0x0, =0x0, =0x0

wait_id=4839 seq_num=4840 snap_id=1

wait times: snap=0.001568 sec, exc=0.001568 sec, total=0.001568 sec

wait times: max=0.200000 sec, heur=16 min 8 sec

wait counts: calls=1 os=1

in_wait=1 iflags=0x5a8

There are 64 sessions blocked by this session.

Dumping one waiter:

inst: 2, sid: 1289, ser: 25931

wait event: 'enq: HW - contention'

p1: 'name|mode'=0x48570006

p2: 'table space #'=0x0

p3: 'block'=0x400978

row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0

min_blocked_time: 943 secs, waiter_cache_ver: 38771

Wait State:

fixed_waits=0 flags=0x22 boundary=(nil)/-1

Session Wait History:

elapsed time of 0.000100 sec since current wait

0: waited for 'gc freelist'

=0x0, =0x0, =0x0

wait_id=4838 seq_num=4839 snap_id=1

wait times: snap=0.017854 sec, exc=0.017854 sec, total=0.017854 sec

wait times: max=0.200000 sec

wait counts: calls=1 os=1

occurred after 0.000090 sec of elapsed time

1: waited for 'gc freelist'

=0x0, =0x0, =0x0

wait_id=4837 seq_num=4838 snap_id=1

wait times: snap=0.200893 sec, exc=0.200893 sec, total=0.200893 sec

wait times: max=0.200000 sec

wait counts: calls=1 os=1

occurred after 0.000105 sec of elapsed time

2: waited for 'gc freelist'

=0x0, =0x0, =0x0

wait_id=4836 seq_num=4837 snap_id=1

wait times: snap=0.200923 sec, exc=0.200923 sec, total=0.200923 sec

wait times: max=0.200000 sec

wait counts: calls=1 os=1

occurred after 0.000078 sec of elapsed time

3: waited for 'gc freelist'

=0x0, =0x0, =0x0

wait_id=4835 seq_num=4836 snap_id=1

wait times: snap=0.200908 sec, exc=0.200908 sec, total=0.200908 sec

wait times: max=0.200000 sec

wait counts: calls=1 os=1

occurred after 0.000092 sec of elapsed time

4: waited for 'gc freelist'

=0x0, =0x0, =0x0

wait_id=4834 seq_num=4835 snap_id=1

wait times: snap=0.200906 sec, exc=0.200906 sec, total=0.200906 sec

wait times: max=0.200000 sec

wait counts: calls=1 os=1

occurred after 0.000094 sec of elapsed time

5: waited for 'gc freelist'

=0x0, =0x0, =0x0

wait_id=4833 seq_num=4834 snap_id=1

wait times: snap=0.200900 sec, exc=0.200900 sec, total=0.200900 sec

wait times: max=0.200000 sec

wait counts: calls=1 os=1

occurred after 0.000092 sec of elapsed time

6: waited for 'gc freelist'

=0x0, =0x0, =0x0

wait_id=4832 seq_num=4833 snap_id=1

wait times: snap=0.200918 sec, exc=0.200918 sec, total=0.200918 sec

wait times: max=0.200000 sec

wait counts: calls=1 os=1

occurred after 0.000088 sec of elapsed time

7: waited for 'gc freelist'

=0x0, =0x0, =0x0

wait_id=4831 seq_num=4832 snap_id=1

wait times: snap=0.200907 sec, exc=0.200907 sec, total=0.200907 sec

wait times: max=0.200000 sec

wait counts: calls=1 os=1

occurred after 0.000094 sec of elapsed time

8: waited for 'gc freelist'

=0x0, =0x0, =0x0

wait_id=4830 seq_num=4831 snap_id=1

wait times: snap=0.200883 sec, exc=0.200883 sec, total=0.200883 sec

wait times: max=0.200000 sec

wait counts: calls=1 os=1

occurred after 0.000113 sec of elapsed time

9: waited for 'gc freelist'

=0x0, =0x0, =0x0

wait_id=4829 seq_num=4830 snap_id=1

wait times: snap=0.200897 sec, exc=0.200897 sec, total=0.200897 sec

wait times: max=0.200000 sec

wait counts: calls=1 os=1

occurred after 0.000103 sec of elapsed time

我们可以清楚的看到，实际上大量session被该会话所阻塞，而该session在等待gc freelist，而且已经等待了10多分钟。

进一步分析该session的trace，可以看到执行情况，发现跟审计有关：

User=0x39e29be138 Session=0x39e29be138 ReferenceCount=1         
Flags=CNB/[0001] SavepointNum=637acf85 
LibraryHandle:  Address=0x3a9dc80b10 Hash=367d06a6 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD 
ObjectName:  Name=insert into sys.aud$( sessionid,entryid,statement,ntimestamp#, userid,userhost,terminal,action#,returncode, obj$creator,obj$name,auth$privileges,auth$grantee, new$owner,new$name,ses$actions,ses$tid,logoff$pread, logoff$lwrite,logoff$dead,comment$text,spare1,spare2,  priv$used,clientid,sessioncpu,proxy$sid,user$guid, instance#,process#,xid,scn,auditid,  sqlbind,sqltext,obj$edition,dbid)  values(:1,:2,:3,SYS_EXTRACT_UTC(SYSTIMESTAMP),     :4,:5,:6,:7,:8,     :9,:10,:11,:12,     :13,:14,:15,:16,:17,     :18,:1 
FullHashValue=4ba8a6dafb69204e4de1216b367d06a6 Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=914163366 OwnerIdn=0 
Statistics:  InvalidationCount=0 ExecutionCount=299986 LoadCount=20 ActiveLocks=869 TotalLockCount=299437 TotalPinCount=2 
Counters:  BrokenCount=1 RevocablePointer=1 KeepDependency=2 Version=0 BucketInUse=37261 HandleInUse=37261 HandleReferenceCount=0

User=0x39e29be138 Session=0x39e29be138 ReferenceCount=1

Flags=CNB/[0001] SavepointNum=637acf85

LibraryHandle: Address=0x3a9dc80b10 Hash=367d06a6 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD

ObjectName: Name=insert into sys.aud$( sessionid,entryid,statement,ntimestamp#, userid,userhost,terminal,action#,returncode, obj$creator,obj$name,auth$privileges,auth$grantee, new$owner,new$name,ses$actions,ses$tid,logoff$pread, logoff$lwrite,logoff$dead,comment$text,spare1,spare2, priv$used,clientid,sessioncpu,proxy$sid,user$guid, instance#,process#,xid,scn,auditid, sqlbind,sqltext,obj$edition,dbid) values(:1,:2,:3,SYS_EXTRACT_UTC(SYSTIMESTAMP), :4,:5,:6,:7,:8, :9,:10,:11,:12, :13,:14,:15,:16,:17, :18,:1

FullHashValue=4ba8a6dafb69204e4de1216b367d06a6 Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=914163366 OwnerIdn=0

Statistics: InvalidationCount=0 ExecutionCount=299986 LoadCount=20 ActiveLocks=869 TotalLockCount=299437 TotalPinCount=2

Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=2 Version=0 BucketInUse=37261 HandleInUse=37261 HandleReferenceCount=0

从现象来看，跟Oracle mos文章 INSERT INTO SYS.AUD$ Is Blocking Due to Waiting on ‘gc freelist’ (Doc ID 2207798.1) 描述很像的。

同时也参考了几篇类似文章例如Unified And/Or Traditional Auditing Inserts Waiting On GC FREELIST (In ASSM Tablespace) (Doc ID 2596033.1)

总的来讲，建议是调整如下参数：_gc_element_percent 建议设置为140~200之间(默认值是110).

当然，根据本文这里的case，实际上也可以通过move aud$到其他表空间来解决或关闭审计来绕过这个问题。

Tags: ASSM tablespace, enq: TX - row lock contention, enq:HW - contention, gc freelist, _gc_element_percent
Posted in Oracle rdbms, troubleshooting on 11月 21, 2022

You must be logged in to post a comment.

love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

Categories

Archives

最新评论

国内圈子

oracle security

gc freelist问题再现

Leave a Reply