关于Disk file operation IO的一个案例分析
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: 关于Disk file operation IO的一个案例分析
最近某银行新核心上线,我司同事连续奋战了大半年,真不容易。在此为大家的辛苦付出点赞。上线后,用户的EM监控突然来了一个告警;
最开始以为是个简单的问题,因此就简单查询了一下ash数据,发现主要是一个SQL语句上:
| 
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20  | 
1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)         Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:39 2007               rman@mccsdb01 (TNS V1-V3)      Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 14:47 2853               rman@mccsdb01 (TNS V1-V3)      Disk file operations I/O   5 FileOperation  1 filetype  2020-06-15 14:50  103               rman@mccsdb01 (TNS V1-V3)      Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 15:00 1626               rman@mccsdb01 (TNS V1-V3)      Disk file operations I/O   8 FileOperation  8 filetype  2020-06-15 15:02  | 
该SQL的代码如下:
| 
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23  | 
SELECT TO_CHAR(current_timestamp AT TIME ZONE 'GMT',                'YYYY-MM-DD HH24:MI:SS TZD') AS curr_timestamp,        SUM(failed_count) AS failed_count,        TO_CHAR(MIN(first_occur_time), 'yyyy-mm-dd hh24:mi:ss') AS first_occur_time,        TO_CHAR(MAX(last_occur_time), 'yyyy-mm-dd hh24:mi:ss') AS last_occur_time   FROM (SELECT COUNT(db_user) AS failed_count,                MIN(extended_timestamp) AS first_occur_time,                MAX(extended_timestamp) AS last_occur_time           FROM sys.dba_common_audit_trail          WHERE action BETWEEN 100 AND 102            AND returncode != 0            AND STATEMENT_TYPE = 'LOGON'            AND extended_timestamp >=                current_timestamp - TO_DSINTERVAL('0 0:30:00')         UNION         SELECT COUNT(DBUSERNAME) AS failed_count,                MIN(event_timestamp) AS first_occur_time,                MAX(event_timestamp) AS last_occur_time           FROM unified_audit_trail          WHERE ACTION_NAME = 'LOGON'            and return_code <> 0            AND event_timestamp >=                current_timestamp - TO_DSINTERVAL('0 0:30:00'))  | 
从代码来看,这是一个跟审计相关的SQL语句。第一直觉是这个SQL执行效率可能不高;但是其出现的等待事件是IO相关。因此用户非常关系是不是系统IO有问题?如何判断IO是否有问题呢?其实我们可以通过脚本发现查询相关event的平均IO等待事件时间即可:
| 
 1 2 3 4 5 6 7 8 9 10 11  | 
----and SAMPLE_TIME = to_Date('2020-06-15 15:09','yyyy-mm-dd hh24:mi') ---group byinst_id,event,to_char(SAMPLE_TIME,'yyyy-mm-dd hh24:mi'),p1,p3 order by 1,3,5; BTIME                    EVENT_NAME                        AVG_MS           CT ------------------------ ------------------------------ --------- ------------ 15-JUN-20 14:30          Disk file operations I/O           19.42    1,293,197 15-JUN-20 15:00          Disk file operations I/O             .30           43 15-JUN-20 15:00          Disk file operations I/O           19.53    1,294,049 15-JUN-20 15:00          db file sequential read              .40      668,462 SYS@mccs1>  | 
如果说上述信息还不足以证明的话,那么其实可以查看awr报告的直方图信息:
我们知道Oracle为了确保安全性,实现的是日志写优先原则;每一笔事务的提交,Oracle lgwr进程都必须将对应的日志刷新到磁盘;因此可以通过lgwr 进程的写操作来判断数据库主机到存储之间的链路和存储IO是否正常。很明显,从上述图表来看,lgwr file parallel write没有超过8ms的等待。因此可以排除链路和存储的问题。
最开始由于怀疑是SQL问题,尝试收集了几个表的统计信息之后,运行了一下SQL,都在秒级能够返回结果,以为解决了问题。没想到,第二天监控又爆出来了。仍然是相同的问题告警。
这是我再次回馈到分析这个sql的执行计划,该执行计划如下所示:
| 
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56  | 
Id 	Operation 	Name 	Rows 	Bytes 	Cost (%CPU)	Time 	Pstart	Pstop 	TQ 	IN-OUT	PQ Distrib  0 	SELECT STATEMENT 	 	 	 	22 (100)	 	 	 	 	 	  1 	   SORT AGGREGATE 	 	1 	57 	 	 	 	 	 	 	  2 	     VIEW 	 	2 	114 	22 (10)	00:00:01 	 	 	 	 	  3 	       SORT UNIQUE 	 	2 	167 	22 (10)	00:00:01 	 	 	 	 	  4 	         UNION-ALL 	 	 	 	 	 	 	 	 	 	  5 	           SORT AGGREGATE 	 	1 	88 	10 (10)	00:00:01 	 	 	 	 	  6 	             VIEW 	DBA_COMMON_AUDIT_TRAIL 	3 	264 	9 (0)	00:00:01 	 	 	 	 	  7 	               UNION-ALL 	 	 	 	 	 	 	 	 	 	  8 	                 NESTED LOOPS OUTER 	 	1 	165 	5 (0)	00:00:01 	 	 	 	 	  9 	                   NESTED LOOPS OUTER 	 	1 	161 	4 (0)	00:00:01 	 	 	 	 	  10 	                     NESTED LOOPS OUTER 	 	1 	156 	3 (0)	00:00:01 	 	 	 	 	  11 	                       NESTED LOOPS 	 	1 	151 	2 (0)	00:00:01 	 	 	 	 	  12 	                         TABLE ACCESS FULL 	AUD$ 	1 	131 	2 (0)	00:00:01 	 	 	 	 	  13 	                         INDEX UNIQUE SCAN 	I_AUDIT_ACTIONS 	1 	20 	0 (0)	 	 	 	 	 	  14 	                       INDEX RANGE SCAN 	I_SYSTEM_PRIVILEGE_MAP 	1 	5 	1 (0)	00:00:01 	 	 	 	 	  15 	                     INDEX RANGE SCAN 	I_SYSTEM_PRIVILEGE_MAP 	1 	5 	1 (0)	00:00:01 	 	 	 	 	  16 	                   INDEX RANGE SCAN 	I_STMT_AUDIT_OPTION_MAP 	1 	4 	1 (0)	00:00:01 	 	 	 	 	  17 	                 NESTED LOOPS OUTER 	 	1 	187 	4 (0)	00:00:01 	 	 	 	 	  18 	                   NESTED LOOPS OUTER 	 	1 	183 	3 (0)	00:00:01 	 	 	 	 	  19 	                     NESTED LOOPS OUTER 	 	1 	178 	2 (0)	00:00:01 	 	 	 	 	  20 	                       FILTER 	 	 	 	 	 	 	 	 	 	  21 	                         NESTED LOOPS OUTER 	 	1 	173 	1 (0)	00:00:01 	 	 	 	 	  22 	                           PX COORDINATOR 	 	 	 	 	 	 	 	 	 	  23 	                             PX SEND QC (RANDOM) 	:TQ10000 	1 	153 	 	 	 	 	Q1,00 	P->S 	QC (RAND)  24 	                               VIEW 	GV$XML_AUDIT_TRAIL 	 	 	 	 	 	 	Q1,00 	PCWP 	  25 	                                 FIXED TABLE FULL 	X$XML_AUDIT_TRAIL 	1 	153 	 	 	 	 	Q1,00 	PCWP 	  26 	                           INDEX RANGE SCAN 	I_AUDIT_ACTIONS 	1 	20 	1 (0)	00:00:01 	 	 	 	 	  27 	                       INDEX RANGE SCAN 	I_SYSTEM_PRIVILEGE_MAP 	1 	5 	1 (0)	00:00:01 	 	 	 	 	  28 	                     INDEX RANGE SCAN 	I_SYSTEM_PRIVILEGE_MAP 	1 	5 	1 (0)	00:00:01 	 	 	 	 	  29 	                   INDEX RANGE SCAN 	I_STMT_AUDIT_OPTION_MAP 	1 	4 	1 (0)	00:00:01 	 	 	 	 	  30 	                 FILTER 	 	 	 	 	 	 	 	 	 	  31 	                   TABLE ACCESS FULL 	FGA_LOG$ 	1 	92 	2 (0)	00:00:01 	 	 	 	 	  32 	           SORT AGGREGATE 	 	1 	79 	12 (9)	00:00:01 	 	 	 	 	  33 	             VIEW 	UNIFIED_AUDIT_TRAIL 	2 	158 	11 (0)	00:00:01 	 	 	 	 	  34 	               UNION-ALL 	 	 	 	 	 	 	 	 	 	  35 	                 NESTED LOOPS OUTER 	 	1 	186 	3 (0)	00:00:01 	 	 	 	 	  36 	                   NESTED LOOPS OUTER 	 	1 	182 	2 (0)	00:00:01 	 	 	 	 	  37 	                     NESTED LOOPS 	 	1 	177 	1 (0)	00:00:01 	 	 	 	 	  38 	                       PX COORDINATOR 	 	 	 	 	 	 	 	 	 	  39 	                         PX SEND QC (RANDOM) 	:TQ20000 	1 	153 	 	 	 	 	Q2,00 	P->S 	QC (RAND)  40 	                           VIEW 	GV$UNIFIED_AUDIT_TRAIL 	 	 	 	 	 	 	Q2,00 	PCWP 	  41 	                             FIXED TABLE FULL 	X$UNIFIED_AUDIT_TRAIL 	1 	153 	 	 	 	 	Q2,00 	PCWP 	  42 	                       TABLE ACCESS BY INDEX ROWID 	ALL_UNIFIED_AUDIT_ACTIONS 	1 	24 	1 (0)	00:00:01 	 	 	 	 	  43 	                         INDEX UNIQUE SCAN 	I_UNIFIED_AUDIT_ACTIONS 	1 	 	0 (0)	 	 	 	 	 	  44 	                     INDEX RANGE SCAN 	I_SYSTEM_PRIVILEGE_MAP 	1 	5 	1 (0)	00:00:01 	 	 	 	 	  45 	                   INDEX RANGE SCAN 	I_STMT_AUDIT_OPTION_MAP 	1 	4 	1 (0)	00:00:01 	 	 	 	 	  46 	                 TABLE ACCESS BY INDEX ROWID BATCHED	ALL_UNIFIED_AUDIT_ACTIONS 	1 	24 	1 (0)	00:00:01 	 	 	 	 	  47 	                   NESTED LOOPS 	 	1 	63 	8 (0)	00:00:01 	 	 	 	 	  48 	                     NESTED LOOPS OUTER 	 	1 	39 	7 (0)	00:00:01 	 	 	 	 	  49 	                       NESTED LOOPS OUTER 	 	1 	35 	6 (0)	00:00:01 	 	 	 	 	  50 	                         PARTITION RANGE ALL 	 	1 	30 	5 (0)	00:00:01 	1 	1048575	 	 	  51 	                           TABLE ACCESS FULL 	AUD$UNIFIED 	1 	30 	5 (0)	00:00:01 	1 	1048575	 	 	  52 	                         INDEX RANGE SCAN 	I_SYSTEM_PRIVILEGE_MAP 	1 	5 	1 (0)	00:00:01 	 	 	 	 	  53 	                       INDEX RANGE SCAN 	I_STMT_AUDIT_OPTION_MAP 	1 	4 	1 (0)	00:00:01 	 	 	 	 	  54 	                     INDEX UNIQUE SCAN 	I_UNIFIED_AUDIT_ACTIONS 	1 	 	0 (0)	 	 	 	 	 	   | 
再次回归到分析SQL执行计划,我们发现该SQL的视图中访问的几个表在之前都进行了检查,数据量非常小,而且执行计划本身并没有问题。唯一的可能疑点是对于X$XML_AUDIT_TRAIL 的访问。在之前的分析中,由于检查数据库审计设置,发现已关闭了数据库审计,因此忽略了这个疑点。
通过进行单次多次测试发现,基于X$XML_AUDIT_TRAIL的访问时非常慢的;而且每次访问都会出现Disk file operation IO等待。如下所示:
| 
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20  | 
SESSION_ID SQL_ID        PROGRAM                   EVENT                      P1 P1TEXT           P3 P3TEXT     TO_CHAR(SAMPLE_T ---------- ------------- ------------------------- ------------------------- --- ------------------- ---------  ----------------       1151 82x1cyrkcg1jd oracle@mccsdb01 (PPA7)    Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 12:18       1713 d0k3ahnmyd2a2 sqlplus@mccsdb01 (TNS V1- Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 12:16                          V3)                                                                                          1713 d0k3ahnmyd2a2 sqlplus@mccsdb01 (TNS V1- Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 12:16                          V3)                                                                                          1713 d0k3ahnmyd2a2 sqlplus@mccsdb01 (TNS V1- Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 12:16                          V3)                                                                                          1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)    Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 12:07       1151 5g51s485pqu8f oracle@mccsdb01 (PPA7)    Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 11:49       1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)    Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 11:37        479               rman@mccsdb01 (TNS V1-V3) Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 11:23         10               rman@mccsdb01 (TNS V1-V3) Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 11:21        479               rman@mccsdb01 (TNS V1-V3) Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 11:17       1151 3wybn83gkuc4k oracle@mccsdb01 (PPA7)    Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 11:07       1713               rman@mccsdb01 (TNS V1-V3) Disk file operations I/O    8 FileOperation     8 filetype   2020-06-16 10:45	  | 
而上述的d0k3ahnmyd2a2 则是我的测试SQL语句:
| 
 1 2 3 4 5 6 7 8  | 
SYS@mccs1>select sql_text from v$sql where sql_id='d0k3ahnmyd2a2'; SQL_TEXT ------------------------------------------------------------------------------------ select count(1) from X$XML_AUDIT_TRAIL Elapsed: 00:00:00.01 SYS@mccs1>  | 
从验证来看,看来问题出现这个xml的表上。我们知道Disk file operation IO等待表示的对于文件的操作,比如open,close等。而且前面查询event的operation type都是8,表明确实在等待IO。那么这里的IO是不是物理IO呢?
| 
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15  | 
Stat Name	Statement Total	Per Execution	% Snap Total Elapsed Time (ms)	343,319	171,659.47	9.77 CPU Time (ms)	1,140	569.81	0.27 Executions	2	1.00	0.00 Buffer Gets	34	17.00	0.00 Disk Reads	0	0.00	0.00 Parse Calls	6	3.00	0.01 Rows	2	1.00	  User I/O Wait Time (ms)	343,298	171,648.94	94.03 Cluster Wait Time (ms)	0	0.00	0.00 Application Wait Time (ms)	0	0.00	0.00 Concurrency Wait Time (ms)	0	0.00	0.00 Invalidations	0	 	  Version Count	4	 	  Sharable Mem(KB)	804  | 
从上面的SQL report来看,其实该SQL并没有产生物理读。都没有产生物理IO,为什么是IO相关的等待呢?这就奇怪了。。。
那么唯一的解释就是该SQL可能会访问系统的文件,因为用到了xml。联想到这一层,那么怎么验证这个猜想呢?这里我分别开启2个crt窗口进行测试跟踪即可:
++Session 1
| 
 1 2 3 4 5 6 7 8 9 10  | 
SYS@mccs1>c/9/8   1* select  count(8) from X$XML_AUDIT_TRAIL SYS@mccs1>/   COUNT(8) ----------          0 Elapsed: 00:00:08.20 SYS@mccs1>  | 
可以看到该SQL查询花费了8.2秒。
+++Session2
| 
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64  | 
Tue Jun 16 13:06:51 2020 0.0000:        kread(7, "\0\0\01506\0\0\0\0\00305".., 8208) (sleeping...) 0.0000:        kread(7, "\0\0019006\0\0\0\0\011 i".., 8208) = 400 7.3600:        _poll(0x072000000B4E1518, 3, 0)  = 0 7.3611:        statx("/usr/share/lib/zoneinfo/Asia/Shanghai", 0x0FFFFFFFFFFF68F0, 176, 0) = 0 7.3613:        statx("/u01/app/oracle/admin/mccs/adump/adx_mccs1_ora_29360546.txt", 0x0FFFFFFFFFFF3AD0, 176, 010) Err#2  ENOENT 7.3615:        statx("/u01/app/oracle/admin/mccs/adump/adx_mccs1_ora_29360546.txt", 0x0FFFFFFFFFFF3A10, 176, 010) Err#2  ENOENT 7.3616:        statx("/u01/app/oracle/admin/mccs/adump/adx_mccs1_ora_29360546.txt", 0x0FFFFFFFFFFF3440, 176, 010) Err#2  ENOENT 7.3618:        statx("/u01/app/oracle/admin/mccs/adump/", 0x0FFFFFFFFFFF2098, 176, 010) = 0 7.3620:        statfs("/u01/app/oracle/admin/mccs/adump/adx_mccs1_ora_29360546.txt", 0x0FFFFFFFFFFF1F10) Err#2  ENOENT 7.3622:        statfs("/u01/app/oracle/admin/mccs/adump/", 0x0FFFFFFFFFFF1F10) = 0 7.3623:        mntctl(2, 512, "")               = 0 7.3625:        mntctl(2, 1888, "")              = 13 7.3627:        kopen("/etc/vfs", O_RDONLY)      = 11 7.3628:        kioctl(11, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY 7.3630:        kioctl(11, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY 7.3631:        kread(11, " #   @ ( # ) 7 7\t 1 . 3".., 4096) = 1786 7.3633:        close(11)                        = 0 7.3634:        kopen("/u01/app/oracle/admin/mccs/adump/adx_mccs1_ora_29360546.txt", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP) = 11 7.3641:        fstatx(11, 0x0FFFFFFFFFFF1F10, 176, 010) = 0 7.3642:        close(11)                        = 0 7.3644:        statx("/u01/app/oracle/admin/mccs/adump/adx_mccs1_ora_29360546.txt", 0x0FFFFFFFFFFF3580, 176, 010) = 0 7.3646:        statx("/u01/app/oracle/admin/mccs/adump/", 0x0FFFFFFFFFFF21D8, 176, 010) = 0 7.3647:        statfs("/u01/app/oracle/admin/mccs/adump/adx_mccs1_ora_29360546.txt", 0x0FFFFFFFFFFF2050) = 0 7.3649:        mntctl(2, 512, "")               = 0 7.3650:        mntctl(2, 1888, "")              = 13 7.3652:        kopen("/etc/vfs", O_RDONLY)      = 11 7.3654:        kioctl(11, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY ...... 7.4088:        close(12)                        = 0 7.4089:        chdir("adump/")                  = 0 7.4090:        statx(".", 0x0FFFFFFFFFFED580, 176, 010) = 0 7.4092:        kopen(".", O_RDONLY|O_LARGEFILE) = 12 7.4093:        getdirent64(12, 0x0000000121328E70, 4096) = 4056 7.4095:        klseek(12, 0, 0, 0x0FFFFFFFFFFED480) = 0 7.4096:        kfcntl(12, F_GETFD, 0x0000000120C4FFD8) = 0 7.4098:        kfcntl(12, F_SETFD, 0x0000000000000001) = 0 7.4099:        getdirent64(12, 0x0000000121328E70, 4096) = 4056 ...... 15.5541:        getdirent64(12, 0x0000000121328E70, 4096) = 4032 15.5542:        getdirent64(12, 0x0000000121328E70, 4096) = 4032 15.5544:        getdirent64(12, 0x0000000121328E70, 4096) = 4032 15.5546:        getdirent64(12, 0x0000000121328E70, 4096) = 4032 15.5547:        getdirent64(12, 0x0000000121328E70, 4096) = 4032 15.5549:        getdirent64(12, 0x0000000121328E70, 4096) = 4032 15.5551:        getdirent64(12, 0x0000000121328E70, 4096) = 4032 15.5552:        getdirent64(12, 0x0000000121328E70, 4096) = 936 15.5554:        getdirent64(12, 0x0000000121328E70, 4096) = 0 15.5555:        close(12)                       = 0 15.5557:        chdir("/u01/app/oracle/admin/mccs") = 0 15.5559:        chdir("/u01/app/oracle/admin")  = 0 15.5560:        chdir("/u01/app/oracle")        = 0 15.5562:        chdir("/u01/app")               = 0 15.5563:        chdir("/u01")                   = 0 15.5564:        chdir("/")                      = 0 15.5566:        chdir("/u01/app/oracle/product/19c/dbhome_1/dbs") = 0 15.5568:        close(11)                       = 0 15.5571:        _poll(0x072000000B4E1518, 3, 0) = 0 15.5572:        statx("/u01/app/oracle/admin/mccs/adump/adx_mccs1_ora_29360546.txt", 0x0FFFFFFFFFFF44A0, 176, 011) = 0 15.5574:        unlink("/u01/app/oracle/admin/mccs/adump/adx_mccs1_ora_29360546.txt") = 0 15.5586:        kwrite(10, "\0\0018506\0\0\0\0\010\0".., 389) = 389 15.5588:        kread(7, "\0\0\01506\0\0\0\0\00305".., 8208) = 21 15.5589:        kwrite(10, "\0\0\0 0\0\0\0\004\0".., 180) = 180 17.5592:        kread(7, "\0\0\01506\0\0\0\0\00305".., 8208) (sleeping...)  | 
从上的跟踪信息来看;我们的测试SQL语句执行时间为8.2秒,其中从_poll函数开始到close(12)操作截止,就消耗了8.19秒。其中getdirent64函数操作就消耗了8.14秒。该函数表现获取文件系统目录和文件信息;同时从close的句柄操作来看,主要是慢在/adump目录的操作上。这也难怪上述SQL执行为什么会这么慢了。而且每次执行必然会出现Disk file operation IO等待。
进一步检查/adump目录发现,该目录下的小文件居然高达263万个。
| 
 1 2  | 
[oracle@xxxxdb01:/u01/app/oracle/admin/xxxx/adump]$ls|wc -l  2634022  | 
到这里问题基本上就非常清晰了。只要清理这个目录的垃圾文件即可。
那么问题就来了,数据库都已经关闭了审计,为什么还会产生这么多文件呢?实际上大概估算一下,该系统从去年安装部署后至今,大概也就200多天;也就是说该目录平均每天增加1万多个小文件。其实这是因为Oracle数据库默认情况下仍然会审计sys的登陆操作。其他业务用户的审计操作均已关闭。
换句话讲,也就是说该系统的sys用户登陆每天高达1w+次。这是什么神操作呢 ?
最好检查crontab发现是用户有一个清理日志的脚本,忘记注释了,而该脚本几乎是每秒登陆一次(实际上大概是2-3秒)。也就就解释了为什么每天产生12000个左右的文件了。




Leave a Reply
You must be logged in to post a comment.