linux 强制free cache 导致数据库实例crash
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: linux 强制free cache 导致数据库实例crash
这是1个月之前处理的某个客户的案例,现象大致是某天凌晨某RAC节点实例被重启了,通过如下是alert log我们可以发现RAC集群的节点2实例被强行终止掉了,如下是详细的告警日志信息:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
Mon Sep 28 02:00:00 2015 Errors in file /oracle/admin/xxx/bdxxx/xxx2_j000_7604.trc: ORA-12012: error on auto execute of job 171538 ORA-06550: line ORA-06550: line 4, column 3: PLS-00905: object XSQD.E_SP_DL_CRM_TERMINAL_MANAGER is invalid ORA-06550: line 4, column 3: PL/SQL: Statement ignored , column : Mon Sep 28 02:03:18 2015 Errors in file /oracle/admin/xxx/udxxx/xxx2_ora_6810.trc: ORA-00600: internal error code, arguments: [KGHLKREM1], [0x679000020], [], [], [], [], [], [] 。。。。。 Trace dxxxing is performing id=[cdmp_20150928023925] Mon Sep 28 02:39:32 2015 Errors in file /oracle/admin/xxx/bdxxx/xxx2_lmd0_24228.trc: ORA-00600: internal error code, arguments: [KGHLKREM1], [0x679000020], [], [], [], [], [], [] Mon Sep 28 02:39:32 2015 LMD0: terminating instance due to error 482 Mon Sep 28 02:39:33 2015 Shutting down instance (abort) License high water mark = 145 Mon Sep 28 02:39:37 2015 Instance terminated by LMD0, pid = 24228 Mon Sep 28 02:39:38 2015 |
从上面的日志来看,在2:03分就开始报错ORA-00600,一直持续到2:39分,lmd0进程开始报同样的错误;然后接着LMD0进程强行把数据库实例终止掉了。。直接搜索Oracle MOS,看上去有点类似这个bug,不过很容易就可以排除。
Bug 14193240 : LMS SIGNALED ORA-600[KGHLKREM1] DURING BEEHIVE LOAD
从日志看,2:03分就开始报错,然而直到lmd0报错时,实例才被终止掉,也就是说lmd0报错才是问题的关键。那么我们首先来分析下lmd0 进程的trace文件内容,如下所示:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 |
*** 2015-09-28 02:39:24.291 ***** Internal heap ERROR KGHLKREM1 addr=0x679000020 ds=0x60000058 ***** ***** Dxxx of memory around addr 0x679000020: 678FFF020 00000000 00000000 00000000 00000000 [................] Repeat 255 times 679000020 60001990 00000000 00000000 00000000 [...`............] 679000030 00000000 00000000 00000000 00000000 [................] Repeat 254 times Recovery state: ds=0x60000058 rtn=(nil) *rtn=(nil) szo=0 u4o=0 hdo=0 off=0 Szo: UB4o: Hdo: Off: Hla: 0 ****************************************************** HEAP Dxxx heap name="sga heap" desc=0x60000058 extent sz=0x47c0 alt=216 het=32767 rec=9 flg=-126 opc=4 parent=(nil) owner=(nil) nex=(nil) xsz=0x0 ds for latch 1: 0x60042f70 0x600447c8 0x60046020 0x60047878 。。。。。 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [KGHLKREM1], [0x679000020], [], [], [], [], [], [] ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst()+31 call ksedst1() 000000000 ? 000000001 ? 7FFF41D71F90 ? 7FFF41D71FF0 ? 7FFF41D71F30 ? 000000000 ? ksedmp()+610 call ksedst() 000000000 ? 000000001 ? 7FFF41D71F90 ? 7FFF41D71FF0 ? 7FFF41D71F30 ? 000000000 ? ksfdmp()+21 call ksedmp() 000000003 ? 000000001 ? 7FFF41D71F90 ? 7FFF41D71FF0 ? 7FFF41D71F30 ? 000000000 ? kgerinv()+161 call ksfdmp() 000000003 ? 000000001 ? 7FFF41D71F90 ? 7FFF41D71FF0 ? 7FFF41D71F30 ? 000000000 ? kgesinv()+33 call kgerinv() 0068966E0 ? 2AF92650E2C0 ? 7FFF41D71FF0 ? 7FFF41D71F30 ? 000000000 ? 000000000 ? kgesin()+143 call kgesinv() 0068966E0 ? 2AF92650E2C0 ? 7FFF41D71FF0 ? 7FFF41D71F30 ? 000000000 ? 000000000 ? kghnerror()+342 call kgesin() 0068966E0 ? 2AF92650E2C0 ? 7FFF41D71FF0 ? 7FFF41D71F30 ? 000000002 ? 679000020 ? kghadd_reserved_ext call kghnerror() 0068966E0 ? 060000058 ? ent()+1039 005AE1C14 ? 679000020 ? 000000002 ? 679000020 ? kghget_reserved_ext call kghadd_reserved_ext 0068966E0 ? 060000058 ? ent()+250 ent() 060042F70 ? 060042FB8 ? 000000000 ? 000000000 ? kghgex()+1622 call kghget_reserved_ext 0068966E0 ? 060003B98 ? ent() 060042F70 ? 000000B10 ? 000000000 ? 000000000 ? kghfnd()+660 call kghgex() 0068966E0 ? 060003B98 ? 060042F70 ? 000002000 ? 000000AC8 ? 06000D600 ? kghprmalo()+274 call kghfnd() 0068966E0 ? 060003B98 ? 060042F70 ? 000000AB8 ? 000000AB8 ? 06000D600 ? kghalo()+3701 call kghprmalo() 0068966E0 ? 060003B98 ? 060042F70 ? 000000A98 ? 000000A98 ? 7FFF41D73128 ? ksmdacnk()+297 call kghalo() 000000000 ? 060042F70 ? 000000A98 ? 07FFFFFFF ? 000000A98 ? 000000A98 ? ksmdget()+958 call ksmdacnk() 7DF4E4AA0 ? 7ADFF43F0 ? 00533A060 ? 7D5278548 ? 2AF900000000 ? 000000A98 ? kssadpm()+366 call ksmdget() 7DF4E4AA0 ? 000000000 ? 00533A060 ? 7D5278518 ? 7D5278548 ? 000000A98 ? kjlalc()+900 call kssadpm() 000000000 ? 000000000 ? 7D647DB78 ? 000000000 ? 7D54DF5B8 ? 000000A98 ? kjxnrl()+110 call kjlalc() 7D5507080 ? 000000020 ? 7D54DF480 ? 000000000 ? 7D54DF5B8 ? 000000A98 ? kjxocdr()+301 call kjxnrl() 000000060 ? 000000020 ? 2AF926A352B0 ? 7FFF41D736C0 ? 7FFF41D736C8 ? 7FFF41D736D0 ? kjmxmpm()+508 call kjxocdr() 2AF926A35280 ? 000000020 ? 2AF926A352B0 ? 7FFF41D736C0 ? 7FFF41D736C8 ? 7FFF41D736D0 ? kjmpmsgi()+1609 call kjmxmpm() 2AF926A35280 ? 000000000 ? 06CF6F439 ? 000000000 ? 7FFF41D736C8 ? 7FFF41D736D0 ? kjmdm()+7458 call kjmpmsgi() 2AF926A35280 ? 000000001 ? 000000001 ? 000000000 ? 000000000 ? 000000000 ? ksbrdp()+794 call kjmdm() 06000D708 ? 000000001 ? 000000001 ? 7D380FCF0 ? 000000001 ? 7D380FD00 ? opirip()+616 call ksbrdp() 06000D708 ? 000000001 ? 000000001 ? 06000D708 ? 000000001 ? 7D380FD00 ? opidrv()+582 call opirip() 000000032 ? 000000004 ? 7FFF41D74F08 ? 06000D708 ? 000000001 ? 7D380FD00 ? sou2o()+114 call opidrv() 000000032 ? 000000004 ? 7FFF41D74F08 ? 06000D708 ? 000000001 ? 7D380FD00 ? opimai_real()+317 call sou2o() 7FFF41D74EE0 ? 000000032 ? 000000004 ? 7FFF41D74F08 ? 000000001 ? 7D380FD00 ? main()+116 call opimai_real() 000000003 ? 7FFF41D74F70 ? 000000004 ? 7FFF41D74F08 ? 000000001 ? 7D380FD00 ? __libc_start_main() call main() 000000003 ? 7FFF41D74F70 ? +244 000000004 ? 7FFF41D74F08 ? 000000001 ? 7D380FD00 ? _start()+41 call __libc_start_main() 000723078 ? 000000001 ? 7FFF41D750C8 ? 000000000 ? 000000001 ? 000000003 ? |
从上面的信息来看,确实是heap 存在错误的情况。根据oracle mos文档ORA-600 [KGHLKREM1] On Linux Using Parameter drop_cache On hugepages Configuration (1070812.1) 的描述来看,此次故障跟文档描述基本上一致,如下:
1 2 3 4 5 6 7 |
***** Internal heap ERROR KGHLKREM1 addr=0x679000020 ds=0x60000058 ***** ***** Dxxx of memory around addr 0x679000020: 678FFF020 00000000 00000000 00000000 00000000 [................] Repeat 255 times 679000020 60001990 00000000 00000000 00000000 [...`............] 679000030 00000000 00000000 00000000 00000000 [................] Repeat 254 times |
其中地址[0x679000020] 后面的内容也均为0,跟文档描述一样,其次,文章中提到使用了linux 内存释放机制以及同时启用了hugepage配置。
根据文档描述,这应该是Linux bug。通过检查对比2个节点配置,发现节点2的配置确实不同:
1 2 3 4 5 6 7 8 9 |
--节点1 [oracle@xxx-DS01 ~]$ cat /proc/sys/vm/drop_caches 0 [oracle@xxx-DS01 ~]$ --节点2 [oracle@xxx-DS02 ~]$ [oracle@xxx-DS02 ~]$ cat /proc/sys/vm/drop_caches 3 |
当drop_caches 设置为3,会触发linux的内存清理回收机制,可能出现内存错误的情况;然而我们检查配置发现并没有修改:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
oracle@xxx-DS02 bdxxx]$ cat /etc/sysctl.conf # Kernel sysctl configuration file for Red Hat Linux # # For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and # sysctl.conf(5) for more details. # Controls IP packet forwarding net.ipv4.ip_forward = 0 # Controls source route verification net.ipv4.conf.default.rp_filter = 1 # Do not accept source routing net.ipv4.conf.default.accept_source_route = 0 # Controls the System Request debugging functionality of the kernel kernel.sysrq = 0 # Controls whether core dxxxs will append the PID to the core filename # Useful for debugging multi-threaded applications kernel.core_uses_pid = 1 # Controls the use of TCP syncookies net.ipv4.tcp_syncookies = 1 # Controls the maximum size of a message, in bytes kernel.msgmnb = 65536 # Controls the default maxmimum size of a mesage queue kernel.msgmax = 65536 # Controls the maximum shared segment size, in bytes kernel.shmmax = 68719476736 # Controls the maximum number of shared memory segments, in pages kernel.shmall = 4294967296 kernel.shmmni = 4096 kernel.sem = 250 32000 100 128 fs.file-max = 65536 net.ipv4.ip_local_port_range = 1024 65000 net.core.rmem_default = 1048576 net.core.rmem_max = 1048576 net.core.wmem_default = 262144 net.core.wmem_max = 262144 vm.nr_hugepages=15800 |
因此,我认为是之前人为进行了echo 3 > /proc/sys/vm/drop_caches 操作来强制释放内存导致。 通过分析发现只能查看到最近几分钟的操作记录,如下:
1 2 3 4 5 6 7 8 9 |
[root@xxx-DS02 ~]# history|grep echo 22 2015-09-29 16:12:42 root echo 3 > /proc/sys/vm/drop_caches 71 2015-09-29 16:12:42 root echo 0 > /proc/sys/vm/drop_caches 73 2015-09-29 16:12:42 root echo 3 > /proc/sys/vm/drop_caches 79 2015-09-29 16:12:42 root echo 0 > /proc/sys/vm/drop_caches 311 2015-09-29 16:12:42 root echo 3 > /proc/sys/vm/drop_caches 329 2015-09-29 16:12:42 root echo 0 > /proc/sys/vm/drop_caches 1001 2015-09-29 16:12:49 root history|grep echo 1005 2015-09-29 16:14:55 root history|grep echo |
看操作记录确实发现了操作,那么同时我检查操作系统日志也发现了一些蛛丝马迹,如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
Sep 29 00:00:12 xxx-DS02 kernel: <span style="color: #ff0000;">BUG: soft lockup - CPU#1 stuck for 10s! [rel_mem.sh:13887</span>] Sep 29 00:00:12 xxx-DS02 kernel: CPU 1: Sep 29 00:00:12 xxx-DS02 kernel: Modules linked in: hangcheck_timer autofs4 hidp ocfs2(U) rfcomm l2cap bluetooth ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs lockd sunrpc bonding dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sg pcspkr bnx2(U) dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage lpfc scsi_transport_fc shpchp mpt2sas scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd Sep 29 00:00:12 xxx-DS02 kernel: Pid: 13887, comm: rel_mem.sh Tainted: G 2.6.18-194.el5 #1 Sep 29 00:00:12 xxx-DS02 kernel: RIP: 0010:[<ffffffff800cb229>] [<ffffffff800cb229>] __invalidate_mapping_pages+0xf3/0x183 Sep 29 00:00:12 xxx-DS02 kernel: RSP: 0018:ffff8112f5f71da8 EFLAGS: 00000207 Sep 29 00:00:12 xxx-DS02 kernel: RAX: 00000000ffffff94 RBX: 000000000000bc83 RCX: 0000000000000024 Sep 29 00:00:12 xxx-DS02 kernel: RDX: ffff81088008746c RSI: 0000000000000002 RDI: ffff81108c8e2df8 Sep 29 00:00:12 xxx-DS02 kernel: RBP: 00000000000000d0 R08: ffff810880087503 R09: ffff811080001600 Sep 29 00:00:12 xxx-DS02 kernel: R10: 000000000000bc83 R11: ffffffff8858ea45 R12: ffff81108c8e2df8 Sep 29 00:00:12 xxx-DS02 kernel: R13: 0000000000000001 R14: ffff811e35bad6b0 R15: ffff811411a641f0 Sep 29 00:00:12 xxx-DS02 kernel: FS: 00002af83adf5dd0(0000) GS:ffff8108800877c0(0000) knlGS:0000000000000000 Sep 29 00:00:12 xxx-DS02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Sep 29 00:00:12 xxx-DS02 kernel: CR2: 00002ad3e3ae1f90 CR3: 000000120a831000 CR4: 00000000000006e0 Sep 29 00:00:12 xxx-DS02 kernel: Sep 29 00:00:12 xxx-DS02 kernel: Call Trace: Sep 29 00:00:12 xxx-DS02 kernel: [<ffffffff800cb21a>] __invalidate_mapping_pages+0xe4/0x183 Sep 29 00:00:12 xxx-DS02 kernel: [<ffffffff800f642e>] drop_pagecache+0xa5/0x13b Sep 29 00:00:12 xxx-DS02 kernel: [<ffffffff80096bc1>] do_proc_dointvec_minmax_conv+0x0/0x56 Sep 29 00:00:12 xxx-DS02 kernel: [<ffffffff800f64de>] drop_caches_sysctl_handler+0x1a/0x2c Sep 29 00:00:12 xxx-DS02 kernel: [<ffffffff80096fea>] do_rw_proc+0xcb/0x126 Sep 29 00:00:12 xxx-DS02 kernel: [<ffffffff80016a49>] vfs_write+0xce/0x174 Sep 29 00:00:12 xxx-DS02 kernel: [<ffffffff80017316>] sys_write+0x45/0x6e Sep 29 00:00:12 xxx-DS02 kernel: [<ffffffff8005e28d>] tracesys+0xd5/0xe0 |
可以看到也确实出现了drop_cache的相关操作。大家注意看上面红色的地方,提到了是执行了一个shell脚本,然后还导致一共cpu stuck了,而且也能看出该脚本是在执行回收cache的动作。
我坚持认为客户环境上肯定进行了强制的内存回收,但是客户说他们没有进行任何人为操作,不过经过我检查发现确实有一个crontab脚本。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
[root@xxx-DS02 ~]# crontab -l 00 00 * * * /home/oracle/ht/rel_mem.sh [root@xxx-DS02 ~]# cat /home/oracle/ht//rel_mem.sh #!/bin/bash #mkdir /var/log/freemem time=`date +%Y%m%d` used=`free -m | awk 'NR==2' | awk '{print $3}'` free=`free -m | awk 'NR==2' | awk '{print $4}'` echo "===========================" >> /var/log/freemem/mem$time.log date >> /var/log/freemem/mem$time.log echo "Memory usage | [Use:${used}MB][Free:${free}MB]" >> /var/log/freemem/mem$time.log if [ $free -le 71680 ];then sync && echo 3 > /proc/sys/vm/drop_caches echo "OK" >> /var/log/freemem/mem$time.log free >> /var/log/freemem/mem$time.log else echo "Not required" >> /var/log/freemem/mem$time.log fi |
那么为什么主机上会部署这样的脚本呢? 我猜想肯定是操作系统的内存使用率看起来很高,通过检查发现确实如此:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
<pre class="brush:plain">[root@xxx-DS02 ~]# cat /proc/meminfo MemTotal: 132072032 kB MemFree: 1188372 kB Buffers: 251144 kB Cached: 90677948 kB ---90677948/1024=88552m SwapCached: 33476 kB Active: 34427132 kB Inactive: 59309060 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 132072032 kB LowFree: 1188372 kB SwapTotal: 16383992 kB SwapFree: 16184800 kB Dirty: 32 kB Writeback: 64 kB AnonPages: 2774776 kB Mapped: 30703840 kB Slab: 604080 kB PageTables: 3969892 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 66240808 kB Committed_AS: 39177448 kB VmallocTotal: 34359738367 kB VmallocUsed: 273448 kB VmallocChunk: 34359464907 kB HugePages_Total: 15800 HugePages_Free: 15800 HugePages_Rsvd: 5424 Hugepagesize: 2048 kB [root@xxx-DS02 ~]# free -m total used free shared buffers cached Mem: 128976 127808 1168 0 245 88552 -/+ buffers/cache: 39010 89966 Swap: 15999 194 15805 |
我们可以看到128G的物理内存,cache 就占据了88G的样子目前。linux 文件系统的cache分为2种:page cache和 buffer cache,page cache是用于文件,inode等操作的cache,而buffer cache是用于块设备的操作。从上面的数据来看,我们所看到的free -m 命令中的cached 88552 全是page cache。而实际上该数据库实例的内存分配一共也就40G,且使用的是linux raw。
1 2 3 4 5 6 7 8 9 10 11 12 13 |
SQL> show parameter sga NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ lock_sga boolean FALSE pre_page_sga boolean FALSE sga_max_size big integer 30G sga_target big integer 30G SQL> show parameter pga NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ pga_aggregate_target big integer 10G |
我们可以看到,整个主机物理内存为128G,而Oracle SGA+pga 才40g,另外将近90G的内存都是fs cache所消耗。完全可以调整linux的参数去释放cache,而不需要使用echo 这种比较暴力的方式;根据Oracle mos的几个文档的描述,推荐设置如下几个参数:
sysctl -w vm.min_free_kbytes=4096000
sysctl -w vm.vfs_cache_pressure=200
sysctl -w vm.swappiness=40 (老版本的linux是设置vm.pagecache参数)
关于linux cache的一些知识请参考:
http://www.ibm.com/developerworks/cn/linux/l-cache/
File System’s Buffer Cache versus Direct I/O (文档 ID 462072.1)
One Response to “linux 强制free cache 导致数据库实例crash”
[…] 从上面的日志来看,在2:03分就开始报错ORA-00600,一直持续到2:39分,lmd0进程开始报同样的错误;然后接着LMD0进程强行把数据库实例终止掉了。。直接搜索Oracle MOS,看上去有点类似这个bug,不过很容易就可以排除。 Bug 14193240 : LMS SIGNALED ORA-600[KGHLKREM1] DURING BEEHIVE LOAD 从日志看,2:03分就开始报错,然而直到lmd0报错时,实例才被终止掉,也就是说lmd0报错才是问题的关键。那么我们首先来分析下lmd0 进程的trace文件内容,如下所示: ? […]
Leave a Reply
You must be logged in to post a comment.