IB Driver Problem Causes Oracle Cluster Host Reboot
Unless marked as a repost, all articles on this site are original. Reposted from love wife love life, Roger's Oracle/MySQL/PostgreSQL data recovery blog
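During an ifdown test on ib2 in a customer's distributed storage environment (the Oracle RAC environment has two heartbeat NICs, ib0 and ib2), the database host crashed and rebooted outright. Let's look at the ocssd log first: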
    2020-04-23 18:43:52.380987 : CSSD:29906688: clssgmpcMemberDataUpdt: grockName HB+ASM memberID 9:2:2, datatype 1 datasize 4
    2020-04-23 18:43:52.381176 : CSSD:23856896: clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 9:2:2 from clientID 2:96:4
    Trace file /u01/app/12.1/diag/crs/mpbdb2/crs/trace/ocssd.trc
    Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
    2020-04-23 18:53:14.996300 : CSSD:654306816: (TLM) Starting CSS daemon, version 12.1.0.2.0 with uniqueness value 1587639194
    2020-04-23 18:53:14.996320 : CSSD:654306816: clsu_load_ENV_levels: Module = CSSD, LogLevel = 2, TraceLevel = 0
    2020-04-23 18:53:14.996329 : CSSD:654306816: clsu_load_ENV_levels: Module = CSSDNMC, LogLevel = 2, TraceLevel = 0
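The size of the silent window in this log can be worked out directly from the two timestamps on either side of it. The following is only a minimal sketch: the constant and function names are hypothetical, and the two timestamps are copied by hand from the excerpt above.

```python
from datetime import datetime

# Timestamps copied from the ocssd log excerpt above.
LAST_ENTRY_BEFORE_CRASH = "2020-04-23 18:43:52.381176"
CSSD_RESTART = "2020-04-23 18:53:14.996300"

FMT = "%Y-%m-%d %H:%M:%S.%f"


def outage_seconds(last_entry: str, restart: str) -> float:
    """Return the gap between two ocssd timestamps, in seconds."""
    delta = datetime.strptime(restart, FMT) - datetime.strptime(last_entry, FMT)
    return delta.total_seconds()


gap = outage_seconds(LAST_ENTRY_BEFORE_CRASH, CSSD_RESTART)
print(f"ocssd was silent for {gap:.1f} s")
```

The gap is a little over nine minutes, which is consistent with a full host reboot rather than a restart of the CSS daemon alone.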
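The log stops at 18:43:52 and the CSS daemon only starts again at 18:53:14, so the host went down abruptly at 18:43:52. Because kdump had already been enabled on these environments, I copied the customer's vmcore file to a local machine for a quick analysis.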
    [root@localhost tmp]# crash /usr/lib/debug/lib/modules/2.6.32-642.el6.x86_64/vmlinux vmcore

    crash 7.1.4-1.0.1.el6_7
    Copyright (C) 2002-2015  Red Hat, Inc.
    Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
    Copyright (C) 1999-2006  Hewlett-Packard Co
    Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
    Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
    Copyright (C) 2005, 2011  NEC Corporation
    Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
    Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
    This program is free software, covered by the GNU General Public License,
    and you are welcome to change it and/or distribute copies of it under
    certain conditions.  Enter "help copying" to see the conditions.
    This program has absolutely no warranty.  Enter "help warranty" for details.

    GNU gdb (GDB) 7.6
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-unknown-linux-gnu"...

    WARNING: kernel version inconsistency between vmlinux and dumpfile

          KERNEL: /usr/lib/debug/lib/modules/2.6.32-642.el6.x86_64/vmlinux
        DUMPFILE: vmcore  [PARTIAL DUMP]
            CPUS: 56
            DATE: Thu Apr 23 18:43:52 2020
          UPTIME: 2 days, 00:31:16
    LOAD AVERAGE: 2.44, 2.45, 2.82
           TASKS: 2884
        NODENAME: mpbdb2
         RELEASE: 2.6.32-642.el6.x86_64
         VERSION: #1 SMP Wed Apr 13 00:51:26 EDT 2016
         MACHINE: x86_64  (2593 Mhz)
          MEMORY: 255.6 GB
           PANIC: "kernel BUG at mm/slab.c:524!"
             PID: 47680
         COMMAND: "ip"
            TASK: ffff881fb58a4ab0  [THREAD_INFO: ffff8810dad2c000]
             CPU: 26
           STATE: TASK_RUNNING (PANIC)

    crash> files 47680
    PID: 47680  TASK: ffff881fb58a4ab0  CPU: 26  COMMAND: "ip"
    ROOT: /    CWD: /etc/sysconfig/network-scripts
     FD       FILE            DENTRY           INODE       TYPE PATH
      0 ffff884035ef61c0 ffff881fa4c509c0 ffff881fbbae6108 CHR  /dev/pts/0
      1 ffff884035ef61c0 ffff881fa4c509c0 ffff881fbbae6108 CHR  /dev/pts/0
      2 ffff884050350d80 ffff8820535d9e00 ffff884053150a38 CHR  /dev/null
      3 ffff881983e48d80 ffff8813fc2029c0 ffff88167b81fbc8 SOCK
Next, look at the stack trace:
    crash> bt
    PID: 47680  TASK: ffff881fb58a4ab0  CPU: 26  COMMAND: "ip"
     #0 [ffff8810dad2f1f0] machine_kexec at ffffffff8103fdcb
     #1 [ffff8810dad2f250] crash_kexec at ffffffff810d1fe2
     #2 [ffff8810dad2f320] oops_end at ffffffff8154bc40
     #3 [ffff8810dad2f350] die at ffffffff8101102b
     #4 [ffff8810dad2f380] do_trap at ffffffff8154b494
     #5 [ffff8810dad2f3e0] do_invalid_op at ffffffff8100cd95
     #6 [ffff8810dad2f480] invalid_op at ffffffff8100c01b
        [exception RIP: kfree+668]      +++++ the exception RIP is the instruction that triggered the error
        RIP: ffffffff81181b1c  RSP: ffff8810dad2f538  RFLAGS: 00010046
        RAX: ffffea003bea99f0  RBX: ffff88111e752000  RCX: ffff88111e752000
        RDX: 0040000000080000  RSI: 0000000000000046  RDI: ffff88111e752000
        RBP: ffff8810dad2f598   R8: 0000000000000001   R9: ffff8800000bda00
        R10: 0000000000000000  R11: 0000000000000000  R12: ffffffff8146b528
        R13: 0000000000000286  R14: 0000000000000005  R15: 0000000000000001
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
     #7 [ffff8810dad2f5a0] skb_release_data at ffffffff8146b528
     #8 [ffff8810dad2f5c0] __kfree_skb at ffffffff8146b05e
     #9 [ffff8810dad2f5e0] consume_skb at ffffffff8146b11b
    #10 [ffff8810dad2f600] dev_kfree_skb_any at ffffffff81478e9d
    #11 [ffff8810dad2f650] ipoib_ib_dev_stop at ffffffffa060a074 [ib_ipoib]
    #12 [ffff8810dad2f670] ipoib_stop at ffffffffa0604c75 [ib_ipoib]
    #13 [ffff8810dad2f6a0] dev_close_many at ffffffff81479f15
    #14 [ffff8810dad2f6e0] dev_close at ffffffff8147a471
    #15 [ffff8810dad2f710] dev_change_flags at ffffffff814794dc
    #16 [ffff8810dad2f750] do_setlink at ffffffff81488ca7
    #17 [ffff8810dad2f7f0] rtnl_newlink at ffffffff8148a55e
    #18 [ffff8810dad2fa00] rtnetlink_rcv_msg at ffffffff81489d77
    #19 [ffff8810dad2fa70] netlink_rcv_skb at ffffffff814a6389
    #20 [ffff8810dad2faa0] rtnetlink_rcv at ffffffff81489e35
    #21 [ffff8810dad2fac0] netlink_unicast at ffffffff814a5faf
    #22 [ffff8810dad2fb20] netlink_sendmsg at ffffffff814a6a13
    #23 [ffff8810dad2fbb0] sock_sendmsg at ffffffff814634b3
    #24 [ffff8810dad2fd60] __sys_sendmsg at ffffffff81464c96
    #25 [ffff8810dad2ff10] sys_sendmsg at ffffffff81464eb9
    #26 [ffff8810dad2ff80] system_call_fastpath at ffffffff8100b0d2
        RIP: 00000039c9ce9a30  RSP: 00007ffcf739d9c0  RFLAGS: 00010246
        RAX: 000000000000002e  RBX: ffffffff8100b0d2  RCX: 0000000000000000
        RDX: 0000000000000000  RSI: 00007ffcf739d9e0  RDI: 0000000000000003
        RBP: 00007ffcf739d9e0   R8: 0000000000000000   R9: 0000000000000000
        R10: 000000000063ad90  R11: 0000000000000246  R12: 0000000000000003
        R13: 00007ffcf73a6270  R14: 0000000000637900  R15: 0000000000000003
        ORIG_RAX: 000000000000002e  CS: 0033  SS: 002b
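In a long backtrace like this one, the frames that belong to loadable modules are the ones crash tags with a trailing [module] name. A small filter can pull them out. This is only a sketch: the regular expression is an assumption about the bt line layout seen in this post, and the names BT_SAMPLE, FRAME_RE and module_frames are hypothetical.

```python
import re

# A few lines copied from the `bt` output above.
BT_SAMPLE = """\
 #6 [ffff8810dad2f480] invalid_op at ffffffff8100c01b
    [exception RIP: kfree+668]
 #9 [ffff8810dad2f5e0] consume_skb at ffffffff8146b11b
#10 [ffff8810dad2f600] dev_kfree_skb_any at ffffffff81478e9d
#11 [ffff8810dad2f650] ipoib_ib_dev_stop at ffffffffa060a074 [ib_ipoib]
#12 [ffff8810dad2f670] ipoib_stop at ffffffffa0604c75 [ib_ipoib]
#13 [ffff8810dad2f6a0] dev_close_many at ffffffff81479f15
"""

# Only spaces/tabs separate the fields, so a match never spills onto the
# next line (for example onto the "[exception RIP: ...]" line).
FRAME_RE = re.compile(
    r"#(?P<num>\d+)[ \t]+\[(?P<sp>[0-9a-f]+)\][ \t]+(?P<func>\S+)[ \t]+at[ \t]+"
    r"(?P<addr>[0-9a-f]+)(?:[ \t]+\[(?P<module>[^\]]+)\])?"
)


def module_frames(bt_text):
    """Return (frame number, function, module) for frames inside a module."""
    return [
        (int(m["num"]), m["func"], m["module"])
        for m in FRAME_RE.finditer(bt_text)
        if m["module"]
    ]


for num, func, module in module_frames(BT_SAMPLE):
    print(f"#{num} {func} [{module}]")
```

On the sample, the only module frames are the two ib_ipoib ones, sitting directly below the generic skb-freeing path that ends in kfree.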
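The stack shows that the panic happened inside kfree while it was releasing a slab object. We can use the dis command of the crash tool to find the exact source file and line of the failing code: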
    crash> dis -l ffffffff81181b1c
    /usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/mm/slab.c: 524
    0xffffffff81181b1c <kfree+668>: ud2
    crash>
    crash> dis -l ffffffff8146b528
    /usr/src/debug/kernel-2.6.32-642.el6/linux-2.6.32-642.el6.x86_64/net/core/skbuff.c: 424
    0xffffffff8146b528 <skb_release_data+216>: pop %rbx
    crash>
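The symbol+offset notation in this output is plain arithmetic: crash prints the offset in decimal, so subtracting it from the address gives the start of the function. A minimal sketch follows, with a hypothetical helper name; the derived base addresses are computed here, not taken from the dump, and could be cross-checked with the crash sym command.

```python
def symbol_base(address: int, offset: int) -> int:
    """Return the start address of a function given address = base + offset."""
    return address - offset


KFREE_RIP = 0xffffffff81181b1c  # <kfree+668>, the ud2 instruction
SKB_RET = 0xffffffff8146b528    # <skb_release_data+216>, frame #7 address

print(hex(symbol_base(KFREE_RIP, 668)))
print(hex(symbol_base(SKB_RET, 216)))
```

The second address is the one shown for frame #7 in the backtrace, that is, the point in skb_release_data where execution would have resumed had kfree returned.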
Next, look at line 524 of mm/slab.c and line 424 of net/core/skbuff.c to see whether they match the analysis:
mm/slab.c:

    511 /*
    512  * Functions for storing/retrieving the cachep and or slab from the page
    513  * allocator. These are used to find the slab an obj belongs to. With kfree(),
    514  * these are used to find the cache which an obj belongs to.
    515  */
    516 static inline void page_set_cache(struct page *page, struct kmem_cache *cache)
    517 {
    518         page->lru.next = (struct list_head *)cache;
    519 }
    520
    521 static inline struct kmem_cache *page_get_cache(struct page *page)
    522 {
    523         page = compound_head(page);
    524         BUG_ON(!PageSlab(page));
    525         return (struct kmem_cache *)page->lru.next;
    526 }
    527
    528 static inline void page_set_slab(struct page *page, struct slab *slab)
    529 {
    530         page->lru.prev = (struct list_head *)slab;
    531 }
    532
    533 static inline struct slab *page_get_slab(struct page *page)
    534 {
    535         BUG_ON(!PageSlab(page));
    536         return (struct slab *)page->lru.prev;
    537 }

net/core/skbuff.c:

    396 static void skb_release_data(struct sk_buff *skb)
    397 {
    398         if (!skb->cloned ||
    399             !atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
    400                                &skb_shinfo(skb)->dataref)) {
    401                 if (skb_shinfo(skb)->nr_frags) {
    402                         int i;
    403                         for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
    404                                 put_page(skb_shinfo(skb)->frags[i].page);
    405                 }
    406
    407                 /*
    408                  * If skb buf is from userspace, we need to notify the caller
    409                  * the lower device DMA has done;
    410                  */
    411                 if (skb_tx(skb)->dev_zerocopy) {
    412                         struct ubuf_info *uarg;
    413
    414                         uarg = skb_shinfo(skb)->destructor_arg;
    415                         if (uarg->callback)
    416                                 uarg->callback(uarg);
    417                 }
    418
    419                 if (skb_has_frag_list(skb))
    420                         skb_drop_fraglist(skb);
    421
    422                 kfree(skb->head);
    423         }
    424 }
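Read together, the two listings say that kfree() looks up the owning cache through page_get_cache(), which refuses to continue when the page is not flagged as a slab page. The toy model below only illustrates that control flow in user space. It is not kernel code: Page, KernelBug and the cache name are hypothetical, and compound_head() is left out.

```python
from dataclasses import dataclass
from typing import Optional


class KernelBug(Exception):
    """Stands in for BUG_ON() firing (hypothetical name)."""


@dataclass
class Page:
    is_slab: bool                # models the PageSlab(page) flag
    cache: Optional[str] = None  # models page->lru.next (the kmem_cache)


def page_get_cache(page: Page) -> Optional[str]:
    # Mirrors BUG_ON(!PageSlab(page)) at mm/slab.c:524.
    if not page.is_slab:
        raise KernelBug("kernel BUG at mm/slab.c:524!")
    return page.cache


def kfree(page: Page) -> str:
    """Model of kfree(): find the cache first, then release the object."""
    cache = page_get_cache(page)
    return f"object returned to cache {cache}"


print(kfree(Page(is_slab=True, cache="size-2048")))  # cache name is made up
try:
    kfree(Page(is_slab=False))
except KernelBug as bug:
    print(bug)
```

In the model, as in the dump, the failure says nothing about who corrupted or double-freed the buffer. It only shows that the pointer handed to kfree did not point into a slab page at that moment.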
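skb_release_data calls kfree(skb->head) at line 422 (line 424, the closing brace, is where execution would have resumed after that call), and kfree then hit BUG_ON(!PageSlab(page)) in page_get_cache, which means the page behind the pointer being freed was not marked as a slab page. Because the free was triggered from ipoib_ib_dev_stop in the ib_ipoib module, the preliminary suspicion is an IB driver problem. How can we look at the IB-related source code? First, check the IB driver version in this environment: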
    Apr 20 18:58:54 mpbdb2 kernel: Backport based on mlnx_ofed/mlnx-ofa_kernel-4.0.git b4fdfac
    Apr 20 18:58:54 mpbdb2 kernel: compat.git: mlnx_ofed/mlnx-ofa_kernel-4.0.git
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core: Mellanox ConnectX core driver v4.5-1.0.1
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core: Initializing 0000:03:00.0
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core 0000:03:00.0: PCI INT A -> GSI 34 (level, low) -> IRQ 34
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core: device is working in RoCE mode: Roce V1
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core: UD QP Gid type is: V1
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core 0000:03:00.0: DMFS high rate steer mode is: default performance
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core 0000:03:00.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core: Initializing 0000:04:00.0
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core 0000:04:00.0: PCI INT A -> GSI 40 (level, low) -> IRQ 40
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core: device is working in RoCE mode: Roce V1
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core: UD QP Gid type is: V1
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core 0000:04:00.0: DMFS high rate steer mode is: default performance
    Apr 20 18:58:54 mpbdb2 kernel: mlx4_core 0000:04:00.0: 63.008 Gb/s available PCIe bandwidth (8 GT/s x8 link)
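The dmesg log shows that the driver is version 4.5 (mlx4_core v4.5-1.0.1). The code of the related functions for 4.x can be browsed at https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/mellanox/mlx4/en_tx.c#L1077 for reference.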
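When the same check has to be repeated on several nodes, the driver version can be picked out of the kernel log with a short script instead of by eye. The sketch below assumes the "<driver>: ... driver v<version>" banner format seen in this post; VERSION_RE and driver_versions are hypothetical names.

```python
import re

# Three lines copied from the kernel log above.
DMESG_SAMPLE = """\
Apr 20 18:58:54 mpbdb2 kernel: compat.git: mlnx_ofed/mlnx-ofa_kernel-4.0.git
Apr 20 18:58:54 mpbdb2 kernel: mlx4_core: Mellanox ConnectX core driver v4.5-1.0.1
Apr 20 18:58:54 mpbdb2 kernel: mlx4_core: Initializing 0000:03:00.0
"""

# Matches banner lines of the form "kernel: <driver>: ... driver v<version>".
VERSION_RE = re.compile(
    r"kernel: (?P<driver>\w+): .*driver v(?P<version>[\w.\-]+)"
)


def driver_versions(log_text):
    """Map each driver name to the version announced in its banner line."""
    return {m["driver"]: m["version"] for m in VERSION_RE.finditer(log_text)}


print(driver_versions(DMESG_SAMPLE))
```

Feeding it the saved kernel log from each RAC node would make a version mismatch between nodes easy to spot.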