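Oracle 10gR2 RAC for Linux: Interconnect (Heartbeat) NIC Redundancy Configuration and Testing
Unless marked as a repost, all articles on this site are original. Source: love wife love life, Roger's Oracle/MySQL/PostgreSQL data recovery blog
Today someone in the chat group asked about redundant interconnect (heartbeat) NICs for RAC on Linux, so I simulated it in my own VM environment. What follows is the configuration and testing of
interconnect NIC redundancy on a VM-based 10gR2 RAC cluster. For reference only!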
----Stop the CRS resources
(steps omitted)

----Modify the IP configuration files
rac1:
[root@rac1 network-scripts]# pwd
/etc/sysconfig/network-scripts
[root@rac1 network-scripts]# cp ifcfg-eth1 ifcfg-bond0
[root@rac1 network-scripts]# cat ifcfg-bond0
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=bond0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.73.10    --interconnect (heartbeat) IP
NETWORK=192.168.73.0
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
USERCTL=no
BONDING_MASTER=yes
TYPE=Ethernet

rac2:
[root@rac2 network-scripts]# pwd
/etc/sysconfig/network-scripts
[root@rac2 network-scripts]# cp ifcfg-eth1 ifcfg-bond0
[root@rac2 network-scripts]# cat ifcfg-bond0
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=bond0
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.73.11
NETWORK=192.168.73.0
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
USERCTL=no
BONDING_MASTER=yes
TYPE=Ethernet
----Edit the NIC configuration files so that they read as follows (identical on both nodes):
[root@rac1 devices]# pwd
/etc/sysconfig/networking/devices
[root@rac1 devices]#
[root@rac1 devices]# cat ifcfg-eth1
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=eth1
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
TYPE=ethernet

[root@rac1 devices]# cat ifcfg-eth2
# Advanced Micro Devices [AMD] 79c970 [PCnet32 LANCE]
DEVICE=eth2
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
TYPE=ethernet

----Add the bond0 settings to /etc/modprobe.conf (on both nodes):
[root@rac1 devices]# cat /etc/modprobe.conf
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptspi
alias scsi_hostadapter2 ata_piix
alias snd-card-0 snd-ens1371
options snd-card-0 index=0
options snd-ens1371 index=0
remove snd-ens1371 { /usr/sbin/alsactl store 0 >/dev/null 2>&1 || : ; }; /sbin/modprobe -r --ignore-remove snd-ens1371
# Added by VMware Tools
install pciehp /sbin/modprobe -q --ignore-install acpiphp; /bin/true
install pcnet32 (/sbin/modprobe -q --ignore-install vmxnet || /sbin/modprobe -q --ignore-install pcnet32 $CMDLINE_OPTS);/bin/true
alias eth0 vmxnet
alias eth1 vmxnet
###add by Roger
alias bond0 bonding
options bond0 mode=1 miimon=100 downdelay=200 primary=eth1 primary_reselect=1
[root@rac1 devices]#
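A note on bonding modes: two of them matter for this test. mode=0 is active/active, i.e. load balancing, where both NICs carry traffic at the same time.
mode=1 is active/standby: if eth1 fails, eth2 takes over immediately, with almost no impact on RAC.
----Run the following commands (on both nodes):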
[root@rac1 devices]# modprobe bonding
[root@rac1 devices]#
---check network
[root@rac1 devices]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:0C:29:A7:65:F8
          inet addr:192.168.73.10  Bcast:192.168.73.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fea7:65f8/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:248296 errors:0 dropped:0 overruns:0 frame:0
          TX packets:166368 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:198712779 (189.5 MiB)  TX bytes:83112558 (79.2 MiB)

eth0      Link encap:Ethernet  HWaddr 00:0C:29:A7:65:EE
          inet addr:192.168.0.128  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fea7:65ee/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1460  Metric:1
          RX packets:8910 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6416 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:961007 (938.4 KiB)  TX bytes:897232 (876.2 KiB)
          Interrupt:75 Base address:0x2424

eth1      Link encap:Ethernet  HWaddr 00:0C:29:A7:65:F8
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:248290 errors:0 dropped:0 overruns:0 frame:0
          TX packets:166368 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:198712419 (189.5 MiB)  TX bytes:83112558 (79.2 MiB)
          Interrupt:67 Base address:0x24a4

eth2      Link encap:Ethernet  HWaddr 00:0C:29:A7:65:F8
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:360 (360.0 b)  TX bytes:0 (0.0 b)
          Interrupt:59 Base address:0x28a4

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:43711 errors:0 dropped:0 overruns:0 frame:0
          TX packets:43711 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9908534 (9.4 MiB)  TX bytes:9908534 (9.4 MiB)

[root@rac2 ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:0C:29:68:6B:52
          inet addr:192.168.73.11  Bcast:192.168.73.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe68:6b52/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:25 errors:0 dropped:0 overruns:0 frame:0
          TX packets:55 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4853 (4.7 KiB)  TX bytes:7044 (6.8 KiB)

eth0      Link encap:Ethernet  HWaddr 00:0C:29:68:6B:48
          inet addr:192.168.0.129  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe68:6b48/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:68 errors:0 dropped:0 overruns:0 frame:0
          TX packets:93 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6205 (6.0 KiB)  TX bytes:12528 (12.2 KiB)
          Interrupt:75 Base address:0x2424

eth1      Link encap:Ethernet  HWaddr 00:0C:29:68:6B:52
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:25 errors:0 dropped:0 overruns:0 frame:0
          TX packets:55 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4853 (4.7 KiB)  TX bytes:7044 (6.8 KiB)
          Interrupt:67 Base address:0x24a4

eth2      Link encap:Ethernet  HWaddr 00:0C:29:68:6B:5C
          inet6 addr: fe80::20c:29ff:fe68:6b5c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:20 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3425 (3.3 KiB)  TX bytes:5321 (5.1 KiB)
          Interrupt:59 Base address:0x2824

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:3517 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3517 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4765745 (4.5 MiB)  TX bytes:4765745 (4.5 MiB)
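Rather than comparing ifconfig flags by eye, the bonding driver's own status file can be read to confirm which NICs joined the bond. The sketch below assumes the driver exposes /proc/net/bonding/bond0 in its usual "Key: value" layout; the function names bond_slaves and bond_active_slave are made up for this example and are not part of any tool.

```shell
#!/bin/sh
# Hypothetical helpers (names are not from the original post).
# Both read a bonding status file; the default is the driver's /proc entry.

# Print every slave interface listed for the bond, one per line.
bond_slaves() {
    sed -n 's/^Slave Interface: *//p' "${1:-/proc/net/bonding/bond0}"
}

# Print the slave that currently carries traffic (active-backup mode).
bond_active_slave() {
    sed -n 's/^Currently Active Slave: *//p' "${1:-/proc/net/bonding/bond0}"
}

# Example: run on each node after "modprobe bonding" and a network restart.
# bond_slaves          # both eth1 and eth2 should be listed
# bond_active_slave    # eth1 would be expected while primary=eth1 is healthy
```

In the rac2 output above, eth2 still shows its own MAC address and no SLAVE flag, so on that node bond_slaves would presumably have listed only eth1 at that point.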
----Fix the Oracle cluster configuration
---rac1
[root@rac1 bin]# cd /home/oracle/app/oracle/product/10.2.0/crs/bin
[root@rac1 bin]# ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[root@rac1 bin]# ./oifcfg iflist
eth0  192.168.0.0
bond0  192.168.73.0
[root@rac1 bin]# ./oifcfg delif
[root@rac1 bin]#
[root@rac1 bin]# ./oifcfg setif -global eth0/192.168.0.0:public
[root@rac1 bin]#
[root@rac1 bin]# ./oifcfg setif -global bond0/192.168.73.0:cluster_interconnect
[root@rac1 bin]#

---rac2
[root@rac2 bin]# ./oifcfg delif
[root@rac2 bin]#
[root@rac2 bin]# ./oifcfg setif -global eth0/192.168.0.0:public
PRIF-50: duplicate interface is given in the input

[root@rac2 network-scripts]# service network start
Bringing up loopback interface:  [  OK  ]
Bringing up interface bond0:  [  OK  ]
Bringing up interface eth0:  [  OK  ]
Bringing up interface eth2:
Determining IP information for eth2... failed.
[FAILED]
Delete the copied NIC configuration file, restart the network service, and then run the oifcfg settings again, as follows:
[root@rac2 bin]# ./oifcfg iflist
eth0  192.168.0.0
bond0  192.168.73.0
[root@rac2 bin]# ./oifcfg delif
[root@rac2 bin]# ./oifcfg setif -global eth0/192.168.0.0:public
[root@rac2 bin]# ./oifcfg setif -global bond0/192.168.73.0:cluster_interconnect

---Start the CRS resources
[root@rac1 bin]# ./crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1
ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    0/5    0/0    ONLINE    ONLINE    rac1
ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1
ora.rac1.vip   application    0/0    0/0    ONLINE    ONLINE    rac1
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2
ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    0/5    0/0    ONLINE    ONLINE    rac2
ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2
ora.rac2.vip   application    0/0    0/0    ONLINE    ONLINE    rac2
ora.roger.db   application    0/0    0/1    ONLINE    ONLINE    rac1
ora....lldb.cs application    0/0    0/1    ONLINE    ONLINE    rac1
ora....er1.srv application    0/0    0/0    ONLINE    ONLINE    rac1
ora....er2.srv application    0/0    0/0    ONLINE    ONLINE    rac2
ora....r1.inst application    0/5    0/0    ONLINE    ONLINE    rac1
ora....r2.inst application    0/5    0/0    ONLINE    ONLINE    rac2
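After setif, the stored definitions can be read back with oifcfg getif to confirm that bond0 is now registered as the interconnect. The following is only a sketch: it assumes getif prints one interface per line with the role in the last column, and the helper name interconnect_ifs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "oifcfg getif" output on stdin and prints the
# names of the interfaces whose last column is cluster_interconnect.
# Assumed line layout: <name> <subnet> <scope> <role>
interconnect_ifs() {
    awk '$NF == "cluster_interconnect" { print $1 }'
}

# Example (run from the CRS bin directory used in this post):
# ./oifcfg getif | interconnect_ifs    # bond0 would be expected here
```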
Finally, a simple test:
[root@rac1 bin]# ifconfig eth1 down
[root@rac1 bin]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:0C:29:A7:65:F8
          inet addr:192.168.73.10  Bcast:192.168.73.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fea7:65f8/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:320916 errors:0 dropped:0 overruns:0 frame:0
          TX packets:212511 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:258401205 (246.4 MiB)  TX bytes:102844897 (98.0 MiB)

eth0      Link encap:Ethernet  HWaddr 00:0C:29:A7:65:EE
          inet addr:192.168.0.128  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fea7:65ee/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1460  Metric:1
          RX packets:10976 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8114 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1239956 (1.1 MiB)  TX bytes:1140393 (1.0 MiB)
          Interrupt:75 Base address:0x2424

eth0:1    Link encap:Ethernet  HWaddr 00:0C:29:A7:65:EE
          inet addr:192.168.0.130  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1460  Metric:1
          Interrupt:75 Base address:0x2424

eth2      Link encap:Ethernet  HWaddr 00:0C:29:A7:65:F8
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:63 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13911 (13.5 KiB)  TX bytes:0 (0.0 b)
          Interrupt:59 Base address:0x28a4

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:55766 errors:0 dropped:0 overruns:0 frame:0
          TX packets:55766 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:12093653 (11.5 MiB)  TX bytes:12093653 (11.5 MiB)

[root@rac1 bin]# ./crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1
ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    0/5    0/0    ONLINE    ONLINE    rac1
ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1
ora.rac1.vip   application    0/0    0/0    ONLINE    ONLINE    rac1
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2
ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    0/5    0/0    ONLINE    ONLINE    rac2
ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2
ora.rac2.vip   application    0/0    0/0    ONLINE    ONLINE    rac2
ora.roger.db   application    0/0    0/1    ONLINE    ONLINE    rac1
ora....lldb.cs application    0/0    0/1    ONLINE    ONLINE    rac1
ora....er1.srv application    0/0    0/0    ONLINE    ONLINE    rac1
ora....er2.srv application    0/0    0/0    ONLINE    ONLINE    rac2
ora....r1.inst application    0/5    0/0    ONLINE    ONLINE    rac1
ora....r2.inst application    0/5    0/0    ONLINE    ONLINE    rac2
[root@rac1 bin]#
[root@rac1 bin]# ./crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    OFFLINE
ora....C2.lsnr application    ONLINE    OFFLINE
ora.rac2.gsd   application    ONLINE    OFFLINE
ora.rac2.ons   application    ONLINE    OFFLINE
ora.rac2.vip   application    ONLINE    ONLINE    rac1
ora.roger.db   application    ONLINE    ONLINE    rac1
ora....lldb.cs application    ONLINE    ONLINE    rac1
ora....er1.srv application    ONLINE    ONLINE    rac1
ora....er2.srv application    ONLINE    OFFLINE
ora....r1.inst application    ONLINE    ONLINE    rac1
ora....r2.inst application    ONLINE    OFFLINE
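Shortly afterwards rac2 rebooted. Investigation showed that the heartbeat had failed, and further checking revealed that the test method itself was at fault.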
[root@rac1 bin]# ./oifcfg iflist
eth0  192.168.0.0
bond0  192.168.73.0
[root@rac1 bin]#
[root@rac1 bin]# ping 192.168.73.11
PING 192.168.73.11 (192.168.73.11) 56(84) bytes of data.
From 192.168.73.10 icmp_seq=2 Destination Host Unreachable
From 192.168.73.10 icmp_seq=3 Destination Host Unreachable
From 192.168.73.10 icmp_seq=4 Destination Host Unreachable

--- 192.168.73.11 ping statistics ---
4 packets transmitted, 0 received, +3 errors, 100% packet loss, time 3663ms
, pipe 3
[root@rac1 bin]# ifconfig eth1 up

Once eth1 on rac1 was brought back up, the rac2 interconnect address could be pinged again.
Repeated tests showed that with either mode=0 or mode=1, running ifconfig eth1 down on rac1 always ended with the rac2 node rebooting. ocssd.log contains messages similar to the following:

[ CSSD]2013-02-01 01:02:13.065 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 29.930 seconds seedhbimpd 0
[ CSSD]2013-02-01 01:02:13.065 [3063929744] >TRACE: clssnmPollingThread: node rac1 (1) is impending reconfig, flag 1039, misstime 30070
[ CSSD]2013-02-01 01:02:13.065 [3063929744] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2013-02-01 01:02:13.219 [3053439888] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:02:13.219 [3053439888] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2013-02-01 01:02:14.343 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 28.920 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:02:18.257 [3053439888] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:02:18.257 [3053439888] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:02:23.434 [3053439888] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:02:23.435 [3053439888] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:02:28.454 [3053439888] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:02:28.454 [3053439888] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:02:32.181 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 14.900 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:02:33.413 [3063929744] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 13.900 seconds seedhbimpd 1
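A note on reading this excerpt: the first warning reports misstime 30070 (ms) together with "eviction in 29.930 seconds", and 30070 ms + 29930 ms = 60000 ms, which would be consistent with a 60-second CSS misscount (crsctl get css misscount reports the value actually in effect). To follow the countdown without scrolling through the TRACE lines, something like the sketch below could be used; the function name eviction_countdown is hypothetical and the field positions are assumed from the log lines shown here.

```shell
#!/bin/sh
# Hypothetical helper: summarise heartbeat warnings from ocssd.log (stdin).
# For each "heartbeat fatal" line it prints: <time> <percent> <seconds left>.
eviction_countdown() {
    awk '/heartbeat fatal, eviction in/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "at")       pct  = $(i + 1)   # e.g. 50%
            if ($i == "eviction") left = $(i + 2)   # e.g. 29.930
        }
        # Field 3 is the time of day that follows the "CSSD]<date>" field.
        print $3, pct, left
    }'
}

# Example (the log path depends on the CRS home and node name):
# eviction_countdown < ocssd.log
```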
Finally, following the official MOS note Configure Ethernet Bonding Interface on EL5 or RHEL5 [ID 877012.1], I changed the configuration as shown below. This is
the setup recommended by the Oracle MOS note for Linux 5 and later releases:
1.configure bonding driver
# grep bond0 /etc/modprobe.conf
alias bond0 bonding

2.configure under-layer interfaces
# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

# cat /etc/sysconfig/network-scripts/ifcfg-eth2
DEVICE=eth2
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes

3.configure bonding interface with bonding parameters
# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
MASTER=yes
BOOTPROTO=dhcp
ONBOOT=yes
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"

Step 3 did not look right to me: the address should be static, so I changed it as follows:

rac1:
DEVICE=bond0
MASTER=yes
#BOOTPROTO=dhcp
BOOTPROTO=static
IPADDR=192.168.73.10
NETWORK=192.168.73.0
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
ONBOOT=yes
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"

rac2:
DEVICE=bond0
MASTER=yes
#BOOTPROTO=dhcp
BOOTPROTO=static
IPADDR=192.168.73.11
NETWORK=192.168.73.0
BROADCAST=192.168.73.255
NETMASK=255.255.255.0
ONBOOT=yes
BONDING_OPTS="mode=4 miimon=100 lacp_rate=1"

4.activate bonding interface
# ifup bond0
After applying the changes from this MOS note I tested again, and the rac2 node was still evicted and then rebooted, as shown below:
[root@rac1 bin]# ./crs_stat -t -v
Name           Type           R/RA   F/FT   Target    State     Host
----------------------------------------------------------------------
ora....SM1.asm application    0/5    0/0    ONLINE    ONLINE    rac1
ora....C1.lsnr application    0/5    0/0    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    0/5    0/0    ONLINE    ONLINE    rac1
ora.rac1.ons   application    0/3    0/0    ONLINE    ONLINE    rac1
ora.rac1.vip   application    0/0    0/0    ONLINE    ONLINE    rac1
ora....SM2.asm application    0/5    0/0    ONLINE    ONLINE    rac2
ora....C2.lsnr application    0/5    0/0    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    0/5    0/0    ONLINE    ONLINE    rac2
ora.rac2.ons   application    0/3    0/0    ONLINE    ONLINE    rac2
ora.rac2.vip   application    0/0    0/0    ONLINE    ONLINE    rac2
ora.roger.db   application    0/0    0/1    ONLINE    ONLINE    rac2
ora....lldb.cs application    0/0    0/1    ONLINE    ONLINE    rac1
ora....er1.srv application    0/0    0/0    ONLINE    ONLINE    rac1
ora....er2.srv application    0/0    0/0    ONLINE    ONLINE    rac2
ora....r1.inst application    0/5    0/0    ONLINE    ONLINE    rac1
ora....r2.inst application    0/5    0/0    ONLINE    ONLINE    rac2
[root@rac1 bin]#
[root@rac1 bin]# ifconfig eth1 down

ocssd.log on rac2:
[ CSSD]2013-02-01 01:31:53.698 [3032460176] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:31:54.897 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 29.650 seconds seedhbimpd 0
[ CSSD]2013-02-01 01:31:54.897 [3042950032] >TRACE: clssnmPollingThread: node rac1 (1) is impending reconfig, flag 1039, misstime 30350
[ CSSD]2013-02-01 01:31:54.897 [3042950032] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2013-02-01 01:31:56.137 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 28.650 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:31:58.679 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:31:58.679 [3032460176] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:32:03.664 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:32:03.664 [3032460176] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:32:08.640 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:32:08.640 [3032460176] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2013-02-01 01:32:13.682 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 14.620 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:14.941 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 13.620 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:14.975 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:32:14.975 [3032460176] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2013-02-01 01:32:21.230 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:32:21.230 [3032460176] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2013-02-01 01:32:21.350 [145910672] >TRACE: clssgmAllocateRPCIndex: allocated rpc 262 (0x19ddd0)
[ CSSD]2013-02-01 01:32:21.350 [145910672] >TRACE: clssgmRPC: rpc 0x19ddd0 (RPC#262) tag(106002a) sent to node 1
[ CSSD]2013-02-01 01:32:25.001 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 5.610 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:26.253 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 4.610 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:27.496 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 3.600 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:27.532 [3032460176] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2013-02-01 01:32:27.532 [3032460176] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2013-02-01 01:32:28.783 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 2.600 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:30.043 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 1.600 seconds seedhbimpd 1
[ CSSD]2013-02-01 01:32:31.327 [3042950032] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 0.600 seconds seedhbimpd 1
[ CSSD]------- Begin Dump -------
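Judging from these tests alone, Linux NIC bonding looked unreliable. On investigation, however, the problem was my test method: ifconfig eth1 down is not a valid way to test it.
Addendum: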
bonding mode=1 miimon=100. miimon is used for link monitoring.
For example, with miimon=100 the system checks the link state every 100 ms, and if one link is down it switches to the other link.
The mode value selects the working mode. There are seven modes, 0 through 6; the commonly used ones are 0, 1 and 6.
mode=0: load balancing with automatic failover, but requires switch support and configuration.
mode=1: automatic failover; if one link goes down, the other links take over automatically.
mode=6: load balancing with automatic failover; no switch support or configuration required.

mode=0 (balance-rr)
Round-robin policy: Transmit packets in sequential order from the first available slave through the last. This mode provides load balancing and fault tolerance.

mode=1 (active-backup)
Active-backup policy: Only one slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The bond's MAC address is externally visible on only one port (network adapter) to avoid confusing the switch. This mode provides fault tolerance. The primary option affects the behavior of this mode.

mode=2 (balance-xor)
XOR policy: Transmit based on [(source MAC address XOR'd with destination MAC address) modulo slave count]. This selects the same slave for each destination MAC address. This mode provides load balancing and fault tolerance.

mode=3 (broadcast)
Broadcast policy: transmits everything on all slave interfaces. This mode provides fault tolerance.

mode=4 (802.3ad)
IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that share the same speed and duplex settings. Utilizes all slaves in the active aggregator according to the 802.3ad specification.
Pre-requisites:
1. Ethtool support in the base drivers for retrieving the speed and duplex of each slave.
2. A switch that supports IEEE 802.3ad Dynamic link aggregation. Most switches will require some type of configuration to enable 802.3ad mode.

mode=5 (balance-tlb)
Adaptive transmit load balancing: channel bonding that does not require any special switch support. The outgoing traffic is distributed according to the current load (computed relative to the speed) on each slave. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave.
Prerequisite: Ethtool support in the base drivers for retrieving the speed of each slave.

mode=6 (balance-alb)
Adaptive load balancing: includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the slaves in the bond such that different peers use different hardware addresses for the server.
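Summary:
1. In an Oracle RAC environment, use mode=1 for interconnect redundancy; mode 0, mode 6 and the other modes are not recommended. For example, mode=6 may cause the VIP to drift.
2. To test NIC bonding, do not use ifconfig down; the only valid test is to unplug the network cable. Presumably, after ifconfig down
the NIC's information is removed from /etc/sysconfig/network-scripts/ifcfg-bond0,
which in turn causes the CRS node to be evicted.
3. On other platforms, EtherChannel can be used on AIX and APA on HP-UX for bonding.
4. Starting with 11.2.0.2, HAIP is supported; OS-level techniques such as bonding are of course still supported.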
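Point 1 of the summary can be turned into a quick pre-flight check before CRS is started on a bonded interconnect. The sketch below is only an illustration: the function name is_active_backup is hypothetical, and it merely inspects an option string of the kind used in modprobe.conf or BONDING_OPTS in this post.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a bonding option string selects
# active-backup (mode=1), the mode recommended above for the RAC interconnect.
is_active_backup() {
    for opt in $1; do
        case "$opt" in
            mode=1|mode=active-backup) return 0 ;;
        esac
    done
    return 1
}

# Example with the option strings that appear in this post:
# is_active_backup "mode=1 miimon=100 downdelay=200" && echo "ok for RAC"
# is_active_backup "mode=4 miimon=100 lacp_rate=1"   || echo "not mode=1"
```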