How does 10g RAC use the votedisk to determine the disk heartbeat?
Unless marked as a repost, articles on this site are original. Reposted from love wife love life, Roger's Oracle/MySQL/PostgreSQL data recovery blog
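Today I tested what happens when the votedisk is damaged. While analyzing the logs I came across the kill block. I had heard of it before but never understood it clearly, so today I traced it to take a closer look.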
[root@rac1 bin]# ./crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
[root@rac1 bin]#
[root@rac1 bin]# ./crsctl query css votedisk
 0.     0    /dev/raw/raw2
located 1 votedisk(s).
[root@rac1 bin]#
[root@rac1 bin]# dd if=/dev/zero of=/dev/raw/raw2 bs=8192 count=10000
10000+0 records in
10000+0 records out
81920000 bytes (82 MB) copied, 8.41079 seconds, 9.7 MB/s
[root@rac1 bin]#
[root@rac1 bin]# ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[root@rac1 bin]# ps -ef|grep d.bin
root      1684 17897  0 08:11 pts/1    00:00:00 grep d.bin
oracle   28977 28973  0 08:01 ?        00:00:00 /home/oracle/app/oracle/product/10.2.0/crs/bin/evmd.bin
root     29062 27855  0 08:01 ?        00:00:02 /home/oracle/app/oracle/product/10.2.0/crs/bin/crsd.bin reboot
root     29463 29109  0 08:01 ?        00:00:00 /home/oracle/app/oracle/product/10.2.0/crs/bin/oprocd.bin run -t 1000 -m 500 -f
oracle   29558 29153  0 08:01 ?        00:00:01 /home/oracle/app/oracle/product/10.2.0/crs/bin/ocssd.bin
[root@rac1 bin]# ./crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....SM1.asm application    ONLINE    ONLINE    rac1
ora....C1.lsnr application    ONLINE    ONLINE    rac1
ora.rac1.gsd   application    ONLINE    ONLINE    rac1
ora.rac1.ons   application    ONLINE    ONLINE    rac1
ora.rac1.vip   application    ONLINE    ONLINE    rac1
ora....SM2.asm application    ONLINE    ONLINE    rac2
ora....C2.lsnr application    ONLINE    ONLINE    rac2
ora.rac2.gsd   application    ONLINE    ONLINE    rac2
ora.rac2.ons   application    ONLINE    ONLINE    rac2
ora.rac2.vip   application    ONLINE    ONLINE    rac2
ora.roger.db   application    ONLINE    ONLINE    rac2
ora....lldb.cs application    ONLINE    ONLINE    rac1
ora....er1.srv application    ONLINE    ONLINE    rac1
ora....er2.srv application    ONLINE    ONLINE    rac2
ora....r1.inst application    ONLINE    ONLINE    rac1
ora....r2.inst application    ONLINE    ONLINE    rac2
The ocssd.log shows the following:
[ CSSD]2012-11-14 08:01:17.463 [3503984] >TRACE: clssscmain: RT queue setting is at default value
[ CSSD]2012-11-14 08:01:17.489 [3503984] >TRACE: clssscmain: local-only set to false
[ CSSD]2012-11-14 08:01:17.534 [3503984] >TRACE: clssnmReadNodeInfo: added node 1 (rac1) to cluster
[ CSSD]2012-11-14 08:01:17.568 [3503984] >TRACE: clssnmReadNodeInfo: added node 2 (rac2) to cluster
[ CSSD]2012-11-14 08:01:17.573 [3503984] >TRACE: clssnmInitNMInfo: Initialized with unique 1352908877
[ CSSD]2012-11-14 08:01:17.590 [3503984] >TRACE: clssNMInitialize: Initializing with OCR id (1450286276)
[ CSSD]2012-11-14 08:01:18.902 [41585552] >TRACE: clssnm_skgxninit: Compatible vendor clusterware not in use
[ CSSD]2012-11-14 08:01:18.921 [3503984] >TRACE: clssnmNMInitialize: misscount set to (60)
[ CSSD]2012-11-14 08:01:18.926 [3503984] >TRACE: clssnmStartNM: reboottime set to (3) sec
[ CSSD]2012-11-14 08:01:18.926 [3503984] >TRACE: clssnmNMInitialize: Network heartbeat thresholds are: impending reconfig 30000 ms, reconfig start (misscount) 60000 ms
[ CSSD]2012-11-14 08:01:18.949 [3503984] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw2)
[ CSSD]2012-11-14 08:01:18.949 [41585552] >TRACE: clssnmvDPT: spawned for disk 0 (/dev/raw/raw2)
[ CSSD]2012-11-14 08:01:19.049 [41585552] >TRACE: clssnmvDiskOpen: Overwrote kill block for voting disk /dev/raw/raw2
[ CSSD]2012-11-14 08:01:21.657 [41585552] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw2)
[ CSSD]2012-11-14 08:01:21.790 [3503984] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2012-11-14 08:01:21.884 [102837136] >TRACE: clssnmClusterListener: Spawned
[ CSSD]2012-11-14 08:01:21.889 [102837136] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=rac1-priv)(PORT=49895))
Note the message "Overwrote kill block for voting disk" in the log above.
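To pick such messages out of a larger ocssd.log, a small filter is enough. The following is only a sketch: the regular expression is inferred from the handful of log lines quoted in this post, not from any documented log format, and `find_kill_block_events` is a name made up for illustration.

```python
import re

# Layout assumed from the ocssd.log excerpt in this post only:
# "[ CSSD]<timestamp> [<thread id>] >TRACE: <function>: <message>"
LINE_RE = re.compile(
    r"\[\s*CSSD\](?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"\[(?P<tid>\d+)\]\s+>TRACE:\s+(?P<func>\w+): (?P<msg>.*)"
)


def find_kill_block_events(lines):
    """Return (timestamp, thread id, device) for each 'Overwrote kill block' line."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m or "Overwrote kill block" not in m.group("msg"):
            continue
        # The device path is the last whitespace-separated token of the message.
        device = m.group("msg").strip().rsplit(" ", 1)[-1]
        events.append((m.group("ts"), int(m.group("tid")), device))
    return events


sample = [
    "[ CSSD]2012-11-14 08:01:18.949 [41585552] >TRACE: clssnmvDPT: spawned for disk 0 (/dev/raw/raw2)",
    "[ CSSD]2012-11-14 08:01:19.049 [41585552] >TRACE: clssnmvDiskOpen: Overwrote kill block for voting disk /dev/raw/raw2",
]
print(find_kill_block_events(sample))
```

Both sample lines are copied from the log above; only the second one matches.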
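Next I used strace to trace the ocssd process. Since ocssd performs its check every second, tracing for a few seconds is enough. The following fragment is taken from the output: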
29627 0.000017 pread64(11, <unfinished ...>    ---- start point
29574 0.000176 futex(0x888ef28, FUTEX_WAKE_PRIVATE, 1) = 0
29574 0.000032 futex(0x888efb4, FUTEX_WAIT_PRIVATE, 5869, NULL <unfinished ...>
29627 0.000631 <... pread64 resumed> "SslcLlik\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 270848) = 512
29627 0.000046 times(NULL) = 431789335
29627 0.000024 times(NULL) = 431789335
29627 0.000023 nanosleep({1, 0}, <unfinished ...>
29640 0.233418 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
29640 0.000061 times(NULL) = 431789354
29640 0.000030 futex(0x88bcfa8, FUTEX_WAKE_PRIVATE, 1) = 0
29640 0.000031 gettimeofday({1352909585, 135945}, NULL) = 0
29640 0.000032 times(NULL) = 431789354
29640 0.000030 times(NULL) = 431789354
29640 0.000025 gettimeofday({1352909585, 136032}, NULL) = 0
29640 0.000026 times(NULL) = 431789354
29640 0.000021 times(NULL) = 431789354
29640 0.000021 times(NULL) = 431789354
29640 0.000051 sendto(14, "P", 1, 0, {sa_family=AF_INET, sin_port=htons(42745), sin_addr=inet_addr("127.0.0.1")}, 16) = 1
29632 0.000069 <... poll resumed> ) = 1 ([{fd=14, revents=POLLIN|POLLRDNORM}])
29640 0.000032 times( <unfinished ...>
29632 0.000010 read(14, <unfinished ...>
29640 0.000016 <... times resumed> NULL) = 431789354
29632 0.000010 <... read resumed> "P", 2000) = 1
29640 0.000019 gettimeofday( <unfinished ...>
29632 0.000009 times( <unfinished ...>
29640 0.000010 <... gettimeofday resumed> {1352909585, 136306}, NULL) = 0
29632 0.000013 <... times resumed> NULL) = 431789354
29640 0.000014 clock_gettime(CLOCK_REALTIME, <unfinished ...>
29632 0.000010 times( <unfinished ...>
29640 0.000011 <... clock_gettime resumed> {1352909585, 136354273}) = 0
29632 0.000012 <... times resumed> NULL) = 431789354
29640 0.000016 futex(0x89b98d4, FUTEX_WAIT_PRIVATE, 2217, {0, 1951727} <unfinished ...>
29632 0.000016 futex(0x89b98d4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x89b97c0, 2218 <unfinished ...>
29640 0.000012 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
29632 0.000014 <... futex resumed> ) = 0
29640 0.000016 futex(0x89b97c0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
29632 0.000012 futex(0x89b97c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29640 0.000012 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
29632 0.000011 <... futex resumed> ) = 0
29640 0.000014 futex(0x89b98d0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
29632 0.000012 futex(0x89b98d0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29640 0.000026 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
29632 0.000011 <... futex resumed> ) = 0
29640 0.000014 futex(0x89b98d0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29632 0.000012 futex(0x89b98d4, FUTEX_WAIT_PRIVATE, 2219, NULL <unfinished ...>
29640 0.000012 <... futex resumed> ) = 0
29632 0.000009 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
29640 0.000016 futex(0x89b98d4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x89b97c0, 2220 <unfinished ...>
29632 0.000037 futex(0x89b97c0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
29640 0.000017 <... futex resumed> ) = 0
29640 0.000013 times(NULL) = 431789354
29640 0.000025 futex(0x89b97c0, FUTEX_WAKE_PRIVATE, 1) = 1
29632 0.000027 <... futex resumed> ) = 0
29640 0.000014 write(16, "t\0\0\0\3\0\0\0\n\2\1\4,\2\0\0\0\0\0\0\1\0\0\0\350jU\1\0\0\0\0"..., 116 <unfinished ...>
29632 0.000095 futex(0x89b97c0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29640 0.000015 <... write resumed> ) = 116
29632 0.000016 <... futex resumed> ) = 0
29640 0.000015 times( <unfinished ...>
29632 0.000009 gettimeofday( <unfinished ...>
29640 0.000011 <... times resumed> NULL) = 431789354
29632 0.000012 <... gettimeofday resumed> {1352909585, 136904}, NULL) = 0
29640 0.000019 times( <unfinished ...>
29632 0.000009 times( <unfinished ...>
29640 0.000011 <... times resumed> NULL) = 431789354
29632 0.000015 <... times resumed> NULL) = 431789354
29640 0.000016 gettimeofday( <unfinished ...>
29632 0.000010 poll([{fd=14, events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=16, events=POLLIN|POLLRDNORM}], 3, -1 <unfinished ...>
29640 0.000023 <... gettimeofday resumed> {1352909585, 136997}, NULL) = 0
29640 0.000017 times(NULL) = 431789354
29640 0.000021 times(NULL) = 431789354
29640 0.000021 times(NULL) = 431789354
29640 0.000030 sendto(17, "P", 1, 0, {sa_family=AF_INET, sin_port=htons(64048), sin_addr=inet_addr("127.0.0.1")}, 16) = 1
29633 0.000127 <... poll resumed> ) = 1 ([{fd=17, revents=POLLIN|POLLRDNORM}])
29640 0.000067 times( <unfinished ...>
29633 0.000010 read(17, <unfinished ...>
29640 0.000013 <... times resumed> NULL) = 431789354
29633 0.000010 <... read resumed> "P", 2000) = 1
29640 0.000026 gettimeofday( <unfinished ...>
29633 0.000009 times( <unfinished ...>
29640 0.000011 <... gettimeofday resumed> {1352909585, 137372}, NULL) = 0
29633 0.000013 <... times resumed> NULL) = 431789354
29640 0.000013 clock_gettime(CLOCK_REALTIME, <unfinished ...>
29633 0.000009 times( <unfinished ...>
29640 0.000011 <... clock_gettime resumed> {1352909585, 137418619}) = 0
29633 0.000012 <... times resumed> NULL) = 431789354
29640 0.000016 futex(0x8a1597c, FUTEX_WAIT_PRIVATE, 3049, {0, 1953381} <unfinished ...>
29633 0.000015 futex(0x8a1597c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x8a15868, 3050 <unfinished ...>
29640 0.000013 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
29633 0.000011 <... futex resumed> ) = 0
29640 0.000015 futex(0x8a15868, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
29633 0.000012 futex(0x8a15868, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29640 0.000011 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
29633 0.000011 <... futex resumed> ) = 0
29640 0.000015 futex(0x8a15978, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
29633 0.000012 futex(0x8a15978, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29640 0.000011 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
29633 0.000010 <... futex resumed> ) = 0
29640 0.000015 futex(0x8a15978, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29633 0.000170 futex(0x8a1597c, FUTEX_WAIT_PRIVATE, 3051, NULL <unfinished ...>
29640 0.000241 <... futex resumed> ) = 0
29633 0.000140 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
29640 0.000096 futex(0x8a1597c, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x8a15868, 3052 <unfinished ...>
29633 0.000014 futex(0x8a15868, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
29640 0.000012 <... futex resumed> ) = 0
29640 0.000012 times(NULL) = 431789354
29640 0.000025 futex(0x8a15868, FUTEX_WAKE_PRIVATE, 1) = 1
29633 0.000024 <... futex resumed> ) = 0
29640 0.000015 write(24, "H\0\0\0\26\0\0\0\n\2\1\1'\2\0\0\0\0\0\0\2\0\0\0\266oU\1\0\0\0\0"..., 72 <unfinished ...>
29633 0.000031 futex(0x8a15868, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29640 0.000012 <... write resumed> ) = 72
29633 0.000016 <... futex resumed> ) = 0
29640 0.000015 gettimeofday( <unfinished ...>
29633 0.000009 gettimeofday( <unfinished ...>
29640 0.000011 <... gettimeofday resumed> {1352909585, 138439}, NULL) = 0
29633 0.000028 <... gettimeofday resumed> {1352909585, 138448}, NULL) = 0
29640 0.000023 futex(0x888efb4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x888ef28, 5870 <unfinished ...>
29633 0.000012 times( <unfinished ...>
29640 0.000012 <... futex resumed> ) = 1
29633 0.000010 <... times resumed> NULL) = 431789354
29574 0.000011 <... futex resumed> ) = 0
29640 0.000014 times( <unfinished ...>
29633 0.000010 poll([{fd=17, events=POLLIN|POLLRDNORM}, {fd=18, events=POLLIN|POLLRDNORM}, {fd=19, events=POLLIN|POLLRDNORM}, {fd=20, events=POLLIN|POLLRDNORM}, {fd=24, events=POLLIN|POLLRDNORM}, {fd=25, events=POLLIN|POLLRDNORM}, {fd=26, events=POLLIN|POLLRDNORM}, {fd=27, events=POLLIN|POLLRDNORM}, {fd=28, events=POLLIN|POLLRDNORM}, {fd=29, events=POLLIN|POLLRDNORM}, {fd=30, events=POLLIN|POLLRDNORM}, {fd=31, events=POLLIN|POLLRDNORM}, {fd=32, events=POLLIN|POLLRDNORM}, {fd=33, events=POLLIN|POLLRDNORM}, {fd=34, events=POLLIN|POLLRDNORM}, {fd=35, events=POLLIN|POLLRDNORM}, {fd=36, events=POLLIN|POLLRDNORM}, {fd=37, events=POLLIN|POLLRDNORM}, {fd=38, events=POLLIN|POLLRDNORM}, {fd=39, events=POLLIN|POLLRDNORM}, {fd=40, events=POLLIN|POLLRDNORM}, {fd=41, events=POLLIN|POLLRDNORM}, {fd=42, events=POLLIN|POLLRDNORM}], 23, -1 <unfinished ...>
29574 0.000149 futex(0x888ef28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29640 0.000014 <... times resumed> NULL) = 431789354
29574 0.000011 <... futex resumed> ) = 0
29640 0.000013 gettimeofday( <unfinished ...>
29574 0.000012 futex(0x888efb4, FUTEX_WAIT_PRIVATE, 5871, NULL <unfinished ...>
29640 0.000011 <... gettimeofday resumed> {1352909585, 138767}, NULL) = 0
29640 0.000016 clock_gettime(CLOCK_REALTIME, {1352909585, 138806536}) = 0
29640 0.000026 futex(0x88bd034, FUTEX_WAIT_PRIVATE, 1139, {0, 999960464} <unfinished ...>
29632 0.184179 <... poll resumed> ) = 1 ([{fd=16, revents=POLLIN|POLLRDNORM}])
29632 0.000059 times(NULL) = 431789369
29632 0.000028 times(NULL) = 431789369
29632 0.000068 read(16, "t\0\0\0\3\0\0\0\n\2\1\4*\2\0\0\0\0\0\0\1\0\0\0\222T\r\1\0\0\0\0"..., 131136) = 116
29632 0.000058 times(NULL) = 431789369
29632 0.000036 gettimeofday({1352909585, 323261}, NULL) = 0
29632 0.000039 futex(0x888efb4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x888ef28, 5872) = 1
29574 0.000029 <... futex resumed> ) = 0
29632 0.000015 gettimeofday( <unfinished ...>
29574 0.000012 futex(0x888ef28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29632 0.000013 <... gettimeofday resumed> {1352909585, 323343}, NULL) = 0
29574 0.000014 <... futex resumed> ) = 0
29632 0.000013 times( <unfinished ...>
29574 0.000011 futex(0x888efb4, FUTEX_WAIT_PRIVATE, 5873, NULL <unfinished ...>
29632 0.000011 <... times resumed> NULL) = 431789369
29632 0.000015 poll([{fd=14, events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=16, events=POLLIN|POLLRDNORM}], 3, -1 <unfinished ...>
29585 0.344349 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
29585 0.000043 times(NULL) = 431789397
29585 0.000025 times(NULL) = 431789397
29585 0.000039 gettimeofday({1352909585, 667888}, NULL) = 0
29585 0.000033 times(NULL) = 431789397
29585 0.000026 futex(0x89b7f70, FUTEX_WAKE_PRIVATE, 1) = 0
29585 0.000027 pwrite64(10, "etoV\1\0\0\0\1\3\n\2\0\0\0\0rac1\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 8704) = 512
29585 0.000666 times(NULL) = 431789397
29585 0.000182 gettimeofday({1352909585, 668838}, NULL) = 0
29585 0.000047 clock_gettime(CLOCK_REALTIME, {1352909585, 668870189}) = 0
29585 0.000027 futex(0x89b89ec, FUTEX_WAIT_PRIVATE, 1135, {0, 999967811} <unfinished ...>
29628 0.144986 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
29628 0.000040 times(NULL) = 431789409
29628 0.000025 times(NULL) = 431789409
29628 0.000027 gettimeofday({1352909585, 813974}, NULL) = 0
29628 0.000046 futex(0x888efb4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x888ef28, 5874) = 1
29574 0.000027 <... futex resumed> ) = 0
29628 0.000014 times( <unfinished ...>
29574 0.000012 futex(0x888ef28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29628 0.000012 <... times resumed> NULL) = 431789409
29574 0.000010 <... futex resumed> ) = 0
29628 0.000014 times( <unfinished ...>
29574 0.000011 futex(0x888efb4, FUTEX_WAIT_PRIVATE, 5875, NULL <unfinished ...>
29628 0.000011 <... times resumed> NULL) = 431789409
29628 0.000013 times(NULL) = 431789409
29628 0.000036 times(NULL) = 431789409
29628 0.000026 times(NULL) = 431789409
29628 0.000029 times(NULL) = 431789409
29628 0.000023 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000022 times(NULL) = 431789409
29628 0.000022 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000022 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000022 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000022 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000022 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000022 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000052 times(NULL) = 431789409
29628 0.000022 times(NULL) = 431789409
29628 0.000021 times(NULL) = 431789409
29628 0.000022 times(NULL) = 431789409
29628 0.000022 gettimeofday({1352909585, 814803}, NULL) = 0
29628 0.000034 futex(0x888efb4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x888ef28, 5876) = 1
29574 0.000022 <... futex resumed> ) = 0
29628 0.000014 times( <unfinished ...>
29574 0.000011 futex(0x888ef28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29628 0.000011 <... times resumed> NULL) = 431789409
29574 0.000010 <... futex resumed> ) = 0
29628 0.000013 gettimeofday( <unfinished ...>
29574 0.000011 futex(0x888efb4, FUTEX_WAIT_PRIVATE, 5877, NULL <unfinished ...>
29628 0.000011 <... gettimeofday resumed> {1352909585, 814917}, NULL) = 0
29628 0.000018 futex(0x89b7f70, FUTEX_WAKE_PRIVATE, 1) = 0
29628 0.000022 clock_gettime(CLOCK_REALTIME, {1352909585, 814981249}) = 0
29628 0.000028 futex(0x89b8964, FUTEX_WAIT_PRIVATE, 1181, {0, 999935751} <unfinished ...>
29627 0.088150 <... nanosleep resumed> NULL) = 0
29627 0.000070 times(NULL) = 431789416
29627 0.000026 nanosleep({0, 190000000}, NULL) = 0
29627 0.191678 times(NULL) = 431789431
29627 0.000026 nanosleep({0, 40000000}, <unfinished ...>
29639 0.035943 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
29639 0.000063 times(NULL) = 431789434
29639 0.000030 futex(0x88bce98, FUTEX_WAKE_PRIVATE, 1) = 0
29639 0.000025 times(NULL) = 431789434
29639 0.000027 gettimeofday({1352909586, 131046}, NULL) = 0
29639 0.000056 futex(0x888efb4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x888ef28, 5878) = 1
29574 0.000027 <... futex resumed> ) = 0
29639 0.000015 times( <unfinished ...>
29574 0.000011 futex(0x888ef28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29639 0.000012 <... times resumed> NULL) = 431789434
29574 0.000011 <... futex resumed> ) = 0
29639 0.000013 gettimeofday( <unfinished ...>
29574 0.000012 futex(0x888efb4, FUTEX_WAIT_PRIVATE, 5879, NULL <unfinished ...>
29639 0.000030 <... gettimeofday resumed> {1352909586, 131191}, NULL) = 0
29639 0.000024 futex(0x888efb4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x888ef28, 5880) = 1
29574 0.000022 <... futex resumed> ) = 0
29639 0.000013 times( <unfinished ...>
29574 0.000012 futex(0x888ef28, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
29639 0.000011 <... times resumed> NULL) = 431789434
29574 0.000011 <... futex resumed> ) = 0
29639 0.000012 gettimeofday( <unfinished ...>
29574 0.000012 futex(0x888efb4, FUTEX_WAIT_PRIVATE, 5881, NULL <unfinished ...>
29639 0.000011 <... gettimeofday resumed> {1352909586, 131338}, NULL) = 0
29639 0.000016 clock_gettime(CLOCK_REALTIME, {1352909586, 131378465}) = 0
29639 0.000028 futex(0x88bcf24, FUTEX_WAIT_PRIVATE, 1141, {0, 999959535} <unfinished ...>
29627 0.005665 <... nanosleep resumed> NULL) = 0
29627 0.000030 times(NULL) = 431789435
29627 0.000028 gettimeofday({1352909586, 137128}, NULL) = 0
29627 0.000041 futex(0x888efb4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x888ef28, 5882) = 1
29574 0.000041 <... futex resumed> ) = 0
29627 0.000018 pread64(11, <unfinished ...>    ---- start point
The above is what the ocssd process does within one second. We can see that it operates on fds 11, 14, 16, and 17, and writes to fds 10 and 24.
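Reading the fd numbers out of a trace this long by eye is error-prone, so here is a rough sketch that groups them by syscall. It assumes one strace record per line in the `<tid> <delta> syscall(fd, ...` layout seen above; the function name `fds_by_syscall` and the default list of syscalls are my own choices for illustration.

```python
import re
from collections import defaultdict

# Start of a syscall whose first argument is a decimal file descriptor,
# e.g. "29627 0.000017 pread64(11, <unfinished ...>".
CALL_RE = re.compile(r"^(?P<tid>\d+)\s+[\d.]+\s+(?P<name>\w+)\((?P<fd>\d+)[,)]")


def fds_by_syscall(lines, names=("pread64", "pwrite64", "read", "write", "sendto")):
    """Group the file descriptors seen in the trace by the syscall that used them."""
    seen = defaultdict(set)
    for line in lines:
        m = CALL_RE.match(line.strip())
        if m and m.group("name") in names:
            seen[m.group("name")].add(int(m.group("fd")))
    return {name: sorted(fds) for name, fds in seen.items()}


# A few records copied (and shortened) from the trace above.
sample = [
    r'29627 0.000017 pread64(11, <unfinished ...>',
    r'29640 0.000051 sendto(14, "P", 1, 0, {sa_family=AF_INET}, 16) = 1',
    r'29632 0.000010 read(14, <unfinished ...>',
    r'29640 0.000014 write(16, "t\0\0\0\3"..., 116 <unfinished ...>',
    r'29585 0.000027 pwrite64(10, "etoV\1\0\0\0"..., 512, 8704) = 512',
    r'29640 0.000032 times(NULL) = 431789354',
    r'29574 0.000176 futex(0x888ef28, FUTEX_WAKE_PRIVATE, 1) = 0',
]
print(fds_by_syscall(sample))
```

Calls such as `futex(0x...)` or `times(NULL)` are skipped because their first argument is not a plain decimal descriptor.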
[root@rac1 fd]# ls -ltr
total 0
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 9 -> socket:[445844]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 8 -> socket:[445843]
l-wx------ 1 oracle oinstall 64 Nov 14 08:15 7 -> /home/oracle/app/oracle/product/10.2.0/crs/log/rac1/cssd/ocssd.log
lr-x------ 1 oracle oinstall 64 Nov 14 08:15 6 -> /home/oracle/app/oracle/product/10.2.0/crs/css/mesg/clssus.msb
lr-x------ 1 oracle oinstall 64 Nov 14 08:15 5 -> /dev/raw/raw3
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 42 -> socket:[447945]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 41 -> socket:[447909]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 40 -> socket:[447891]
lr-x------ 1 oracle oinstall 64 Nov 14 08:15 4 -> /home/oracle/app/oracle/product/10.2.0/crs/srvm/mesg/procus.msb
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 39 -> socket:[447810]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 38 -> socket:[447742]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 37 -> socket:[447720]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 36 -> socket:[447537]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 35 -> socket:[447214]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 34 -> socket:[447277]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 33 -> socket:[446829]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 32 -> socket:[447126]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 31 -> socket:[446777]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 30 -> socket:[446757]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 3 -> /home/oracle/app/oracle/product/10.2.0/crs/log/rac1/cssd/cssdOUT.log
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 29 -> socket:[446622]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 28 -> socket:[447022]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 27 -> socket:[449930]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 26 -> socket:[446211]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 25 -> socket:[446210]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 24 -> socket:[446209]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 23 -> socket:[446208]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 22 -> socket:[446065]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 21 -> socket:[446064]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 20 -> socket:[446062]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 2 -> /home/oracle/app/oracle/product/10.2.0/crs/log/rac1/cssd/cssdOUT.log
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 19 -> socket:[446059]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 18 -> socket:[446055]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 17 -> socket:[446051]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 16 -> socket:[446206]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 15 -> socket:[446049]
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 14 -> socket:[446046]
lr-x------ 1 oracle oinstall 64 Nov 14 08:15 13 -> /home/oracle/app/oracle/product/10.2.0/crs/has/mesg/clsdus.msb
l-wx------ 1 oracle oinstall 64 Nov 14 08:15 12 -> /home/oracle/app/oracle/product/10.2.0/crs/log/rac1/alertrac1.log
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 11 -> /dev/raw/raw2
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 10 -> /dev/raw/raw2
lrwx------ 1 oracle oinstall 64 Nov 14 08:15 1 -> /home/oracle/app/oracle/product/10.2.0/crs/log/rac1/cssd/cssdOUT.log
lr-x------ 1 oracle oinstall 64 Nov 14 08:15 0 -> /dev/null
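The same fd-to-target mapping can be collected programmatically by resolving the symlinks under /proc/<pid>/fd. This is a minimal sketch for Linux only; `list_fds` and `fds_for_target` are illustrative names, and reading another user's process needs the appropriate privileges (the listing above was taken as root).

```python
import os


def list_fds(pid):
    """Return {fd: target} by resolving the symlinks under /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for entry in os.listdir(fd_dir):
        try:
            targets[int(entry)] = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def fds_for_target(pid, target):
    """Return the sorted descriptors of `pid` that point at `target`."""
    return sorted(fd for fd, t in list_fds(pid).items() if t == target)


if __name__ == "__main__":
    # Demonstrated on the current process; for ocssd one would pass its pid
    # and a device path such as "/dev/raw/raw2" instead.
    for fd, target in sorted(list_fds(os.getpid()).items()):
        print(fd, "->", target)
```

For the process in this post, such a lookup for /dev/raw/raw2 would be expected to return fds 10 and 11, matching the listing.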
At the same time, lsof shows the socket details of the process:
[root@rac1 fd]# lsof -p 29558
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
ocssd.bin 29558 oracle cwd DIR 8,5 4096 1826264 /home/oracle/app/oracle/product/10.2.0/crs/log/rac1/cssd
ocssd.bin 29558 oracle rtd DIR 8,2 4096 2 /
ocssd.bin 29558 oracle txt REG 8,5 1926884 1828988 /home/oracle/app/oracle/product/10.2.0/crs/bin/ocssd.bin
ocssd.bin 29558 oracle mem REG 8,5 914864 1828259 /home/oracle/app/oracle/product/10.2.0/crs/lib/libocr10.so
ocssd.bin 29558 oracle mem REG 8,5 761075 1828261 /home/oracle/app/oracle/product/10.2.0/crs/lib/libocrutl10.so
ocssd.bin 29558 oracle mem REG 8,5 8009 1828595 /home/oracle/app/oracle/product/10.2.0/crs/lib/libskgxns.so
ocssd.bin 29558 oracle mem REG 8,2 16428 650727 /lib/libdl-2.5.so
ocssd.bin 29558 oracle mem REG 8,2 208352 650730 /lib/libm-2.5.so
ocssd.bin 29558 oracle mem REG 8,2 1611564 650718 /lib/libc-2.5.so
ocssd.bin 29558 oracle mem REG 8,5 1322560 1828260 /home/oracle/app/oracle/product/10.2.0/crs/lib/libocrb10.so
ocssd.bin 29558 oracle mem REG 8,2 125736 650717 /lib/ld-2.5.so
ocssd.bin 29558 oracle mem REG 8,5 2446253 1828263 /home/oracle/app/oracle/product/10.2.0/crs/lib/libhasgen10.so
ocssd.bin 29558 oracle mem REG 8,2 129716 650721 /lib/libpthread-2.5.so
ocssd.bin 29558 oracle mem REG 8,5 5659007 1828035 /home/oracle/app/oracle/product/10.2.0/crs/lib/libnnz10.so
ocssd.bin 29558 oracle mem REG 8,2 101404 650725 /lib/libnsl-2.5.so
ocssd.bin 29558 oracle mem REG 8,2 46680 639400 /lib/libnss_files-2.5.so
ocssd.bin 29558 oracle mem REG 8,5 19203849 1761724 /home/oracle/app/oracle/product/10.2.0/crs/lib/libclntsh.so.10.1
ocssd.bin 29558 oracle 0r CHR 1,3 1737 /dev/null
ocssd.bin 29558 oracle 1u REG 8,5 867 1826280 /home/oracle/app/oracle/product/10.2.0/crs/log/rac1/cssd/cssdOUT.log
ocssd.bin 29558 oracle 2u REG 8,5 867 1826280 /home/oracle/app/oracle/product/10.2.0/crs/log/rac1/cssd/cssdOUT.log
ocssd.bin 29558 oracle 3u REG 8,5 867 1826280 /home/oracle/app/oracle/product/10.2.0/crs/log/rac1/cssd/cssdOUT.log
ocssd.bin 29558 oracle 4r REG 8,5 6144 1828246 /home/oracle/app/oracle/product/10.2.0/crs/srvm/mesg/procus.msb
ocssd.bin 29558 oracle 5r CHR 162,3 13845 /dev/raw/raw3
ocssd.bin 29558 oracle 6r REG 8,5 4608 1828290 /home/oracle/app/oracle/product/10.2.0/crs/css/mesg/clssus.msb
ocssd.bin 29558 oracle 7w REG 8,5 8072668 1826282 /home/oracle/app/oracle/product/10.2.0/crs/log/rac1/cssd/ocssd.log
ocssd.bin 29558 oracle 8u IPv4 445843 UDP localhost.localdomain:38606
ocssd.bin 29558 oracle 9u unix 0xd76ce3c0 445844 /var/tmp/.oracle/srac1DBG_CSSD
ocssd.bin 29558 oracle 10u CHR 162,2 13826 /dev/raw/raw2
ocssd.bin 29558 oracle 11u CHR 162,2 13826 /dev/raw/raw2
ocssd.bin 29558 oracle 12w REG 8,5 8756 1826273 /home/oracle/app/oracle/product/10.2.0/crs/log/rac1/alertrac1.log
ocssd.bin 29558 oracle 13r REG 8,5 6144 1828946 /home/oracle/app/oracle/product/10.2.0/crs/has/mesg/clsdus.msb
ocssd.bin 29558 oracle 14u IPv4 446046 UDP localhost.localdomain:42745
ocssd.bin 29558 oracle 15u IPv4 446049 TCP rac1-priv:49895 (LISTEN)
ocssd.bin 29558 oracle 16u IPv4 446206 TCP rac1-priv:49895->rac2-priv:57981 (ESTABLISHED)
ocssd.bin 29558 oracle 17u IPv4 446051 UDP localhost.localdomain:64048
ocssd.bin 29558 oracle 18u unix 0xe586b3c0 446055 /var/tmp/.oracle/sOracle_CSS_LclLstnr_crs_1
ocssd.bin 29558 oracle 19u unix 0xef02ec80 446059 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 20u unix 0xc2f22040 446062 /var/tmp/.oracle/sOCSSD_LL_rac1_crs
ocssd.bin 29558 oracle 21u IPv4 446064 UDP localhost.localdomain:20320
ocssd.bin 29558 oracle 22u IPv4 446065 TCP rac1-priv:40146 (LISTEN)
ocssd.bin 29558 oracle 23u IPv4 446208 TCP rac1-priv:39402->rac2-priv:32717 (ESTABLISHED)
ocssd.bin 29558 oracle 24u unix 0xc2f22ac0 446209 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 25u unix 0xc2f22e40 446210 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 26u unix 0xeb7af580 446211 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 27u unix 0xef2ce900 449930 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 28u unix 0xca8c2ac0 447022 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 29u unix 0xc6f16c80 446622 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 30u unix 0xea937e40 446757 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 31u unix 0xc6f16900 446777 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 32u unix 0xc6f16740 447126 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 33u unix 0xe1320e40 446829 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 34u unix 0xd3d05e40 447277 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 35u unix 0xcd10de40 447214 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 36u unix 0xcd10d580 447537 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 37u unix 0xf6151e40 447720 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 38u unix 0xcd10d200 447742 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 39u unix 0xf6151580 447810 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 40u unix 0xf6151900 447891 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 41u unix 0xea646c80 447909 /var/tmp/.oracle/sOCSSD_LL_rac1_
ocssd.bin 29558 oracle 42u unix 0xea646200 447945 /var/tmp/.oracle/sOCSSD_LL_rac1_
Below are notes on several of the Linux functions that appear in the trace above:
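--- Function sendto
#include <sys/types.h>
#include <sys/socket.h>
ssize_t send(int sock, const void *buf, size_t len, int flags);
ssize_t sendto(int sock, const void *buf, size_t len, int flags, const struct sockaddr *to, socklen_t tolen);
sock: the socket from which the data is sent.
buf: pointer to the buffer holding the data to send.
len: length of that buffer.
flags: zero or more of the following flags, combined with OR:
    MSG_DONTROUTE: do not send the packet through a gateway; send only to directly connected hosts. Mainly used by diagnostic or routing programs.
    MSG_DONTWAIT: the operation does not block.
    MSG_EOR: terminates a record.
    MSG_MORE: the caller has more data to send.
    MSG_NOSIGNAL: on a stream-oriented socket, do not raise SIGPIPE when the other end has closed the connection.
    MSG_OOB: send out-of-band data (data that needs priority handling); the protocol in use must support this.
to: pointer to the address of the receiver; may be NULL.
tolen: length of that address; may be 0.

--- Function futex
#include <linux/futex.h>
#include <sys/time.h>
int futex(int *uaddr, int op, int val, const struct timespec *timeout, int *uaddr2, int val3);
#define __NR_futex 240
uaddr is the address, in user-space shared memory, of an aligned integer counter.
op is the operation type. Five are defined; only two are described briefly here, see man futex for the rest.
FUTEX_WAIT: atomically checks whether the counter at uaddr equals val; if so, the process sleeps until a FUTEX_WAKE or the timeout. In other words, the process is put on the wait queue associated with uaddr.
FUTEX_WAKE: wakes at most val processes waiting on uaddr.

--- Function gettimeofday
#include <sys/time.h>
int gettimeofday(struct timeval *tv, struct timezone *tz);
gettimeofday() returns the current time in the tv structure and the local time zone information in the structure pointed to by tz.
It returns 0 on success and -1 on failure.

--- Function poll
#include <poll.h>
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
fds: an array of struct pollfd holding the socket descriptors whose state is to be checked. The system does not clear this array after each call, which is convenient and can improve efficiency when there are many socket connections. This differs from select(), which clears the descriptor set it checked, so the descriptors must be added again before every call. select() is therefore suited to checking a single socket descriptor, while poll() is suited to a large number of them.
nfds: of type nfds_t, the total number of structures in the fds array.
timeout: how long poll() blocks, in milliseconds.
Return value:
    >0: the number of socket descriptors in fds that are ready for reading or writing or are in an error state.
    =0: no descriptor in fds is ready for reading or writing or in an error state, and poll() timed out after timeout milliseconds. If timeout == 0, poll() returns immediately without blocking. If timeout == INFTIM (a negative value on Linux, such as the -1 in the trace above), poll() blocks until an event of interest occurs on one of the descriptors, and blocks forever if none ever does.
    -1: the call failed and errno is set.
If a descriptor to be checked is negative, it is ignored: its events member is not examined, any events registered in events are ignored, and on return its revents member is set to 0, meaning no event occurred.

--- Function clock_gettime
#include <time.h>
int clock_gettime(clockid_t clk_id, struct timespec *tp);
struct timespec {
    time_t tv_sec;   /* seconds */
    long   tv_nsec;  /* nanoseconds */
};
clockid_t clk_id selects the clock to use. For programmers the following are the most commonly used:
CLOCK_REALTIME, a system-wide realtime clock.
CLOCK_PROCESS_CPUTIME_ID, high-resolution timer provided by the CPU for each process.
CLOCK_THREAD_CPUTIME_ID, high-resolution timer provided by the CPU for each of the threads.
CLOCK_REALTIME: the system's real time, counted from 1970-01-01 00:00:00 UTC; it changes whenever the system time is changed by the user.
CLOCK_MONOTONIC: counted from the moment the system started; not affected by the user changing the system time.
CLOCK_PROCESS_CPUTIME_ID: CPU time consumed by this process up to the current point in the code.
CLOCK_THREAD_CPUTIME_ID: CPU time consumed by this thread up to the current point in the code.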
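From the information under /proc/pid/fd and the lsof output, we can see that fds 14 and 17 are UDP sockets of the ocssd process, both bound to localhost. As we know, the Oracle Clusterware heartbeat protocol is based on UDP.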
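In the trace, one thread calls sendto() with a single byte "P" addressed to 127.0.0.1:42745, which is the port fd 14 itself is bound to, and another thread blocked in poll() on fd 14 wakes up and reads that byte. The sketch below reproduces only that pattern with one UDP socket sending to its own loopback address. What ocssd actually uses the datagram for is not visible in the trace, so treating it as a thread wake-up signal is my assumption, and `wake_via_loopback` is a hypothetical helper.

```python
import select
import socket


def wake_via_loopback(timeout_ms=1000):
    """Send one byte to our own loopback UDP port and wait for it with poll()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("127.0.0.1", 0))  # ephemeral port, like 42745 in the trace
        poller = select.poll()
        poller.register(sock.fileno(), select.POLLIN | select.POLLRDNORM)

        # Same shape as: sendto(14, "P", 1, 0, {127.0.0.1:42745}, 16)
        sock.sendto(b"P", sock.getsockname())

        # Same shape as: poll([{fd=14, events=POLLIN|POLLRDNORM}, ...], n, timeout)
        events = poller.poll(timeout_ms)
        # Same shape as: read(14, "P", 2000)
        payload = sock.recv(2000) if events else b""
        return events, payload
    finally:
        sock.close()


if __name__ == "__main__":
    events, payload = wake_via_loopback()
    print(events, payload)
```

In ocssd the sender and the poller are different threads; a single thread is used here only to keep the example short.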
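From the trace above we can see that ocssd reads the votedisk once every second. What does it read? The so-called kill block, which is 512 bytes in size. As for why it is 512 bytes rather than 1024, I am not sure; it may be tied to the OS block size, or it may be for efficiency.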
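The once-per-second claim can be checked against the trace itself: the second column is the delay since the previous record, so summing those deltas between two consecutive `pread64(11,` records gives the read interval. The sketch below assumes that relative-timestamp layout; `interval_between` is an illustrative name, and the sample is heavily abridged with an invented nanosleep delta, so its result is not a measurement from the real trace.

```python
def interval_between(lines, marker):
    """Sum the per-record deltas between the first two records containing `marker`."""
    total = None
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            delta = float(parts[1])
        except ValueError:
            continue  # not a "<tid> <delta> ..." record
        if marker in parts[2]:
            if total is None:
                total = 0.0  # first occurrence: start counting after it
                continue
            return total + delta  # second occurrence: include its own delta
        if total is not None:
            total += delta
    return None


# Abridged, partly synthetic sample: the 0.999000 delta is made up for illustration.
sample = [
    '29627 0.000017 pread64(11, <unfinished ...>',
    '29627 0.000631 <... pread64 resumed> "SslcLlik"..., 512, 270848) = 512',
    '29627 0.000023 nanosleep({1, 0}, <unfinished ...>',
    '29627 0.999000 <... nanosleep resumed> NULL) = 0',
    '29627 0.000018 pread64(11, <unfinished ...>',
]
print(interval_between(sample, "pread64(11,"))
```

Run over the full fragment between the two start-point markers, the same sum should come out close to one second if the claim holds.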
The following record shows that 512 bytes are read:
29627 0.000631 <... pread64 resumed> "SslcLlik\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 270848) = 512
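Two details of this record are easy to work out by hand. The offset 270848 is an exact multiple of 512, and the leading bytes "SslcLlik" look like 4-byte tags stored in reversed byte order. The same applies to the pwrite64 record in the trace (offset 8704, leading bytes "etoV"). The sketch below does the arithmetic; reading the tags as byte-swapped words is my interpretation, not something documented, and both helper names are made up.

```python
SECTOR = 512  # transfer size seen in both the pread64 and pwrite64 records


def block_number(offset, block_size=SECTOR):
    """Return the block index for a byte offset that is block-aligned."""
    if offset % block_size:
        raise ValueError(f"offset {offset} is not aligned to {block_size}")
    return offset // block_size


def swap_words(tag, word=4):
    """Reverse the bytes inside each 4-byte word of `tag`."""
    return b"".join(tag[i:i + word][::-1] for i in range(0, len(tag), word))


print(block_number(270848), swap_words(b"SslcLlik"))  # 529 b'clsSkilL'
print(block_number(8704), swap_words(b"etoV"))        # 17 b'Vote'
```

So the kill block read sits at 512-byte block 529 of /dev/raw/raw2 and the per-second write goes to block 17, with tags that read "clsS" + "kilL" and "Vote" once the words are un-swapped.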
One of the writes goes to fd 16, whose socket inode is 446206. In the lsof output this is:
ocssd.bin 29558 oracle 16u IPv4 446206 TCP rac1-priv:49895->rac2-priv:57981 (ESTABLISHED)
Clearly this is also related to the heartbeat.
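Finally, fd 10 is updated as well. According to the fd listing above, fd 10 (like fd 11) points to /dev/raw/raw2, which is the votedisk itself rather than the OCR:
29585 0.000027 pwrite64(10, "etoV\1\0\0\0\1\3\n\2\0\0\0\0rac1\0\0\0\0\0\0\0\0\0\0\0\0"..., 512, 8704) = 512
The write is also 512 bytes. The payload contains the node name (rac1), so it is probably just updating node status and similar information.
As we know, the ocssd process uses its periodic votedisk I/O, including the read of the kill block, to judge whether a node is healthy. This behavior is called the disk heartbeat.
So the trace shows ocssd both reading the votedisk (the kill block, via fd 11) and writing to it (a 512-byte block, via fd 10) about once per second. The votedisk content is therefore runtime state that ocssd keeps regenerating: in the test above the disk was zeroed with dd and CRS still started, overwriting the kill block. This may be one of the reasons the votedisk is said not to need a backup.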