A Strange Case of ASM Being Unable to Rebalance
Unless marked as a repost, all articles on this site are original. Reposted from love wife love life, Roger's Oracle/MySQL/PostgreSQL data recovery blog
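Recently one of our customer's environments ran into a problem: after running alter diskgroup xxx modify power 0, a later attempt to start a rebalance did nothing. The ARB and RBAL processes showed no activity at all. The symptom looked roughly like this: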
SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA PASS STAT POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES ERROR_CODE CON_ID
------------ ----- --------- ---- ---------- ---------- ---------- ---------- ---------- ----------- -------------------------------------------- ----------
1 REBAL COMPACT WAIT 0 0
1 REBAL REBALANCE WAIT 0 0
1 REBAL REBUILD WAIT 0 0
1 REBAL RESYNC WAIT 0 0
After enabling ASM tracing with the event below, a few clues turned up:
alter system set events '15195 trace name context forever,level 7';
kfdp_query: callcnt 1719757 grp 1 (DATAC1)
NOTE: GroupBlock outside rolling migration privileged region
----- Abridged Call Stack Trace -----
ksedsts()+426<-kfnmGroupBlockGlobal()+659<-kfnmGroupBlockPriv()+318<-kfgFinalize()+334<-kfxdrvAlter()+3415<-kfxdrvEntry()+1417<-opiexe()+28735<-opiosq0()+4494<-kpooprx()+387<-kpoal8()+830<-opiodr()+1202<-ttcpip()+1222<-opitsk()+1903<-opiino()+936<-opiodr()+1202
<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
----- End of Abridged Call Stack Trace -----
Partial short call stack signature: 0xb0ac14de6c5e2e9c
SQL> alter diskgroup DATAC1 rebalance power 6
kfgpCreate: max_fg_rel 4, max_disk_part 8
kfgpPartners: NOT appliance.
kfgpPartners: max_fg_rel, max_disk_part(4, 8) has been adjusted to (3, 8) due to actual FG, disk configuration (3, 144, num_singledisk_fg 0)
kfgpPartner: necessary rebalancing detected. Avail slot for disk120 7 target 8
WARNING: Too many uncompleted reconfigurations. Rebalance needs completion.
kfgp (0x7fbf0ce71be8), allow quorum: 0, total disks: 148, FGs: total 3 active 3 normal 3 active quorum 0, max dsknum: 147, maxfgnum: 3 scores=480 ties=0 add=2 insert=0 replace=3
disk (0x7fbf0ce71440), num 0a slot 65535 fg 1 ptotal 10 pact 7 pnew 1 pdrp 2 pset dsk 0 [10]: a15fg3 d17fg3 d6fg2 a10fg2 a16fg3 a8fg2 a13fg3 a122fg2 a49fg2 n130fg3
disk (0x7fbf0ce709b0), num 1a slot 65535 fg 1 ptotal 10 pact 8 pnew 0 pdrp 2 pset dsk 1 [10]: d9fg2 a11fg2 d16fg3 a10fg2 a15fg3 a14fg3 a6fg2 a125fg3 a115fg2 a138fg3
disk (0x7fbf0ce70a18), num 2a slot 65535 fg 1 ptotal 8 pact 8 pnew 0 pdrp 0 pset dsk 2 [8]: a13fg3 a11fg2 a16fg3 a17fg3 a9fg2 a7fg2 a12fg3 a55fg2
disk (0x7fbf077108e0), num 3a slot 65535 fg 1 ptotal 11 pact 8 pnew 0 pdrp 3 pset dsk 3 [11]: d7fg2 a17fg3 a11fg2 a9fg2 a12fg3 d8fg2 a14fg3 a127fg3 a110fg2 d48fg2 a131fg3
disk (0x7fbf07710948), num 4a slot 65535 fg 1 ptotal 11 pact 8 pnew 0 pdrp 3 pset dsk 4 [11]: a14fg3 d10fg2 d12fg3 a6fg2 d13fg3 a7fg2 a15fg3 a114fg2 a50fg2 a34fg3 a140fg3
disk (0x7fbf077109b0), num 5a slot 65535 fg 1 ptotal 13 pact 8 pnew 0 pdrp 5 pset dsk 5 [13]: d12fg3 d7fg2 d13fg3 d8fg2 a14fg3 a9fg2 a15fg3 a58fg2 a115fg2 a35fg3 a93fg2 a135fg3 d48fg2
disk (0x7fbf0770f908), num 6a slot 65535 fg 2 ptotal 11 pact 8 pnew 0 pdrp 3 pset dsk 6 [11]: d14fg3 d17fg3 d0fg1 a4fg1 a13fg3 a15fg3 a1fg1 a36fg1 a32fg3 a69fg3 a85fg1
......
......
disk (0x7fbf07883478), num 85a slot 65535 fg 1 ptotal 15 pact 8 pnew 0 pdrp 7 pset dsk 85 [15]: d132fg3 d121fg2 a140fg3 d137fg3 d113fg2 a131fg3 a103fg2 d119fg2 a124fg3 d117fg2 a6fg2 a99fg2 a110fg2 d62fg3 a142fg3
disk (0x7fbf078834e0), num 86a slot 65535 fg 1 ptotal 13 pact 8 pnew 0 pdrp 5 pset dsk 86 [13]: d141fg3 d111fg2 d122fg2 a140fg3 a137fg3 a112fg2 d109fg2 a66fg3 a8fg2 d113fg2 a26fg2 a123fg3 a27fg2
disk (0x7fbf07882328), num 87a slot 65535 fg 2 ptotal 8 pact 8 pnew 0 pdrp 0 pset dsk 87 [8]: a89fg1 a139fg3 a82fg1 a143fg3 a137fg3 a145fg1 a73fg1 a141fg3
disk (0x7fbf07882390), num 88a slot 65535 fg 1 ptotal 11 pact 8 pnew 0 pdrp 3 pset dsk 88 [11]: a109fg2 d142fg3 d119fg2 a115fg2 a133fg3 d106fg2 a127fg3 a110fg2 a80fg3 a120fg2 a31fg3
disk (0x7fbf078823f8), num 89a slot 65535 fg 1 ptotal 12 pact 8 pnew 0 pdrp 4 pset dsk 89 [12]: d128fg3 d120fg2 a139fg3 a136fg3 d107fg2 d104fg2 a126fg3 a130fg3 a51fg2 a81fg2 a142fg3 a87fg2
disk (0x7fbf07882460), num 90a slot 65535 fg 1 ptotal 9 pact 8 pnew 0 pdrp 1 pset dsk 90 [9]: a139fg3 d107fg2 a136fg3 a105fg2 a111fg2 a124fg3 a137fg3 a54fg2 a53fg2
disk (0x7fbf078824c8), num 91a slot 65535 fg 1 ptotal 13 pact 8 pnew 0 pdrp 5 pset dsk 91 [13]: d117fg2 d121fg2 d139fg3 d115fg2 a132fg3 a102fg2 d123fg3 a104fg2 a143fg3 a27fg2 a57fg2 a62fg3 a33fg3
disk (0x7fbf078812a8), num 92a slot 65535 fg 1 ptotal 13 pact 7 pnew 1 pdrp 5 pset dsk 92 [13]: d105fg2 d121fg2 a139fg3 d114fg2 d128fg3 a129fg3 d109fg2 a134fg3 a8fg2 a24fg2 a61fg3 a9fg2 n67fg3
disk (0x7fbf07881310), num 93a slot 65535 fg 2 ptotal 8 pact 8 pnew 0 pdrp 0 pset dsk 93 [8]: a34fg3 a145fg1 a5fg1 a96fg1 a71fg3 a133fg3 a129fg3 a98fg1
disk (0x7fbf07881378), num 94a slot 65535 fg 1 ptotal 15 pact 8 pnew 0 pdrp 7 pset dsk 94 [15]: d103fg2 d142fg3 d117fg2 d108fg2 a130fg3 a110fg2 d131fg3 a24fg2 a69fg3 a105fg2 a28fg2 a33fg3 d71fg3 a132fg3 d138fg3
disk (0x7fbf078813e0), num 95a slot 65535 fg 1 ptotal 10 pact 8 pnew 0 pdrp 2 pset dsk 95 [10]: a135fg3 d119fg2 a102fg2 d126fg3 a106fg2 a127fg3 a35fg3 a118fg2 a64fg3 a114fg2
disk (0x7fbf07880228), num 96a slot 65535 fg 1 ptotal 13 pact 8 pnew 0 pdrp 5 pset dsk 96 [13]: d133fg3 d140fg3 d102fg2 d111fg2 a123fg3 a103fg2 a124fg3 a136fg3 a93fg2 a120fg2 d122fg2 a65fg3 a69fg3
disk (0x7fbf07880290), num 97a slot 65535 fg 1 ptotal 18 pact 8 pnew 0 pdrp 10 pset dsk 97 [18]: d110fg2 d120fg2 d132fg3 d112fg2 d133fg3 d134fg3 d114fg2 d116fg2 a137fg3 a75fg2 a127fg3 a108fg2 d76fg2 a29fg2 d64fg3 a117fg2 a15fg3 a12fg3
disk (0x7fbf0787f1a8), num 98a slot 65535 fg 1 ptotal 18 pact 8 pnew 0 pdrp 10 pset dsk 98 [18]: d129fg3 d120fg2 d123fg3 d106fg2 a127fg3 d107fg2 a135fg3 d116fg2 d16fg3 a24fg2 a128fg3 a93fg2 a8fg2 a7fg2 d33fg3 d115fg2 d17fg3 a34fg3
disk (0x7fbf0787f210), num 99a slot 65535 fg 2 ptotal 8 pact 8 pnew 0 pdrp 0 pset dsk 99 [8]: a46fg1 a142fg3 a40fg1 a128fg3 a84fg1 a143fg3 a85fg1 a140fg3
disk (0x7fbf0787f278), num 100a slot 65535 fg 1 ptotal 15 pact 8 pnew 0 pdrp 7 pset dsk 100 [15]: a125fg3 a108fg2 a129fg3 d109fg2 d130fg3 d131fg3 a133fg3 d102fg2 a81fg2 a7fg2 d122fg2 a16fg3 a76fg2 d116fg2 d61fg3
disk (0x7fbf0787f2e0), num 101a slot 65535 fg 1 ptotal 9 pact 8 pnew 0 pdrp 1 pset dsk 101 [9]: a124fg3 a104fg2 a125fg3 a126fg3 a28fg2 a68fg3 a115fg2 d107fg2 a117fg2
disk (0x7fbf0787db08), num 102a slot 65535 fg 2 ptotal 14 pact 8 pnew 0 pdrp 6 pset dsk 102 [14]: d135fg3 d96fg1 a95fg1 d134fg3 a91fg1 a127fg3 a80fg3 d100fg1 a15fg3 d61fg3 a83fg1 a44fg1 a140fg3 d142fg3
disk (0x7fbf0787db70), num 103a slot 65535 fg 2 ptotal 12 pact 8 pnew 0 pdrp 4 pset dsk 103 [12]: d143fg3 d94fg1 a96fg1 d135fg3 a132fg3 a85fg1 d136fg3 a144fg1 a82fg1 a67fg3 a42fg1 a35fg3
disk (0x7fbf0787dbd8), num 104a slot 65535 fg 2 ptotal 11 pact 8 pnew 0 pdrp 3 pset dsk 104 [11]: d133fg3 a101fg1 d140fg3 d89fg1 a126fg3 a91fg1 a144fg1 a143fg3 a31fg3 a68fg3 a142fg3
disk (0x7fbf0787ca88), num 105a slot 65535 fg 2 ptotal 10 pact 8 pnew 0 pdrp 2 pset dsk 105 [10]: d92fg1 d137fg3 a90fg1 a131fg3 a83fg1 a18fg1 a67fg3 a128fg3 a94fg1 a30fg3
disk (0x7fbf0787caf0), num 106a slot 65535 fg 2 ptotal 13 pact 7 pnew 1 pdrp 5 pset dsk 106 [13]: d143fg3 d98fg1 a95fg1 a134fg3 d88fg1 a124fg3 a45fg1 a70fg3 d64fg3 a35fg3 d146fg1 a132fg3 n47fg1
disk (0x7fbf0787cb58), num 107i slot 65535 fg 2 ptotal 8 pact 0 pnew 0 pdrp 8 pset dsk 107 [8]: d141fg3 d90fg1 d143fg3 d98fg1 d137fg3 d89fg1 d130fg3 d101fg1
disk (0x7fbf0787cbc0), num 108a slot 65535 fg 2 ptotal 13 pact 8 pnew 0 pdrp 5 pset dsk 108 [13]: a100fg1 d140fg3 d94fg1 a131fg3 a123fg3 d83fg1 a144fg1 d142fg3 a33fg3 a36fg1 a97fg1 a125fg3 d71fg3
disk (0x7fbf0787ba08), num 109a slot 65535 fg 2 ptotal 14 pact 8 pnew 0 pdrp 6 pset dsk 109 [14]: a88fg1 d100fg1 d137fg3 d92fg1 d86fg1 d129fg3 d79fg3 a20fg1 a43fg1 a130fg3 a21fg1 a66fg3 a13fg3 a147fg1
disk (0x7fbf0787ba70), num 110a slot 65535 fg 2 ptotal 12 pact 8 pnew 0 pdrp 4 pset dsk 110 [12]: d97fg1 d127fg3 d139fg3 a94fg1 a133fg3 a137fg3 d147fg1 a88fg1 a131fg3 a3fg1 a85fg1 a130fg3
disk (0x7fbf0787bad8), num 111a slot 65535 fg 2 ptotal 11 pact 8 pnew 0 pdrp 3 pset dsk 111 [11]: d86fg1 d142fg3 d96fg1 a136fg3 a90fg1 a82fg1 a125fg3 a71fg3 a41fg1 a16fg3 a33fg3
disk (0x7fbf0787bb40), num 112a slot 65535 fg 2 ptotal 11 pact 8 pnew 0 pdrp 3 pset dsk 112 [11]: d142fg3 d97fg1 a123fg3 a86fg1 d130fg3 a40fg1 a22fg1 a129fg3 a134fg3 a43fg1 a70fg3
disk (0x7fbf0787a988), num 113i slot 65535 fg 2 ptotal 9 pact 0 pnew 0 pdrp 9 pset dsk 113 [9]: d138fg3 d84fg1 d139fg3 d135fg3 d82fg1 d85fg1 d127fg3 d23fg1 d86fg1
disk (0x7fbf0787a9f0), num 114a slot 65535 fg 2 ptotal 15 pact 7 pnew 1 pdrp 7 pset dsk 114 [15]: d123fg3 d142fg3 d97fg1 d92fg1 a124fg3 a83fg1 a126fg3 a4fg1 d71fg3 d41fg1 d67fg3 a61fg3 a95fg1 a64fg3 n69fg3
disk (0x7fbf0787aa58), num 115a slot 65535 fg 2 ptotal 13 pact 8 pnew 0 pdrp 5 pset dsk 115 [13]: d82fg1 a136fg3 a88fg1 d132fg3 d91fg1 a133fg3 a101fg1 d141fg3 a5fg1 a1fg1 a61fg3 d98fg1 a137fg3
disk (0x7fbf0787aac0), num 116a slot 65535 fg 2 ptotal 13 pact 8 pnew 0 pdrp 5 pset dsk 116 [13]: d136fg3 d138fg3 a84fg1 a128fg3 d98fg1 a141fg3 d97fg1 a145fg1 a135fg3 a130fg3 a21fg1 d100fg1 a129fg3
disk (0x7fbf07879908), num 117a slot 65535 fg 2 ptotal 14 pact 8 pnew 0 pdrp 6 pset dsk 117 [14]: d91fg1 d134fg3 d135fg3 d94fg1 d125fg3 a74fg1 a73fg1 a140fg3 a101fg1 a60fg3 d85fg1 a35fg3 a17fg3 a97fg1
disk (0x7fbf07879970), num 118a slot 65535 fg 2 ptotal 13 pact 8 pnew 0 pdrp 5 pset dsk 118 [13]: d130fg3 d131fg3 d140fg3 a44fg1 d72fg1 a38fg1 a139fg3 a95fg1 a65fg3 d143fg3 a84fg1 a61fg3 a34fg3
disk (0x7fbf078799d8), num 119i slot 65535 fg 2 ptotal 8 pact 0 pnew 0 pdrp 8 pset dsk 119 [8]: d130fg3 d95fg1 d125fg3 d84fg1 d126fg3 d129fg3 d88fg1 d85fg1
disk (0x7fbf07879a40), num 120a slot 65535 fg 2 ptotal 20 pact 7 pnew 0 pdrp 13 pset dsk 120 [20]: d128fg3 d89fg1 d138fg3 d97fg1 d98fg1 d124fg3 d142fg3 d145fg1 d35fg3 d45fg1 a88fg1 d33fg3 d40fg1 a96fg1 a46fg1 a20fg1 a12fg3 a147fg1 d141fg3 a78fg3
disk (0x7fbf07879aa8), num 121a slot 65535 fg 2 ptotal 16 pact 8 pnew 0 pdrp 8 pset dsk 121 [16]: a126fg3 d85fg1 a132fg3 d91fg1 d133fg3 d92fg1 a146fg1 d16fg3 a47fg1 a141fg3 a138fg3 d12fg3 a82fg1 d142fg3 d136fg3 a69fg3
disk (0x7fbf07878888), num 122a slot 65535 fg 2 ptotal 15 pact 8 pnew 0 pdrp 7 pset dsk 122 [15]: d124fg3 a123fg3 d82fg1 d83fg1 d127fg3 d86fg1 a128fg3 a0fg1 a78fg3 a70fg3 a84fg1 d100fg1 d96fg1 a42fg1 a39fg1
disk (0x7fbf078788f0), num 123a slot 65535 fg 3 ptotal 11 pact 8 pnew 0 pdrp 3 pset dsk 123 [11]: d114fg2 a122fg2 d98fg1 a96fg1 a112fg2 d91fg1 a108fg2 a25fg2 a146fg1 a50fg2 a86fg1
disk (0x7fbf07877808), num 124a slot 65535 fg 3 ptotal 10 pact 8 pnew 0 pdrp 2
......
......
disk (0x7fbf07872658), num 145a slot 65535 fg 1 ptotal 13 pact 7 pnew 1 pdrp 5 pset dsk 145 [13]: a34fg3 d49fg2 d17fg3 d138fg3 d120fg2 a136fg3 a116fg2 a137fg3 d134fg3 a87fg2 a93fg2 a56fg2 n143fg3
disk (0x7fbf07871508), num 146a slot 65535 fg 1 ptotal 11 pact 8 pnew 0 pdrp 3 pset dsk 146 [11]: a79fg3 d77fg2 d35fg3 a52fg2 a55fg2 a75fg2 a131fg3 a121fg2 a123fg3 d106fg2 a32fg3
disk (0x7fbf07871570), num 147a slot 65535 fg 1 ptotal 14 pact 8 pnew 0 pdrp 6 pset dsk 147 [14]: d57fg2 d69fg3 d34fg3 a128fg3 d110fg2 d78fg3 d76fg2 a138fg3 a55fg2 a31fg3 a120fg2 a16fg3 a65fg3 a109fg2
fail (0x7fbf0ce71398), name MPC2C1 num 1 size 48 act 48 new 0 drp 0 au 20889600 ptotal 566 pact 380 pnew 4 pdrp 182 rtotal 2 ract 2 rnew 0 rdrp 0
fset (0x7fbf0ce716a8), fg: 1, tot: 2
frel (0x7fbf077102c0), fg:<1 2>, totaldp:294 actdp 191 newdp 1 drpdp 102, st A
frel (0x7fbf076faf08), fg:<1 3>, totaldp:272 actdp 189 newdp 3 drpdp 80, st A
disks: 0 1 2 3 4 5 18 19 20 21 22 23 36 37 38 39 40 41 42 43 44 45 46 47 72 73 74 82 83 84 85 86 88 89 90 91 92 94 95 96 97 98 100 101 144 145 146 147
fail (0x7fbf0770f860), name MPC2C2 num 2 size 52 act 48 new 0 drp 4 au 20889600 ptotal 612 pact 381 pnew 2 pdrp 229 rtotal 2 ract 2 rnew 0 rdrp 0
fset (0x7fbf07710570), fg: 2, tot: 2
frel (0x7fbf077102c0), fg:<1 2>, totaldp:294 actdp 191 newdp 1 drpdp 102, st A
frel (0x7fbf076fae48), fg:<2 3>, totaldp:318 actdp 190 newdp 1 drpdp 127, st A
disks: 6 7 8 9 10 11 24 25 26 27 28 29 48 49 50 51 52 53 54 55 56 57 58 59 75 76 77 81 87 93 99 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122
fail (0x7fbf076fa788), name MPC2C3 num 3 size 48 act 48 new 0 drp 0 au 20889600 ptotal 590 pact 379 pnew 4 pdrp 207 rtotal 2 ract 2 rnew 0 rdrp 0
fset (0x7fbf076fb158), fg: 3, tot: 2
frel (0x7fbf076faf08), fg:<1 3>, totaldp:272 actdp 189 newdp 3 drpdp 80, st A
frel (0x7fbf076fae48), fg:<2 3>, totaldp:318 actdp 190 newdp 1 drpdp 127, st A
disks: 12 13 14 15 16 17 30 31 32 33 34 35 60 61 62 63 64 65 66 67 68 69 70 71 78 79 80 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
cset (0x7fbf0ce71358), total frels: 3
frel (0x7fbf077102c0), fg:<1 2>, totaldp:294 actdp 191 newdp 1 drpdp 102, st A
frel (0x7fbf076faf08), fg:<1 3>, totaldp:272 actdp 189 newdp 3 drpdp 80, st A
frel (0x7fbf076fae48), fg:<2 3>, totaldp:318 actdp 190 newdp 1 drpdp 127, st A
kfdp_query: callcnt 1721983 grp 1 (DATAC1)
NOTE: GroupBlock outside rolling migration privileged region
----- Abridged Call Stack Trace -----
ksedsts()+426<-kfnmGroupBlockGlobal()+659<-kfnmGroupBlockPriv()+318<-kfgFinalize()+334<-kfxdrvAlter()+3415<-kfxdrvEntry()+1417<-opiexe()+28735<-opiosq0()+4494<-kpooprx()+387<-kpoal8()+830<-opiodr()+1202<-ttcpip()+1222<-opitsk()+1903<-opiino()+936<-opiodr()+1202
<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
----- End of Abridged Call Stack Trace -----
Partial short call stack signature: 0xb0ac14de6c5e2e9c
SQL> alter diskgroup DATAC1 rebalance power 6
kfgpCreate: max_fg_rel 4, max_disk_part 8
kfgpPartners: NOT appliance.
kfgpPartners: max_fg_rel, max_disk_part(4, 8) has been adjusted to (3, 8) due to actual FG, disk configuration (3, 144, num_singledisk_fg 0)
kfgpPartners: verifying consistency of newly formed partners.
kfgpPartners: repartnering completed.
kfgpGet: insufficient space provided by caller. size 21, pcnt 20, KFPTNR_MAXTOT 20
WARNING: Too many uncompleted reconfigurations. Rebalance needs completion.
kfgp (0x7fb69d5f2910), allow quorum: 0, total disks: 148, FGs: total 3 active 3 normal 3 active quorum 0, max dsknum: 147, maxfgnum: 3 scores=55296 ties=9696 add=576 insert=0 replace=0
disk (0x7fb69d5f1c90), num 0a slot 65535 fg 1 ptotal 8 pact 0 pnew 8 pdrp 0
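Judging from the first trace, Oracle ASM was reporting a problem with the disk PST partner information, so we used event 15195 at level 0x39 to rebuild the PST partner relationships. That did not solve the problem; the next attempt reported kfgpGet: insufficient space provided by caller. size 21, pcnt 20, KFPTNR_MAXTOT 20.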
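As a rough aid for reading the kfgp dump, the sketch below scans "disk (...)" lines and reports every disk whose ptotal has reached the KFPTNR_MAXTOT value of 20 quoted in the kfgpGet message. The three sample lines are copied from the trace above with their partner lists omitted. Treating ptotal as the total number of partner entries is my reading of the dump rather than documented behavior, and the function name disks_at_limit is hypothetical.

```python
import re

# Sample lines copied from the kfgp dump above (partner lists omitted).
TRACE = """\
disk (0x7fbf0770f908), num 6a slot 65535 fg 2 ptotal 11 pact 8 pnew 0 pdrp 3
disk (0x7fbf07880290), num 97a slot 65535 fg 1 ptotal 18 pact 8 pnew 0 pdrp 10
disk (0x7fbf07879a40), num 120a slot 65535 fg 2 ptotal 20 pact 7 pnew 0 pdrp 13
"""

# Limit quoted in the kfgpGet message; assumed here to cap ptotal.
KFPTNR_MAXTOT = 20

DISK_RE = re.compile(
    r"disk \(0x[0-9a-f]+\), num (\d+)[a-z] slot \d+ fg (\d+) "
    r"ptotal (\d+) pact (\d+) pnew (\d+) pdrp (\d+)"
)


def disks_at_limit(trace, limit=KFPTNR_MAXTOT):
    """Return (disk, ptotal, pdrp) for each disk whose ptotal has reached the limit."""
    hits = []
    for match in DISK_RE.finditer(trace):
        num, _fg, ptotal, _pact, _pnew, pdrp = map(int, match.groups())
        if ptotal >= limit:
            hits.append((num, ptotal, pdrp))
    return hits


print(disks_at_limit(TRACE))  # [(120, 20, 13)]
```

Run against the full trace file, the same function would list every disk at the cap; in the excerpt shown here only disk 120 qualifies, and 13 of its 20 entries are counted under pdrp.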
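To investigate, I tried to reproduce the issue in our internal test environment by repeatedly offlining and dropping disks and then adding disks back, changing the rebalance power several times after the disk operations. After no fewer than 10 rounds of testing, I finally hit an unknown error: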
SQL> alter diskgroup dg_data01 drop disk DG_DATA01_0135 force
2021-07-16T17:04:25.492886+08:00
NOTE: cache closing disk 139 of grp 1: (not open) _DROPPED_0139_DG_DATA01
NOTE: GroupBlock outside rolling migration privileged region
NOTE: full repartnering enabled for group 1 by test event 15195 level 0x39
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_160232.trc (incident=40010):
ORA-00600: internal error code, arguments: [kfgCanRepartner01], [2], [3], [6], [], [], [], [], [], [], [], []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_40010/+ASM1_ora_160232_i40010.trc
2021-07-16T17:04:26.506682+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2021-07-16T17:04:26.506873+08:00
ORA-00600: internal error code, arguments: [kfgCanRepartner01], [2], [3], [6], [], [], [], [], [], [], [], []
2021-07-16T17:04:26.506954+08:00
ERROR: alter diskgroup dg_data01 drop disk DG_DATA01_0135 force
2021-07-16T17:04:26.509210+08:00
SQL> alter diskgroup dg_data01 drop disk DG_DATA01_0134 force
2021-07-16T17:04:26.509917+08:00
NOTE: cache closing disk 139 of grp 1: (not open) _DROPPED_0139_DG_DATA01
NOTE: GroupBlock outside rolling migration privileged region
NOTE: full repartnering enabled for group 1 by test event 15195 level 0x39
2021-07-16T17:04:26.584378+08:00
Dumping diagnostic data in directory=[cdmp_20210716170426], requested by (instance=1, osid=160232), summary=[incident=40010].
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_160232.trc (incident=40011):
ORA-00600: internal error code, arguments: [kfgCanRepartner01], [3], [1], [9], [], [], [], [], [], [], [], []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_40011/+ASM1_ora_160232_i40011.trc
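I had never seen this error before. It shows that ASM management in Oracle 19c still has some weak spots: frequent disk drop and add operations before a rebalance has completed can trigger problems. That said, judging from these tests, ASM's checking mechanisms in 19c are more thorough and somewhat more robust than in 11.2.0.4.
Back to the case at hand. Just as we were running out of ideas, one night the disks on one of the customer's storage nodes were taken offline. After they were brought back online, the diskgroup rebalance unexpectedly started working normally. I traced this further; the disks involved in that offline event are shown below: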
*** 2021-07-23T05:41:27.626857+08:00
NOTE: initiating PST update: grp 1 (DATAC1), dsk = 82/0x0, mask = 0x7f, op = assign mandatory
NOTE: initiating PST update: grp 1 (DATAC1), dsk = 88/0x0, mask = 0x7f, op = assign mandatory
NOTE: initiating PST update: grp 1 (DATAC1), dsk = 94/0x0, mask = 0x7f, op = assign mandatory
NOTE: initiating PST update: grp 1 (DATAC1), dsk = 100/0x0, mask = 0x7f, op = assign mandatory
kfdp_updateDsk(): callcnt 1766027 grp 1
PST verChk -0: req, id=3197182789, grp=1, requested=146 at 07/23/2021 05:41:27
NOTE: PST update grp = 1 completed successfully
NOTE: kfdsFilter_freeDskSrSlice for Filter 0x7ff72009cfd0
NOTE: kfdsFilter_clearDskSlice for Filter 0x7ff72009cfd0 (all:TRUE)
NOTE: completed online of disk group 1 disks
DATAC1_0082 (82)
DATAC1_0088 (88)
DATAC1_0094 (94)
DATAC1_0100 (100)
ARB0 relocating file +DATAC1.1.1 reason 6 (1 entries first xnum 0x1)
ARB0 relocating file +DATAC1.3.1 reason 6 (9 entries first xnum 0x3)
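Four disks were involved: 82, 88, 94, and 100. From the earlier trace we know that the rebalance was stuck mainly on disk 120, and that Oracle reported the partner slots in that disk's PST entry had reached the maximum. Analysis with kfed confirmed that this structure holds at most 20 entries.
So why did the whole diskgroup rebalance return to normal after four disks happened to be offlined and then onlined?
Our analysis eventually showed that one of the four offlined disks, disk 88, is a partner of disk 120. We believe the offline operation ultimately caused Oracle to skip the consistency check on disk 120.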
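The partner relationship can be checked directly against the first kfgp dump. The sketch below parses disk 120's pset line and intersects it with the four disks that were offlined. The reading of the one-letter prefixes (a = active, d = being dropped, n = new) is inferred from how the letters tally with the pact/pdrp/pnew counters in the dump, not taken from Oracle documentation, and parse_pset is a hypothetical helper name.

```python
import re

# Partner list of disk 120, copied from the first kfgp dump in the trace.
PSET_120 = (
    "d128fg3 d89fg1 d138fg3 d97fg1 d98fg1 d124fg3 d142fg3 d145fg1 d35fg3 d45fg1 "
    "a88fg1 d33fg3 d40fg1 a96fg1 a46fg1 a20fg1 a12fg3 a147fg1 d141fg3 a78fg3"
)

# Disks that went offline and came back online on 2021-07-23.
OFFLINED = {82, 88, 94, 100}


def parse_pset(pset):
    """Map partner disk number -> state letter (assumed: a/d/n)."""
    return {int(num): state for state, num in re.findall(r"([a-z])(\d+)fg\d+", pset)}


partners = parse_pset(PSET_120)
overlap = sorted(disk for disk in OFFLINED if disk in partners)
active = sorted(disk for disk, state in partners.items() if state == "a")

print(len(partners))  # 20
print(overlap)        # [88]
print(active)         # [12, 20, 46, 78, 88, 96, 147]
```

Under that reading, disk 88 is the only offlined disk in the list and is one of the seven entries marked active, which matches pact 7 in the dump line for disk 120.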
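Seen in this light, the solutions we had previously proposed to the customer are consistent with this finding:
1. Offline disk 120, then online it again (the offline/online cycle does not trigger a rebalance as long as it completes within the disk repair time).
2. Drop disk 120 with force, then run the rebalance manually.
This case was rather interesting, so I am recording it briefly here. What made it special is that the diskgroup is large, about 250TB, so every operation had to be handled with caution.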