rman delete archivelog with error ora-15028
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
近期好几个都遇到了rman在删除归档时候报错ora-15028,即文件被占用,不允许被删除。实际上该问题的分析过程并不复杂,
Oracle 模式也提供了相关的诊断手段,大致如下:
1 2 3 4 5 6 7 8 9 10 11 12 |
For RAC (connect to one of the RAC instances and execute): $> sqlplus /nolog SQL> connect / as sysdba (sysasm if connecting to an ASM instance) SQL> oradebug setmypid SQL> oradebug unlimit REM : The next line should give something like Hang Analysis in $ORACLE_BASE/diag/.../trace/$ORACLE_SID_diag_<pid>.trc. Upload this REM : Run the following two lines on one instance 2-3 times - 1 minute apart: SQL> oradebug -g all hanganalyze 3 SQL> oradebug -g all dump systemstate 266 REM : The following line will print the location for the systemstate trace. Upload this SQL> oradebug tracefile_name REM : Also upload the instance alert log. |
大致思路是做local或者系统级别的systemstate dump,或者针对操作session做15028 errorstack。本质上两种方式是一样的。
报错过程中,也可能出现ora-00600 kfncSlaveMsgFree1 错误:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 |
----- Invocation Context Dump ----- Address: 0x7fb9ff3a7a90 Phase: 3 flags: 0x18E0000 Incident ID: 242170 Error Descriptor: ORA-600 [kfncSlaveMsgFree1] [0] [0] [0] [] [] [] [] [] [] [] [] Error class: 0 Problem Key # of args: 1 Number of actions: 11 ----- Incident Context Dump ----- Address: 0x7ffe07cfb7a0 Incident ID: 242170 Problem Key: ORA 600 [kfncSlaveMsgFree1] Error: ORA-600 [kfncSlaveMsgFree1] [0] [0] [0] [] [] [] [] [] [] [] [] [00]: dbgexProcessError [diag_dde] [01]: dbgeExecuteForError [diag_dde] [02]: dbgePostErrorKGE [diag_dde] [03]: dbkePostKGE_kgsf [rdbms_dde] [04]: kgeadse [] [05]: kgerinv_internal [] [06]: kgerinv [] [07]: kgeasnmierr [] [08]: kfncSlaveMsgFree [KFNC]<-- Signaling [09]: kfncFileDelete [KFNC] [10]: kfioDelete [ASM] [11]: ksfddel1 [VOS] [12]: ksfddel [VOS] [13]: krbmdal [] [14]: krbidrl [] [15]: pevm_icd_call_common [PLSQL_Code_Execution] [16]: pfrinstr_ICAL [PLSQL_Code_Execution] [17]: pfrrun_no_tool [PLSQL_Code_Execution] [18]: pfrrun [PLSQL_Code_Execution] [19]: plsql_run [PLSQL_PSD_Standalones] [20]: pricar [PLSQL_Code_Execution] [21]: pricbr [PLSQL_Code_Execution] [22]: prient2 [PLSQL_Code_Execution] [23]: prient [PLSQL_Code_Execution] [24]: kkxrpc [] [25]: kporpc [] [26]: opiodr [] [27]: ttcpip [] [28]: opitsk [] [29]: opiino [] [30]: opiodr [] [31]: opidrv [] [32]: sou2o [] [33]: opimai_real [] [34]: ssthrdmain [] [35]: main [] [36]: __libc_start_main [] MD [00]: 'SID'='2120.34829' (0x2) MD [01]: 'ProcId'='271.216' (0x2) |
当我们难道systemstate dump之后,直接搜索报错的归档文件即可,这里以1_370640_926351915.arc 为例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 |
PROCESS 123: ---------------------------------------- SO: 0x2281f43940, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2281f43940, name=process, file=ksu.h LINE:12721, pg=0 (process) Oracle pid:123, ser:133, calls cur/top: (nil)/0x21d8ecb010 flags : (0x0) - flags2: (0x0), flags3: (0x10) intr error: 0, call error: 0, sess error: 0, txn error 0 intr queue: empty ksudlp FALSE at location: 0 (post info) last post received: 0 0 80 last post received-location: kji.h LINE:3691 ID:kjata: wake up enqueue owner last process to post me: 0x2341fd6c90 1 6 last post sent: 0 0 42 last post sent-location: ksv2.h LINE:1652 ID:ksvpst: checkpool last process posted by me: 0x2342076a88 1 2 (latch info) wait_event=0 bits=0x0 Process Group: DEFAULT, pseudo proc: 0x22a2071140 O/S info: user: grid, term: UNKNOWN, ospid: 147794 OSD pid info: Unix process pid: 147794, image: oracle@xxxxx ---------------------------------------- SO: 0x232db799e8, type: 9, owner: 0x2281f43940, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2281f43940, name=FileIdentificatonBlock, file=ksfd2.h LINE:409, pg=0 (FIB) 0x232db799e8 flags=65728 reference cnt=1 incno=15624 seqno=1 fname=+DG_FRA/cbsdb_dc02/archivelog/2_282778_926351915.arc fno=0 lblksz=512 fsiz=7043502 ############# kfiofib = 0x232db79ab0 ################# Diskgroup Name = DG_FRA File number = 2802.1119830675 File type = 3 Flags = 0x8 Blksize = 512 File size = 7043503 blocks Blk one offset = 1 Redundancy = 17 Physical blocksz = 512 Open name = +DG_FRA/xxxxx_dc02/archivelog/2_282778_926351915.arc Fully-qualified nm = +DG_FRA/xxxxx_dc02/archivelog/2_282778_926351915.arc Mapid = 2994909 Slave ID = -1 Connection = 0x(nil) Extent counts/sizes= 4294967295 1X, 0 1X, 0 1X Stripe width=1 size=1048576 blks=2048 setblks=2048 AU blocks=2048 Extent blocks= 2048, 2048, 2048 Blocks before each break = 4294967295, 0 ############################################ ---------------------------------------- SO: 0x22ae8e46a0, type: 10, owner: 0x232db799e8, flag: INIT/-/-/0x00 if: 0x1 c: 0x1 proc=0x2281f43940, name=FileOpenBlock, file=ksfd.h LINE:6688, pg=0 (FOB) 0x22ae8e46a0 flags=2570 fib=0x232db799e8 incno=15624 pending i/o cnt=0 fname=+DG_FRA/xxxxx_dc02/archivelog/2_282778_926351915.arc fno=0 lblksz=512 fsiz=7043502 ---------------------------------------- SO: 0x232db76268, type: 9, owner: 0x2281f43940, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2281f43940, name=FileIdentificatonBlock, file=ksfd2.h LINE:409, pg=0 (FIB) 0x232db76268 flags=65728 reference cnt=1 incno=52396 seqno=1 fname=+DG_FRA/xxxxx_dc02/archivelog/1_370640_926351915.arc fno=0 lblksz=512 fsiz=7024548 ############# kfiofib = 0x232db76330 ################# Diskgroup Name = DG_FRA File number = 6109.1119830681 File type = 3 Flags = 0x8 Blksize = 512 File size = 7024549 blocks Blk one offset = 1 Redundancy = 17 Physical blocksz = 512 Open name = +DG_FRA/xxxxx_dc02/archivelog/1_370640_926351915.arc Fully-qualified nm = +DG_FRA/xxxxx_dc02/archivelog/1_370640_926351915.arc Mapid = 2994906 Slave ID = -1 Connection = 0x(nil) Extent counts/sizes= 4294967295 1X, 0 1X, 0 1X Stripe width=1 size=1048576 blks=2048 setblks=2048 AU blocks=2048 Extent blocks= 2048, 2048, 2048 Blocks before each break = 4294967295, 0 ############################################ ---------------------------------------- SO: 0x22ae8e23e8, type: 10, owner: 0x232db76268, flag: INIT/-/-/0x00 if: 0x1 c: 0x1 proc=0x2281f43940, name=FileOpenBlock, file=ksfd.h LINE:6688, pg=0 (FOB) 0x22ae8e23e8 flags=2570 fib=0x232db76268 incno=52396 pending i/o cnt=1 fname=+DG_FRA/xxxxx_dc02/archivelog/1_370640_926351915.arc fno=0 lblksz=512 fsiz=7024548 ---------------------------------------- SO: 0x21f916e4f8, type: 125, owner: 0x2281f43940, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2281f43940, name=KFG state obj, file=kfg2.h LINE:1289, pg=0 (kfgso) flags: 00000000 clt: 1 err: 0 hint: b98e0538 (kfgpn) rpi: 2451 itrn:(nil) gst:(nil) usrp:(nil) busy: 0 rep: 0 grp: 0 check: 0/0 glink: 0x21f916e5b8 0x21f916e5b8 ---------------------------------------- |
定位到归档名称之后,跟进proc 我们可以直接反向定位到持有该文件的process是 123,也就自然能看到Unix pid信息。
如果继续进行搜索可能还能看到进程具体在干什么,例如这里发现该进程正在进行logminer分析:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
------------------------------------------------------------------------------- Chain 22: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 1 (xxxxx_dc02.xxxxx1) os id: 147794 process id: 123, oracle@xxxxx session id: 2342 session serial #: 36357 } is not in a wait: { last wait: 2.202378 sec ago blocking: 0 sessions current sql: SELECT SCN, USERNAME, OPERATION_CODE, TIMESTAMP, SQL_REDO, TABLE_NAME, COMMIT_SCN, SEQUENCE#, CSF, XIDUSN, XIDSLT, XIDSQN, RS_ID, SSN, SEG_OWNER, ROLLBACK, ROW_ID FROM V$LOGMNR_CONTENTS WHERE ((( (SEG_OWNER='ECIF' AND TABLE_NAME IN ('ECIF_ADDR_PHONE')) ) AND (OPERATION_CODE IN (1,3,2))) OR (OPERATION_CODE = 7 OR OPE short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-krvucfth()+434<-krvtpdc_ProcessDataColumns()+265<-krvtfdc()+792<-kdofdc()+206<-kdodpm()+1341<-kcocdm()+186<-krvtprpr()+458<-krvxptr_TranslateRegion()+142<-krvxprep()+371<-krvxgtsp_GetTxnSingleProcess()+115<-krvxgt()+1044<-krvfcact2()+1806<-krvfcact()+684<-qerfxFetch()+890<-opifch2()+2766<-kpoal8()+8432<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-ma wait history: |
到这里基本上就很清晰了,既然有业务会话在不断进行logminer分析,那么肯定是需要用到归档文件的,当session操作未完成之前,如果进行delete archivelog操作,肯定是会报错ora-15028的,这是一个正常的逻辑。
另外用户这里是dataguard环境,建议也把rman中关于archivelog的删除保留策略定义为如下方式:
CONFIGURE ARCHIVELOG DELETION POLICY TO SHIPPED ON STANDBY;
ok,暂时就到这里吧,问题非常简单,就做个随笔记录一下。
Leave a Reply
You must be logged in to post a comment.