datafile auto offlile due to i/o error
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
刚到酒店,就接到客户电话说某数据库的一个数据文件报IO错误,通过vpn查看发现如下:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
Sun Oct 30 23:19:27 BEIST 2016 Trace dumping is performing id=[cdmp_20161030231927] Sun Oct 30 23:19:27 BEIST 2016 Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc: ORA-00376: file 595 cannot be read at this time ORA-01110: data file 595: '/dev/rdata05vg_8g_48' Sun Oct 30 23:19:31 BEIST 2016 ORACLE Instance xxxx2 (pid = 22) - Error 376 encountered while recovering transaction (160, 1). Sun Oct 30 23:19:31 BEIST 2016 Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc: ORA-00376: file 595 cannot be read at this time ORA-01110: data file 595: '/dev/rdata05vg_8g_48' Sun Oct 30 23:19:32 BEIST 2016 Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc: ORA-00376: file 595 cannot be read at this time ORA-01110: data file 595: '/dev/rdata05vg_8g_48' Sun Oct 30 23:19:33 BEIST 2016 |
我们不难看出,报错文件无法读;实际上我登录2个节点ls -tr检查发现权限都是正确的,同时通过dbv 检查该文件发现也无坏块;因此我断定直接recover该文件即可。在recover时,发现居然报错nfs挂载有问题:
1 2 3 4 5 6 7 8 9 10 |
SQL> recover datafile 595; ORA-00279: change 15125505612642 generated at 10/30/2016 18:06:07 needed for thread 1 ORA-00289: suggestion : /arch2/1_108445_815539661.dbf ORA-00280: change 15125505612642 for thread 1 is in sequence #108445 Specify log: {<RET>=suggested | filename | AUTO | CANCEL} /arch1/1_108445_815539661.dbf ORA-00308: cannot open archived log '/arch1/1_108445_815539661.dbf' ORA-27054: NFS file system where the file is created or resides is not mounted with correct options Additional information: 6 |
由此可见该环境问题还不少。既然本地节点无法读取,为了短时间内恢复正常,直接将部分归档cp到相应的归档目录中,再次进行recover即可成功online该文件,如下:
1 2 3 4 5 6 7 8 9 |
Sun Oct 30 23:20:23 BEIST 2016 alter database datafile 595 online Sun Oct 30 23:20:23 BEIST 2016 Completed: alter database datafile 595 online Sun Oct 30 23:20:28 BEIST 2016 SMON: Parallel transaction recovery tried Sun Oct 30 23:23:02 BEIST 2016 Thread 2 advanced to log sequence 164113 (LGWR switch) Current log# 7 seq# 164113 mem# 0: /dev/rora_redo2_01 |
我们可以看到,确实顺利online文件了,还好是归档的数据库。那么我们继续来分析一下,为什么会出现这个问题呢? 进一步搜索alert log发现该在下午18:15分开始出现I欧错误:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
Sun Oct 30 18:15:04 BEIST 2016 KCF: write/open error block=0x29790 online=1 file=595 /dev/rdata05vg_8g_48 error=27063 txt: 'IBM AIX RISC System/6000 Error: 5: I/O error Additional information: -1 Additional information: 131072' Automatic datafile offline due to write error on file 595: /dev/rdata05vg_8g_48 Sun Oct 30 18:15:28 BEIST 2016 Thread 2 advanced to log sequence 164100 (LGWR switch) Current log# 12 seq# 164100 mem# 0: /dev/rora_redo2_06 Sun Oct 30 18:15:28 BEIST 2016 Errors in file /oracle/app/10.2/admin/xxxx/udump/xxxx2_ora_28705020.trc: ORA-00372: file 595 cannot be modified at this time ORA-01110: data file 595: '/dev/rdata05vg_8g_48' ORA-00372: file 595 cannot be modified at this time ORA-01110: data file 595: '/dev/rdata05vg_8g_48' |
我们可以看到,因为出现错误,Oracle自动将数据文件offline了,这其实是数据库的一直保护机制(没有相关隐含参数来控制)。到这里我怀疑多半是操作系统哪儿出问题了,果然errpt 查看发现在18:15出现了path error错误。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
oracle:xxx$(/oracle)errpt IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION 3D32B80D 1030181516 P S topsvcs NIM thread blocked 3D32B80D 1030181516 P S topsvcs NIM thread blocked E86653C3 1030181516 P H LVDD I/O ERROR DETECTED BY LVM B6267342 1030181516 P H hdisk46 DISK OPERATION ERROR DE3B8540 1030181516 P H hdisk46 PATH HAS FAILED DE3B8540 1030181416 P H hdisk46 PATH HAS FAILED oracle:xxxx$(/oracle/app/10.2/admin/xxxx/bdump)errpt -aj DE3B8540 --------------------------------------------------------------------------- LABEL: SC_DISK_ERR7 IDENTIFIER: DE3B8540 Date/Time: Sun Oct 30 18:15:00 BEIST 2016 Sequence Number: 921 Machine Id: 00F7A4904C00 Node Id: sti50l02 Class: H Type: PERM WPAR: Global Resource Name: hdisk46 Resource Class: disk Resource Type: Hitachi ...... ...... Description PATH HAS FAILED Probable Causes ADAPTER HARDWARE OR CABLE DASD DEVICE Failure Causes UNDETERMINED Recommended Actions PERFORM PROBLEM DETERMINATION PROCEDURES CHECK PATH Detail Data PATH ID |
不难看出,由于下午出现了相关错误,导致数据库出现了IO异常,oracle自动将文件offline了。然而我在刚刚lspath检查发现都ok,实际上也应该这样,否则recover datafile还会继续报IO错误。夜深了,到这里结束吧!简单记录一下!
PS:
1) NFS挂载的相关参数说明
Operating System | Mount options for Binaries ## | Mount options for Oracle Datafiles | Mount options for CRS Voting Disk and OCR |
Sun Solaris * | rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,noac, vers=3,suid |
rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,noac, forcedirectio, vers=3 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,vers=3, noac,forcedirectio |
AIX (5L) ** | rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp, vers=3,timeo=600 |
cio,rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,noac, vers=3,timeo=600 |
cio,rw,bg,hard,intr,rsize=32768, wsize=32768,tcp,noac, vers=3,timeo=600 |
HPUX 11.23 *** — | rw,bg,vers=3,proto=tcp,noac, hard,nointr,timeo=600, rsize=32768,wsize=32768,suid |
rw,bg,vers=3,proto=tcp,noac, forcedirectio,hard,nointr,timeo=600, rsize=32768,wsize=32768 |
rw,bg,vers=3,proto=tcp,noac, forcedirectio,hard,nointr,timeo=600 ,rsize=32768,wsize=32768 |
Windows | Not Supported | Not Supported | Not Supported |
Linux x86 # **** |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp, vers=3, timeo=600, actimeo=0 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,actimeo=0, vers=3,timeo=600 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,noac,actimeo=0, vers=3,timeo=600 |
Linux x86-64 # **** |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,vers=3, timeo=600, actimeo=0 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,actimeo=0, vers=3,timeo=600 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,noac,vers=3, timeo=600,actimeo=0 |
Linux – Itanium | rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,vers=3, timeo=600, actimeo=0 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,actimeo=0, vers=3,timeo=600 |
rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,noac,vers=3, timeo=600,actimeo=0 |
* NFS mount option “forcedirectio” is required on Solaris platforms when mounting the OCR/CRS files when using Oracle 10.1.0.4 or 10.2.0.2 or later (Oracle unpublished bug 4466428)
** AIX is only supported with NAS on AIX 5.3 TL04 and higher with Oracle 10.2.0.1 and later
*** NAS devices are only supported with HPUX 11.23 or higher ONLY
# These mount options are for Linux kernels 2.6 and above. For older kernels please check Note 279393.1
## The stated mount options for binaries are applicable only if the ORACLE HOME is shared.
filesystemio_options = DIRECTIO
2) 如果是Oracle 11.2.0.2版本开始,在没有安装Patch 7691270的情况之下,Oracle在遇到IO错误之后,会自动将数据库crash掉,其中有个相关的隐含参数:_datafile_write_errors_crash_instance
该参数在11.2.0.2版本之后默认为true,包含最新的11.2.0.4版本。
1 2 3 4 5 6 7 |
Enter value for par: datafile_write old 3: WHERE x.indx = y.indx AND x.ksppinm LIKE '%&par%' new 3: WHERE x.indx = y.indx AND x.ksppinm LIKE '%datafile_write%' NAME VALUE DESCRIB -------------------------------------------------- -------------------- ------------------------------------------------------------ _datafile_write_errors_crash_instance TRUE datafile write errors crash instance |
Leave a Reply
You must be logged in to post a comment.