datafile auto offlile due to i/o error

本站文章除注明转载外，均为本站原创： 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

本文链接地址: datafile auto offlile due to i/o error

刚到酒店，就接到客户电话说某数据库的一个数据文件报IO错误，通过vpn查看发现如下：

Sun Oct 30 23:19:27 BEIST 2016
Trace dumping is performing id=[cdmp_20161030231927]
Sun Oct 30 23:19:27 BEIST 2016
Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc:
ORA-00376: file 595 cannot be read at this time
ORA-01110: data file 595: '/dev/rdata05vg_8g_48'
Sun Oct 30 23:19:31 BEIST 2016
ORACLE Instance xxxx2 (pid = 22) - Error 376 encountered while recovering transaction (160, 1).
Sun Oct 30 23:19:31 BEIST 2016
Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc:
ORA-00376: file 595 cannot be read at this time
ORA-01110: data file 595: '/dev/rdata05vg_8g_48'
Sun Oct 30 23:19:32 BEIST 2016
Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc:
ORA-00376: file 595 cannot be read at this time
ORA-01110: data file 595: '/dev/rdata05vg_8g_48'
Sun Oct 30 23:19:33 BEIST 2016

Sun Oct 30 23:19:27 BEIST 2016

Trace dumping is performing id=[cdmp_20161030231927]

Sun Oct 30 23:19:27 BEIST 2016

Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc:

ORA-00376: file 595 cannot be read at this time

ORA-01110: data file 595: '/dev/rdata05vg_8g_48'

Sun Oct 30 23:19:31 BEIST 2016

ORACLE Instance xxxx2 (pid = 22) - Error 376 encountered while recovering transaction (160, 1).

Sun Oct 30 23:19:31 BEIST 2016

Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc:

ORA-00376: file 595 cannot be read at this time

ORA-01110: data file 595: '/dev/rdata05vg_8g_48'

Sun Oct 30 23:19:32 BEIST 2016

Errors in file /oracle/app/10.2/admin/xxxx/bdump/xxxx2_smon_11863216.trc:

ORA-00376: file 595 cannot be read at this time

ORA-01110: data file 595: '/dev/rdata05vg_8g_48'

Sun Oct 30 23:19:33 BEIST 2016

我们不难看出，报错文件无法读；实际上我登录2个节点ls -tr检查发现权限都是正确的，同时通过dbv 检查该文件发现也无坏块；因此我断定直接recover该文件即可。在recover时，发现居然报错nfs挂载有问题：

SQL&gt; recover datafile 595;
ORA-00279: change 15125505612642 generated at 10/30/2016 18:06:07 needed for thread 1
ORA-00289: suggestion : /arch2/1_108445_815539661.dbf
ORA-00280: change 15125505612642 for thread 1 is in sequence #108445
Specify log: {&lt;RET&gt;=suggested | filename | AUTO | CANCEL}
/arch1/1_108445_815539661.dbf
ORA-00308: cannot open archived log '/arch1/1_108445_815539661.dbf'
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
Additional information: 6

SQL> recover datafile 595;

ORA-00279: change 15125505612642 generated at 10/30/2016 18:06:07 needed for thread 1

ORA-00289: suggestion : /arch2/1_108445_815539661.dbf

ORA-00280: change 15125505612642 for thread 1 is in sequence #108445

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

/arch1/1_108445_815539661.dbf

ORA-00308: cannot open archived log '/arch1/1_108445_815539661.dbf'

ORA-27054: NFS file system where the file is created or resides is not mounted with correct options

Additional information: 6

由此可见该环境问题还不少。既然本地节点无法读取，为了短时间内恢复正常，直接将部分归档cp到相应的归档目录中，再次进行recover即可成功online该文件，如下：

Sun Oct 30 23:20:23 BEIST 2016
alter database datafile 595 online
Sun Oct 30 23:20:23 BEIST 2016
Completed: alter database datafile 595 online
Sun Oct 30 23:20:28 BEIST 2016
SMON: Parallel transaction recovery tried
Sun Oct 30 23:23:02 BEIST 2016
Thread 2 advanced to log sequence 164113 (LGWR switch)
Current log# 7 seq# 164113 mem# 0: /dev/rora_redo2_01

Sun Oct 30 23:20:23 BEIST 2016

alter database datafile 595 online

Sun Oct 30 23:20:23 BEIST 2016

Completed: alter database datafile 595 online

Sun Oct 30 23:20:28 BEIST 2016

SMON: Parallel transaction recovery tried

Sun Oct 30 23:23:02 BEIST 2016

Thread 2 advanced to log sequence 164113 (LGWR switch)

Current log# 7 seq# 164113 mem# 0: /dev/rora_redo2_01

我们可以看到，确实顺利online文件了，还好是归档的数据库。那么我们继续来分析一下，为什么会出现这个问题呢？进一步搜索alert log发现该在下午18：15分开始出现I欧错误：

Sun Oct 30 18:15:04 BEIST 2016
KCF: write/open error block=0x29790 online=1
file=595 /dev/rdata05vg_8g_48
error=27063 txt: 'IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: -1
Additional information: 131072'
Automatic datafile offline due to write error on
file 595: /dev/rdata05vg_8g_48
Sun Oct 30 18:15:28 BEIST 2016
Thread 2 advanced to log sequence 164100 (LGWR switch)
Current log# 12 seq# 164100 mem# 0: /dev/rora_redo2_06
Sun Oct 30 18:15:28 BEIST 2016
Errors in file /oracle/app/10.2/admin/xxxx/udump/xxxx2_ora_28705020.trc:
ORA-00372: file 595 cannot be modified at this time
ORA-01110: data file 595: '/dev/rdata05vg_8g_48'
ORA-00372: file 595 cannot be modified at this time
ORA-01110: data file 595: '/dev/rdata05vg_8g_48'

Sun Oct 30 18:15:04 BEIST 2016

KCF: write/open error block=0x29790 online=1

file=595 /dev/rdata05vg_8g_48

error=27063 txt: 'IBM AIX RISC System/6000 Error: 5: I/O error

Additional information: -1

Additional information: 131072'

Automatic datafile offline due to write error on

file 595: /dev/rdata05vg_8g_48

Sun Oct 30 18:15:28 BEIST 2016

Thread 2 advanced to log sequence 164100 (LGWR switch)

Current log# 12 seq# 164100 mem# 0: /dev/rora_redo2_06

Sun Oct 30 18:15:28 BEIST 2016

Errors in file /oracle/app/10.2/admin/xxxx/udump/xxxx2_ora_28705020.trc:

ORA-00372: file 595 cannot be modified at this time

ORA-01110: data file 595: '/dev/rdata05vg_8g_48'

ORA-00372: file 595 cannot be modified at this time

ORA-01110: data file 595: '/dev/rdata05vg_8g_48'

我们可以看到，因为出现错误，Oracle自动将数据文件offline了，这其实是数据库的一直保护机制（没有相关隐含参数来控制）。到这里我怀疑多半是操作系统哪儿出问题了，果然errpt 查看发现在18：15出现了path error错误。

oracle:xxx$(/oracle)errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
3D32B80D   1030181516 P S topsvcs        NIM thread blocked
3D32B80D   1030181516 P S topsvcs        NIM thread blocked
E86653C3   1030181516 P H LVDD           I/O ERROR DETECTED BY LVM
B6267342   1030181516 P H hdisk46        DISK OPERATION ERROR
DE3B8540   1030181516 P H hdisk46        PATH HAS FAILED
DE3B8540   1030181416 P H hdisk46        PATH HAS FAILED
oracle:xxxx$(/oracle/app/10.2/admin/xxxx/bdump)errpt -aj DE3B8540
---------------------------------------------------------------------------
LABEL:          SC_DISK_ERR7
IDENTIFIER:     DE3B8540
Date/Time:       Sun Oct 30 18:15:00 BEIST 2016
Sequence Number: 921
Machine Id:      00F7A4904C00
Node Id:         sti50l02
Class:           H
Type:            PERM
WPAR:            Global
Resource Name:   hdisk46
Resource Class:  disk
Resource Type:   Hitachi
......
......
Description
PATH HAS FAILED
Probable Causes
ADAPTER HARDWARE OR CABLE
DASD DEVICE
Failure Causes
UNDETERMINED
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
CHECK PATH
Detail Data
PATH ID

oracle:xxx$(/oracle)errpt

IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION

3D32B80D 1030181516 P S topsvcs NIM thread blocked

E86653C3 1030181516 P H LVDD I/O ERROR DETECTED BY LVM

B6267342 1030181516 P H hdisk46 DISK OPERATION ERROR

DE3B8540 1030181516 P H hdisk46 PATH HAS FAILED

DE3B8540 1030181416 P H hdisk46 PATH HAS FAILED

oracle:xxxx$(/oracle/app/10.2/admin/xxxx/bdump)errpt -aj DE3B8540

---------------------------------------------------------------------------

LABEL: SC_DISK_ERR7

IDENTIFIER: DE3B8540

Date/Time: Sun Oct 30 18:15:00 BEIST 2016

Sequence Number: 921

Machine Id: 00F7A4904C00

Node Id: sti50l02

Class: H

Type: PERM

WPAR: Global

Resource Name: hdisk46

Resource Class: disk

Resource Type: Hitachi

......

Description

PATH HAS FAILED

Probable Causes

ADAPTER HARDWARE OR CABLE

DASD DEVICE

Failure Causes

UNDETERMINED

Recommended Actions

PERFORM PROBLEM DETERMINATION PROCEDURES

CHECK PATH

Detail Data

PATH ID

不难看出，由于下午出现了相关错误，导致数据库出现了IO异常，oracle自动将文件offline了。然而我在刚刚lspath检查发现都ok，实际上也应该这样，否则recover datafile还会继续报IO错误。夜深了，到这里结束吧！简单记录一下！

PS：

1) NFS挂载的相关参数说明

Operating System	Mount options for Binaries ##	Mount options for Oracle Datafiles	Mount options for CRS Voting Disk and OCR
Sun Solaris *	rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,noac, vers=3,suid	rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,noac, forcedirectio, vers=3	rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,vers=3, noac,forcedirectio
AIX (5L) **	rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp, vers=3,timeo=600	cio,rw,bg,hard,nointr,rsize=32768, wsize=32768,proto=tcp,noac, vers=3,timeo=600	cio,rw,bg,hard,intr,rsize=32768, wsize=32768,tcp,noac, vers=3,timeo=600
HPUX 11.23 *** —	rw,bg,vers=3,proto=tcp,noac, hard,nointr,timeo=600, rsize=32768,wsize=32768,suid	rw,bg,vers=3,proto=tcp,noac, forcedirectio,hard,nointr,timeo=600, rsize=32768,wsize=32768	rw,bg,vers=3,proto=tcp,noac, forcedirectio,hard,nointr,timeo=600 ,rsize=32768,wsize=32768
Windows	Not Supported	Not Supported	Not Supported
Linux x86 # ****	rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp, vers=3, timeo=600, actimeo=0	rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,actimeo=0, vers=3,timeo=600	rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,noac,actimeo=0, vers=3,timeo=600
Linux x86-64 # ****	rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,vers=3, timeo=600, actimeo=0	rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,actimeo=0, vers=3,timeo=600	rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,noac,vers=3, timeo=600,actimeo=0
Linux – Itanium	rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,vers=3, timeo=600, actimeo=0	rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,actimeo=0, vers=3,timeo=600	rw,bg,hard,nointr,rsize=32768, wsize=32768,tcp,noac,vers=3, timeo=600,actimeo=0

* NFS mount option “forcedirectio” is required on Solaris platforms when mounting the OCR/CRS files when using Oracle 10.1.0.4 or 10.2.0.2 or later (Oracle unpublished bug 4466428)
** AIX is only supported with NAS on AIX 5.3 TL04 and higher with Oracle 10.2.0.1 and later
*** NAS devices are only supported with HPUX 11.23 or higher ONLY
# These mount options are for Linux kernels 2.6 and above. For older kernels please check Note 279393.1

## The stated mount options for binaries are applicable only if the ORACLE HOME is shared.

Due to Unpublished bug 5856342, it is necessary to use the following init.ora parameter when using NAS with all versions of RAC on Linux (x86 & X86-64 platforms) until 10.2.0.4. This bug is fixed and included in 10.2.0.4 patchset.
filesystemio_options = DIRECTIO

2) 如果是Oracle 11.2.0.2版本开始，在没有安装Patch 7691270的情况之下，Oracle在遇到IO错误之后，会自动将数据库crash掉，其中有个相关的隐含参数：_datafile_write_errors_crash_instance

该参数在11.2.0.2版本之后默认为true，包含最新的11.2.0.4版本。

Enter value for par: datafile_write
old   3:  WHERE x.indx = y.indx AND x.ksppinm LIKE '%&amp;par%'
new   3:  WHERE x.indx = y.indx AND x.ksppinm LIKE '%datafile_write%'
NAME                                               VALUE                DESCRIB
-------------------------------------------------- -------------------- ------------------------------------------------------------
_datafile_write_errors_crash_instance              TRUE                 datafile write errors crash instance

Enter value for par: datafile_write

old 3: WHERE x.indx = y.indx AND x.ksppinm LIKE '%&par%'

new 3: WHERE x.indx = y.indx AND x.ksppinm LIKE '%datafile_write%'

NAME VALUE DESCRIB

-------------------------------------------------- -------------------- ------------------------------------------------------------

_datafile_write_errors_crash_instance TRUE datafile write errors crash instance

Tags: ORA-00308, ORA-00372, ORA-00603, ORA-01110, ORA-27037, ORA-27054
Posted in backup & rcovery on 10月 30, 2016

You must be logged in to post a comment.

love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客

Categories

Archives

最新评论

国内圈子

oracle security

datafile auto offlile due to i/o error

Leave a Reply