An extremely painful addNode for Linux RAC
Today I did a RAC add-node operation for a customer, and what an ordeal it was. Some background: this environment (Linux 4.7 x64, Oracle 10.2.0.5.2) used to be a two-node RAC. After one of the nodes died, its information was removed and the setup became a single-node RAC; today's job was to add the dead node back in (the customer had already reinstalled the OS on the broken host). Below is the rough implementation plan I drafted before arriving on site:
1. Using an existing cluster node as reference, update the following files on the new node's host:
   /etc/sysctl.conf
   /etc/security/limits.conf
   /etc/fstab

2. Create the groups, the user, and the related directories:
   groupadd -g 501 dba
   groupadd -g 500 oinstall
   useradd -u 500 -g oinstall -G dba oracle
   passwd oracle
   chown -R oracle:dba /opt/oracle

3. Referring to an existing node, set up the oracle environment variables
   -- copy .bash_profile over and adjust it

4. Configure /etc/hosts
   Every node currently in the cluster needs the new node's entries added.

5. Configure SSH user equivalence:
   $ mkdir ~/.ssh
   $ chmod 700 ~/.ssh
   $ /usr/bin/ssh-keygen -t rsa
   $ /usr/bin/ssh-keygen -t dsa
   $ touch ~/.ssh/authorized_keys
   $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
   $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
   $ cp ~/.ssh/authorized_keys ~/.ssh/callrac1.authorized_keys
   $ cat ~/.ssh/callrac1.authorized_keys | ssh oracle@callrac1 "cat >> ~/.ssh/authorized_keys"
   $ chmod 600 ~/.ssh/authorized_keys

6. Verify the equivalence (a sketch follows this list).

7. Check that the required OS packages are installed.

8. Verify the node meets the installation prerequisites:
   cluvfy stage -pre crsinst -n callrac1,callrac2

9. Run addNode.sh:
   $ORA_CRS_HOME/oui/bin/addNode.sh

10. Run vipca on the new node:
    su - root
    cd $ORA_CRS_HOME/bin
    ./vipca

11. Update /etc/inittab on the new node.

12. Verify the new node has joined the cluster:
    ./olsnodes

13. Install the Oracle database software:
    cd $ORACLE_HOME/oui/bin
    ./addNode.sh

14. Run netca to create the listener and configure tnsnames:
    ./netca

15. Add the instance:
    run dbca on callrac2 to add a database instance for the new node

16. Check that the cluster is healthy:
    ./crs_stat -t
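For step 6 the plan doesn't spell the check out, so here is a minimal sketch of what verifying equivalence looks like (the -priv names are assumptions; substitute whatever private-interconnect names are in /etc/hosts):

# Run as oracle from BOTH nodes; every command must print the date without
# prompting for a password or a host-key confirmation.
for host in callrac1 callrac2 callrac1-priv callrac2-priv; do
    ssh "$host" date
done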
A few points worth noting here:
First, ssh on callrac1 seemed to have some issues, so I deleted the configuration and redid it from scratch, the same way ssh is set up for an ordinary RAC install.

Second, the customer was using raw devices. As we know, without third-party software Linux cannot use these directly; the block devices have to be bound with raw. So I did the following:

-- Edit /etc/sysconfig/rawdevices and bind with the raw command:

#OCR
raw /dev/raw/raw1 /dev/emcpoweri1
raw /dev/raw/raw2 /dev/emcpoweri2
#votedisk
raw /dev/raw/raw3 /dev/emcpoweri3
raw /dev/raw/raw4 /dev/emcpoweri5
raw /dev/raw/raw5 /dev/emcpoweri6
#ASM
raw /dev/raw/raw6 /dev/emcpowerh1
raw /dev/raw/raw7 /dev/emcpowerg1
raw /dev/raw/raw8 /dev/emcpowerk1

-- Fix the ownership and permissions:
chown oracle:oinstall /dev/raw/raw*
chmod 660 /dev/raw/raw*

-- Add the binding commands to /etc/rc.d/rc.local.
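Since the raw bindings and the ownership/permissions on /dev/raw/* do not survive a reboot by themselves, the rc.local additions are roughly the following (a sketch repeating the bindings above):

# Appended to /etc/rc.d/rc.local so the bindings are recreated at boot:
raw /dev/raw/raw1 /dev/emcpoweri1
raw /dev/raw/raw2 /dev/emcpoweri2
raw /dev/raw/raw3 /dev/emcpoweri3
raw /dev/raw/raw4 /dev/emcpoweri5
raw /dev/raw/raw5 /dev/emcpoweri6
raw /dev/raw/raw6 /dev/emcpowerh1
raw /dev/raw/raw7 /dev/emcpowerg1
raw /dev/raw/raw8 /dev/emcpowerk1
# The device nodes are recreated at boot, so reapply ownership too:
chown oracle:oinstall /dev/raw/raw*
chmod 660 /dev/raw/raw*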
Now for the twists and turns of the node addition itself.

1. The very first run of addNode.sh to install the clusterware failed: the cluster still held information about the old node. The customer admitted that some of the proper steps had been skipped when the node was originally removed, so I took the following approach:
-- Delete the stale node information
[root@callrac1 install]# ./rootdeletenode.sh callrac2 2
CRS nodeapps are deleted successfully
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Node deletion operation successful.
'callrac2' deleted successfully

-- Update the cluster node list
[oracle@callrac1 bin]$ ./runInstaller -updateNodelist ORACLE_HOME=/home/oracle/product/10.2.0/db_1 "CLUSTER_NODES=callrac1"
Starting Oracle Universal Installer...
No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /home/oracle/oraInventory
'UpdateNodeList' was successful.

[oracle@callrac1 bin]$ ./runInstaller -updateNodelist ORACLE_HOME=/home/oracle/product/10.2.0/crs "CLUSTER_NODES=callrac1"
Starting Oracle Universal Installer...
No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /home/oracle/oraInventory
'UpdateNodeList' was successful.

[oracle@callrac1 bin]$ $ORA_CRS_HOME/bin/olsnodes -n
callrac1        1
callrac2        2        #### the old node is still there ####

-- Run rootdeletenode.sh again
[root@callrac1 callrac1]# cd /home/oracle/product/10.2.0/crs/install
[root@callrac1 install]# ./rootdeletenode.sh callrac2,2
CRS-0210: Could not find resource 'ora.callrac2.ons'.
CRS-0210: Could not find resource 'ora.callrac2.vip'.
CRS-0210: Could not find resource 'ora.callrac2.gsd'.
CRS-0210: Could not find resource ora.callrac2.vip.
CRS nodeapps are deleted successfully
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully deleted 14 values from OCR.
Key SYSTEM.css.interfaces.nodecallrac2 marked for deletion is not there. Ignoring.
Successfully deleted 5 keys from OCR.
Node deletion operation successful.
'callrac2,2' deleted successfully

-- Check that only one node remains
[oracle@callrac1 install]$ $ORA_CRS_HOME/bin/olsnodes -n
callrac1        1        #### no stale node info anymore, as expected ####
2. While the installer was copying things over to the new node 2 (callrac2), the final root.sh run reported errors, as follows:
[root@callrac2 raw]# /home/oracle/product/10.2.0/crs/root.sh
WARNING: directory '/home/oracle/product/10.2.0' is not owned by root
WARNING: directory '/home/oracle/product' is not owned by root
WARNING: directory '/home/oracle' is not owned by root
Checking to see if Oracle CRS stack is already configured
/etc/oracle does not exist. Creating it now.
OCR LOCATIONS = /dev/raw/raw1,/dev/raw/raw2
OCR backup directory '/home/oracle/product/10.2.0/crs/cdata/crs' does not exist. Creating now
Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/home/oracle/product/10.2.0' is not owned by root
WARNING: directory '/home/oracle/product' is not owned by root
WARNING: directory '/home/oracle' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default private interconnect name callrac1 for node 0.
assigning default hostname callrac1 for node 0.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 0: callrac1 callrac1 callrac1
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
        callrac1
        callrac2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
Creating VIP application resource on (0) nodes.
Creating GSD application resource on (1) nodes..
Creating ONS application resource on (1) nodes..
Starting VIP application resource on (2) nodes.1:CRS-0210: Could not find resource ora.callrac2.vip.
Check the log file "/home/oracle/product/10.2.0/crs/log/callrac2/racg/ora.callrac2.vip.log" for more details
..
Starting GSD application resource on (2) nodes...
Starting ONS application resource on (2) nodes...
Done.

Clearly, the output shows that node 2's VIP could not be started. Stranger still, the log file ora.callrac2.vip.log it points to was never even created.

Running vipca by hand failed too, with the same error:

CRS-0210: Could not find resource 'ora.callrac2.vip'.

The hosts file itself was fine, and the VIP address did not answer ping, which is exactly as it should be before the VIP is brought up. So why couldn't the resource be found? At the customer's suggestion I bound the VIP address directly onto the host's public NIC, giving it two IPs; after that, ifconfig -a did show the VIP bound to eth0 (the public interface), so I tried adding the nodeapps by hand with srvctl:

[root@callrac1 bin]# ./srvctl add nodeapps -n callrac2 -A 10.100.1.97/255.255.255.0/eth0 -o /home/oracle/product/10.2.0/crs
CRS-0210: Could not find resource 'ora.callrac2.vip'.

Same error. This problem held me up for more than three hours. The CRS resources at that point looked like this:

[oracle@callrac1 racg]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....b1.inst application    ONLINE    ONLINE    callrac1
ora.calldb.db  application    ONLINE    ONLINE    callrac1
ora....SM1.asm application    ONLINE    ONLINE    callrac1
ora....C1.lsnr application    ONLINE    ONLINE    callrac1
ora....ac1.gsd application    ONLINE    ONLINE    callrac1
ora....ac1.ons application    ONLINE    ONLINE    callrac1
ora....ac1.vip application    ONLINE    ONLINE    callrac1
ora....ac2.gsd application    ONLINE    ONLINE    callrac2
ora....ac2.ons application    ONLINE    ONLINE    callrac2
#### the VIP resource is missing ####
3. After finally getting lunch at 2 p.m., it occurred to me that since node 1's VIP was perfectly healthy, I could export its cluster resource profile, edit it into a definition for node 2's VIP, and register that. I tried it and it worked, as follows:
a. Export the profile
$ crs_stat -p ora.callrac1.vip > /tmp/vip.cap

b. Replace every occurrence of the keyword callrac1 in it with callrac2
$ vi vip.cap

c. Copy the contents of vip.cap over to node 2 (careful: the file name must be exact)
$ORA_CRS_HOME/crs/public/ora.callrac2.vip.cap

d. Create and register the VIP resource
$ ./crs_profile -create ora.callrac2.vip -I $ORA_CRS_HOME/crs/public/ora.callrac2.vip.cap -q
$ ./crs_register ora.callrac2.vip

There were a few smaller hiccups along the way that I didn't keep records of, so I won't go into them. For example, CSS misbehaved at one point so that crsctl stop crs could not stop the stack and the host had to be rebooted, ONS acted up, and so on.
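For the record, the same clone-and-register trick can be scripted end to end, using sed in place of the hand edit in vi. A sketch, assuming $ORA_CRS_HOME is set and the commands run as oracle on a node with a running CRS stack:

# Export the healthy node-1 VIP profile and rewrite it for node 2:
crs_stat -p ora.callrac1.vip > /tmp/vip.cap
sed 's/callrac1/callrac2/g' /tmp/vip.cap > $ORA_CRS_HOME/crs/public/ora.callrac2.vip.cap

# The VIP address itself (USR_ORA_VIP) also differs between nodes, so
# inspect the generated .cap before registering:
grep USR_ORA_VIP $ORA_CRS_HOME/crs/public/ora.callrac2.vip.cap

# Validate the profile and register the resource in the OCR:
$ORA_CRS_HOME/bin/crs_profile -create ora.callrac2.vip \
    -I $ORA_CRS_HOME/crs/public/ora.callrac2.vip.cap -q
$ORA_CRS_HOME/bin/crs_register ora.callrac2.vip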
4. Next came addNode.sh for the Oracle database software, and that run of addNode.sh promptly errored out:
[oracle@callrac1 bin]$ ./addNode.sh
Starting Oracle Universal Installer...
No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
Oracle Universal Installer, Version 10.2.0.5.0 Production
Copyright (C) 1999, 2010, Oracle. All rights reserved.

Exception java.lang.NullPointerException occurred..
java.lang.NullPointerException
        at oracle.sysman.oii.oiic.OiicAddNodeSession.initialize(OiicAddNodeSession.java:568)
        at oracle.sysman.oii.oiic.OiicAddNodeSession.<init>(OiicAddNodeSession.java:135)
        at oracle.sysman.oii.oiic.OiicSessionWrapper.createNewSession(OiicSessionWrapper.java:860)
        at oracle.sysman.oii.oiic.OiicSessionWrapper.<init>(OiicSessionWrapper.java:186)
        at oracle.sysman.oii.oiic.OiicInstaller.init(OiicInstaller.java:508)
        at oracle.sysman.oii.oiic.OiicInstaller.runInstaller(OiicInstaller.java:961)
        at oracle.sysman.oii.oiic.OiicInstaller.main(OiicInstaller.java:899)

A Google search turned up someone claiming the permissions on /etc/oraInst.loc were wrong; I set them to 777, which changed nothing. A MetaLink search then surfaced this note:

Add Node Fails with Exception java.lang.NullPointerException [ID 1073878.1]

Its gist is that the node's inventory.xml no longer matches reality and must be repaired as follows. However, the documented command failed for me straight away:

[oracle@callrac1 bin]$ ./runInstaller -silent -attachHome ORACLE_HOME=/home/oracle/product/10.2.0/db_1 ORACLE_HOME_NAME=OraDb10g_home1 CLUSTER_NODES=callrac1,callrac2
Starting Oracle Universal Installer...
No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
''AttachHome'' operation failed as it was called without name of the Oracle Home being attached.

Odd that it refused to attach. I remembered that OUI also ships an attachHome.sh script, so I tried that instead:

[oracle@callrac1 bin]$ ./attachHome.sh
Starting Oracle Universal Installer...
No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /home/oracle/oraInventory
'AttachHome' was successful.

After that, running addNode.sh again went fine.
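As an aside, the mismatch that note [ID 1073878.1] describes can be inspected directly. A sketch, using the inventory location printed in the output above:

# The inventory pointer names the central inventory location:
cat /etc/oraInst.loc

# Check which nodes each Oracle home claims to span; a home whose node list
# disagrees with reality is what trips up the add-node session:
grep -A3 "HOME NAME" /home/oracle/oraInventory/ContentsXML/inventory.xml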
5. With that out of the way, running addNode.sh again brought the next error, OUI-10010. This one is straightforward; just run the following:
[oracle@callrac1 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=/home/oracle/product/10.2.0/db_1 "CLUSTER_NODES={callrac1}" -local
Starting Oracle Universal Installer...
No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /home/oracle/oraInventory
'UpdateNodeList' was successful.
6. Yet another run of addNode.sh. Near the end it threw an error saying a certain file could not be copied. An ls -ltr showed the file in question was over 8 GB; node 1 had plenty of free space, but on the new node 2 the entire /home/oracle filesystem was only about 10 GB. I had a feeling this attempt was doomed. Sure enough: I ignored the last few warnings, and when it came time to run root.sh there was no root.sh under $ORACLE_HOME at all. My guess is the copy never finished because the target filesystem filled up; comparing ls -ltr | wc -l on the two sides confirmed node 2 was about forty files short. The fix was to rm the oversized file outright and rerun addNode.sh so it overwrote what was already there, after which root.sh ran fine. A few more small bumps along the way aren't worth recounting.

7. With both the clusterware and the database software in place, what remained was adding the instance and the listener and configuring tnsnames. A check showed every file needed had already been copied across, which made things easy: edit the ASM pfile and the database instance pfile (listener.ora needed no editing since it is identical on both sides), then start the +ASM2 instance, the calldb2 database instance, and the listener.

8. The last task was registering these resources in CRS. A direct srvctl add would not work, so once again I used the earlier trick: export with crs_stat -p, edit, and register. Just when I thought everything was done, crs_stat -t showed the +ASM2 instance in UNKNOWN state. Having already sunk a whole day into this, I wanted it done properly, so I stopped the ASM instance, planning to restart it together with the database instance. And then trouble struck: the ASM instance had restarted and the database instance was mounted, but before I could open it the network suddenly died, for reasons unknown. Maddening. The host would not answer ping, and from node 1 I could see node 2's entire CRS stack was down:
[oracle@callrac1 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....b1.inst application    ONLINE    ONLINE    callrac1
ora....b2.inst application    ONLINE    OFFLINE
ora.calldb.db  application    ONLINE    ONLINE    callrac1
ora....SM1.asm application    ONLINE    ONLINE    callrac1
ora....C1.lsnr application    ONLINE    ONLINE    callrac1
ora....ac1.gsd application    ONLINE    ONLINE    callrac1
ora....ac1.ons application    ONLINE    ONLINE    callrac1
ora....ac1.vip application    ONLINE    ONLINE    callrac1
ora....SM2.asm application    ONLINE    OFFLINE
ora....C2.lsnr application    ONLINE    OFFLINE
ora....ac2.gsd application    ONLINE    OFFLINE
ora....ac2.ons application    ONLINE    OFFLINE
ora....ac2.vip application    ONLINE    ONLINE    callrac1
Despite that last little blemish, it's clear from the state above that everything else was fully in place; a reboot of the host would sort out the rest. It was already past 8 p.m., so I made my exit, seeing as yours truly still hadn't had dinner. Thanks!
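For completeness, after the reboot the usual quick checks to confirm node 2's stack came back would be (a sketch):

# Run on node 2 (or crs_stat from either node) once it is reachable again:
$ORA_CRS_HOME/bin/crsctl check crs    # are the CSS/CRS/EVM daemons healthy?
$ORA_CRS_HOME/bin/crs_stat -t         # are all the node-2 resources ONLINE?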