HOWTO: Replace a failed sparenode in a VCS clustered PureDisk storage pool



This technote details the procedure to re-add a previously failed sparenode to a PureDisk 6.6 VCS cluster after PDOS is reinstalled on the sparenode using the same hostname and IP address.
This technote assumes:
   - a cluster node experienced a hardware failure (such as a root disk failure) that necessitated a reinstallation of PDOS
   - the active clustered PureDisk service group failed over successfully from this node
   - this node has become the new sparenode
   - after the hardware issues on the sparenode are resolved, PDOS will be reinstalled on it using the same hostname and IP address (step 5 below)
   - any APM/ASL software and the necessary drivers are installed to enable disk array connectivity and NIC functionality
   - in the command examples below, pd66vcs3 is the sparenode being replaced

1. On the SPA, make a backup copy of the /Storage/etc/topology_nodes.ini file, then correct any inconsistencies found inside it:
# mkdir /Storage/tmp/topo
# cp /Storage/etc/topology_nodes.ini /Storage/tmp/topo/topology_nodes.ini

2. On the current sparenode, create a tarball of the NBU cluster response files, and transfer the /root/clusterfiles.tar file to a temporary location on another server (using scp, for example):
# tar -Pcf /root/clusterfiles.tar /opt/pdcl/bin/cluster/NBU_RSP_*
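The -P flag stores absolute paths in the archive, so a later extraction can restore the files to their original location. The round trip below is an illustrative sketch run in a scratch directory with a dummy file, not against the real /opt/pdcl files:

```shell
# Build a scratch tree that mimics the RSP file layout (dummy content).
workdir=$(mktemp -d)
mkdir -p "$workdir/opt/pdcl/bin/cluster"
echo "demo" > "$workdir/opt/pdcl/bin/cluster/NBU_RSP_pd_grp_spa"

# -P keeps the leading path intact instead of stripping it.
tar -Pcf "$workdir/clusterfiles.tar" "$workdir"/opt/pdcl/bin/cluster/NBU_RSP_*

# Listing the archive shows the full absolute path was preserved.
tar -Ptf "$workdir/clusterfiles.tar"
```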

3. Freeze the service groups:
# for i in `/opt/VRTSvcs/bin/hagrp -list | awk '{print $1}' | grep -v "#" | sort -u`; do /opt/VRTSvcs/bin/hagrp -freeze $i; done
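Because hagrp -list prints one line per group/system pair, a group appears once for each system it is configured on; the awk | grep | sort -u pipeline reduces that to a unique group list so each group is frozen exactly once. A sketch with hypothetical output shows the reduction:

```shell
# Hypothetical 'hagrp -list' output: "<group> <system>" pairs, one per line.
sample='pd_group1 pd66vcs1
pd_group1 pd66vcs2
pd_group2 pd66vcs1
pd_group2 pd66vcs2'

# Same pipeline as the freeze loop: take column 1, drop any comment
# lines, and deduplicate, leaving one entry per service group.
groups=$(printf '%s\n' "$sample" | awk '{print $1}' | grep -v "#" | sort -u)
printf '%s\n' "$groups"
```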

4. Power down the spareNode.

5. Install PDOS on the new node (use the Expert install choice) with the same hostname and the same IP on the same NIC (in other words, if the public node IP was on eth0 before, it should be on eth0 again).
   a. Do not configure an IP on the private heartbeat NICs.
   b. After the PDOS install, run:
# /opt/PDOS/install/
Is this server the PureDisk SPA? (y/n): n
Enter the SPA's Fully Qualified Domain Name:

6. Ensure /etc/hosts is correct on the sparenode.

7. On each active node:
- ensure the sparenode references have been removed from /root/.ssh/known_hosts
- ensure ssh -x out to the remote node works without a password. If it does not:
- spa: # scp ~/.ssh/
- sparenode: # cat ~/ >> ~/.ssh/authorized_keys
- sparenode: # rm ~/

8. Follow the VCS software reinstallation steps in the PD6602/Revision 1 PureDisk Administrator's Guide, beginning on page 352 and continuing through step 10 (do not proceed to step 11 on page 359).
During these steps, specify only the sparenode's FQDN. Do not specify any of the other nodes. Some of the responses are shown below:
# cd /media/cdrom/vcsmp3
# ./installer
...type in the FQDN of the node
...specify 'y' when it asks 'Are you sure you want to install a single node cluster?'
...follow the rest of the prompts as noted in the admin guide
Then upgrade to MP4:
# cd /cdrom/vcsmp4
# ./installmp
...when specifying the hostname, specify only the sparenode; do not specify the other nodes in the cluster.
...follow the rest of the prompts as noted in the admin guide

9. On the sparenode, ensure a directory named /Storage exists:
# mkdir /Storage

10. From the active SPA node, run:
# /opt/pdinstall/ <sparenode_FQDN>

11. On the new sparenode, upgrade the kernel to 6.6.1 (this script also upgrades Storage Foundation):
# /opt/pdinstall/ --upgrade
...answer yes when prompted to reboot.

12. On the new sparenode, upgrade the kernel to 6.6.1.2:
# /opt/pdinstall/ --upgrade
...answer yes when prompted to reboot.

13. Confirm the following are in place on the sparenode:
    a. RSP files are present:
# ls /opt/pdcl/bin/cluster
-rw-r--r-- 1 root root 173 Jul 21 08:40 AGENT_DEBUG.log
-rw-r--r-- 1 root root 402 Jul 20 20:59 NBU_RSP_pd_grp_cr
-rw-r--r-- 1 root root 404 Jul 20 20:47 NBU_RSP_pd_grp_spa
...if not, copy the NBU_RSP_* files from the SPA node to the sparenode, same location, or use the copy gathered in step 2:
   # tar -C / -xf /root/clusterfiles.tar /

    b. Link is present:
# ls -l /opt/VRTSvcs/bin
lrwxrwxrwx 1 root root 25 Jul 20 20:58 NetBackupPD -> /opt/pdcl/bin/cluster/vcs
....if not, execute: ln -s /opt/pdcl/bin/cluster/vcs /opt/VRTSvcs/bin/NetBackupPD

    c. 'NetBackupPDAgent' file is present:
# ls -l /opt/pdcl/bin/cluster/vcs
total 116
-r-xr----- 1 root root 13178 Jul 21 08:15 NetBackupPDAgent
....if not, copy the NetBackupPDAgent file from the SPA to the sparenode, same location.
# scp /opt/pdcl/bin/cluster/vcs/NetBackupPDAgent root@pd66vcs3:/opt/pdcl/bin/cluster/vcs/
    d. Link is present:
# ls -l /opt/pdcl/bin/cluster/vcs
lrwxrwxrwx 1 root root 22 May 26 13:54 perl -> /opt/VRTSperl/bin/perl
....if not, execute: ln -s /opt/VRTSperl/bin/perl /opt/pdcl/bin/cluster/vcs/perl

    e. Ensure /etc/VRTSvcs/conf/config/ on the sparenode matches the copy on the SPA. If not, make a backup of the original on the sparenode, then copy the file from the SPA to the sparenode, same location.
# cp /etc/VRTSvcs/conf/config/ /etc/VRTSvcs/conf/config/
...on SPA: # scp /etc/VRTSvcs/conf/config/ root@pd66vcs3:/etc/VRTSvcs/conf/config/

    f. Ensure /etc/llthosts, /etc/llttab, and /etc/gabtab contain the correct contents. (You can scp copies of llthosts and gabtab from the SPA to the sparenode, but the llttab file contains unique MAC addresses, so use the SPA's llttab only as a guide.)
# scp /etc/llthosts root@pd66vcs3:/etc/

# scp /etc/gabtab root@pd66vcs3:/etc/
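For reference, a Linux llttab typically resembles the sketch below; the node name, cluster ID, and MAC-derived link device names shown here are hypothetical placeholders, which is why the SPA's copy can only serve as a template and the sparenode's own NIC MAC addresses must be substituted in:

```
set-node pd66vcs3
set-cluster 101
link eth1 eth-00:0c:29:aa:bb:01 - ether - -
link eth2 eth-00:0c:29:aa:bb:02 - ether - -
```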

    g. Start llt, then gab, then confirm the lltstat output is good on all nodes, as is the gabconfig -a output:
# /etc/init.d/llt start
# /etc/init.d/gab start
# lltstat -l
# gabconfig -a

    h. Ensure the sparenode can see the disk groups:
# vxdisk -o alldgs list

14. Start HA on the sparenode:
# /opt/VRTSvcs/bin/hastart

15. Check status via:
# /opt/VRTSvcs/bin/hastatus -sum

16. Unfreeze cluster groups:
# for i in `/opt/VRTSvcs/bin/hagrp -list | awk '{print $1}' | grep -v "#" | sort -u`; do /opt/VRTSvcs/bin/hagrp -unfreeze $i; done

If all is well, test a failover:
# /opt/VRTSvcs/bin/hagrp -switch pd_group1 -to pd66vcs3
