How to rebuild a cluster node that is in a Storage Foundation HA (tm) for Windows cluster
SECTION 1 - Backing up the Cluster Configuration
This section walks the user through backing up important information about the node and cluster. Some of these files will be required to complete the rebuilding process. Back up these files and registry keys to a location that will not be affected by a complete uninstall and reinstall of Storage Foundation HA (SFW-HA) for Windows.
1. Export the following registry key from the node that will be rebuilt:
2. If Exchange is involved, export the following key as well (if it exists**):
**Note: this key is no longer present in SFW HA 6.x for Exchange 2010 installations.
3. On the node that is going to be rebuilt, back up the following files:
NOTE: By default, %VCS_ROOT% is the equivalent of C:\Program Files\VERITAS
4. Back up the "lic" folder under the following location:
C:\Program Files\Common Files\VERITAS Shared\vrtslic\lic
On 64-bit operating systems, the "lic" folder can be found here:
C:\Program Files (x86)\Common Files\Veritas Shared\vrtslic\lic
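The "lic" backup in step 4 can be scripted. The following is a minimal sketch, not a vendor-supplied tool: it assumes Python is available on the node, and the function name backup_lic and the lic_<n> target folder names are hypothetical. It copies whichever of the two folders listed above exists into a backup location of your choosing.

```python
import shutil
from pathlib import Path

# The two "lic" locations named in step 4 (32-bit and 64-bit layouts).
LIC_CANDIDATES = [
    Path(r"C:\Program Files\Common Files\VERITAS Shared\vrtslic\lic"),
    Path(r"C:\Program Files (x86)\Common Files\Veritas Shared\vrtslic\lic"),
]


def backup_lic(candidates, backup_root):
    """Copy every candidate folder that exists under backup_root.

    Returns the list of backup folders that were created.
    """
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for index, source in enumerate(candidates):
        source = Path(source)
        if not source.is_dir():
            continue  # this location does not exist on this machine
        # Hypothetical naming scheme: lic_0, lic_1, ... keeps the copies apart.
        target = backup_root / f"lic_{index}"
        shutil.copytree(source, target)  # raises if target already exists
        copied.append(target)
    return copied
```

A call such as backup_lic(LIC_CANDIDATES, r"D:\sfwha_backup") would perform the copy; the destination shown is only a placeholder and should be a location that the uninstall and reinstall will not touch.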
SECTION 2 - Removing the Node
1. Move any service groups to another node.
2. Remove the node that is going to be rebuilt from the System List of each service group. This can be done by performing the following steps:
- Open "Cluster Explorer" (the main VCS Java GUI) and highlight one of the service groups
- From the top toolbar, select Tools > System Manager
- Use the arrows to remove the node that is going to be rebuilt from the System List.
Note: Repeat the above steps for each service group as needed.
3. From a Windows command prompt, run the following commands on the node that is to be rebuilt:
haconf -dump -makero
net stop llt /y
4. Disable the following Windows services on the node that is to be rebuilt:
VERITAS Cluster Server Helper
VERITAS Command Server
VERITAS High Availability Engine
VERITAS VCSComm Startup
5. Delete the following files on the node that is to be rebuilt:
6. Delete the following registry keys on the node that is to be rebuilt:
7. If Veritas DMP (Dynamic MultiPathing) is being used, disable all but one path on the node that is to be rebuilt.
8. From Add/Remove Programs, uninstall any Veritas maintenance or service packs (such as MP1 or RP2) from the node that is to be rebuilt.
9. Uninstall "Storage Foundation HA for Windows (Server Components)." During the uninstall wizard, also select the removal of the "Client Components."
10. Reboot the node.
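The two commands in step 3 of this section must be run in the order given. As an illustration only, the sketch below wraps them in Python so that the sequence stops if a command reports a failure. The helper name run_in_order is hypothetical, and treating any non-zero exit code as a reason to stop is an assumption, not documented product behavior.

```python
import subprocess

# Commands from step 3, in the order the text gives them.
STOP_COMMANDS = [
    ["haconf", "-dump", "-makero"],
    ["net", "stop", "llt", "/y"],
]


def run_in_order(commands, runner=subprocess.run):
    """Run each command in turn and stop at the first non-zero exit code.

    Returns a list of (command, returncode) pairs for the commands attempted.
    The runner argument exists so the sequence can be dry-run or tested.
    """
    results = []
    for command in commands:
        completed = runner(command)
        results.append((command, completed.returncode))
        if completed.returncode != 0:
            break  # assumption: do not continue past a failed command
    return results
```

Calling run_in_order(STOP_COMMANDS) from an elevated prompt on the node to be rebuilt would execute both commands; review the returned exit codes before continuing with step 4.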
SECTION 3 - Rebuilding the Node
1. Reinstall SFW-HA as normal on the node that is to be rebuilt. If needed, use the license keys that were previously backed up from the C:\Program Files\Common Files\VERITAS Shared\vrtslic\lic folder in "Section 1."
2. Reboot the node.
3. Install any maintenance or service packs that had previously been installed.
4. Reboot the node.
5. Install any other hotfixes or patches that were previously installed. A reboot may or may not be necessary depending on the instructions for the individual patches.
Note: The version and patch level of SFW-HA should match the other nodes in the cluster before continuing.
6. If the node had previously been brought down to a single path, the other paths may be enabled again.
WARNING: Before adding new paths, multipathing software should be enabled and the relevant hardware should be placed under the control of the multipathing software. Adding paths without configuring multipathing software can cause data corruption. If Veritas Dynamic Multipathing (DMP) is being used, this step can be performed by right-clicking on any existing path within VEA and clearing the "Exclude" check box in the Array Settings dialog box.
7. Run the "VERITAS Cluster Configuration Wizard" on the node that is to be rebuilt. Do not join this node to the existing cluster yet. Create a new cluster with this server as the only node.
Note: Select Yes when prompted to configure private link heartbeats on a single-node cluster. This configures LLT, GAB, and VCSComm for auto-start.
8. From another node that is still in the original cluster (not the rebuilt node) run the following commands:
haconf -dump -makero
hastop -all -force
9. On the rebuilt node, run the following command:
net stop llt /y
10. If this is an Exchange cluster, restore the "ExchConfig" registry key to the rebuilt node. This key was previously backed up in "Section 1." The location of this key should be:
11. Replace the "main.cf" file on ALL nodes with the one that was backed up in "Section 1." The location of this file is:
12. On the rebuilt node only, replace the following files with the copies that were previously backed up:
13. On the rebuilt node only, modify the following registry keys to match the other nodes in the cluster:
14. On the rebuilt node only, modify the NodeID registry key to match the value contained in llthosts.txt:
15. On the rebuilt node only, delete the following key if it exists:
16. On any node, run the following command:
17. Run the following command to check the status of the nodes:
Note: If the cluster engine (had) fails to start, review engine_a.txt to determine the reason for the failure. In most cases, a had failure that occurs directly after rebuilding a node is the result of a minor mismatch between one of the registry keys or comms files that were backed up and restored. If errors are discovered, correct them, then run net stop llt /y followed by hastart to refresh the settings. By default, engine_a.txt is located under the following path:
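A NodeID mismatch (step 14 above) is one of the easier errors to check for before starting the cluster. The sketch below shows one way to read the expected node ID for a host out of the contents of llthosts.txt. It assumes the usual LLT layout of one "<node-id> <hostname>" pair per line; the function names are hypothetical, and the value it returns still has to be compared with the NodeID registry value by hand.

```python
def parse_llthosts(text):
    """Return {HOSTNAME: node_id} parsed from llthosts.txt content.

    Assumes each meaningful line is "<node-id> <hostname>"; lines that do
    not start with a number are ignored.
    """
    nodes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # blank or unexpected line
        # Host names are upper-cased so the lookup is case-insensitive.
        nodes[parts[1].upper()] = int(parts[0])
    return nodes


def expected_node_id(text, hostname):
    """Return the node ID listed for hostname, or None if it is absent."""
    return parse_llthosts(text).get(hostname.upper())
```

For example, with file contents "0 NODEA" and "1 NODEB" on separate lines, expected_node_id(contents, "nodeb") returns 1, so the rebuilt NODEB should carry a NodeID of 1. The host names here are placeholders.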