Storage Foundation and High Availability Solutions 7.4.1 Solutions Guide - Windows
Campus cluster failure with Microsoft clustering scenarios
This section describes failure and recovery scenarios for a campus cluster with Microsoft clustering and SFW installed.
For information about the quorum resource and arbitration in Microsoft clustering:
See Microsoft clustering quorum and quorum arbitration.
The following table lists failure situations and the outcomes that occur.
Table: List of failure situations and possible outcomes
| Failure situation | Outcome | Comments |
|---|---|---|
| Application fault. May mean that the services stopped for an application, a NIC failed, or a database table went offline. | Failover | If the services stop because of an application failure, the application automatically fails over to the other site. |
| Server failure (Site A). May mean that a power cord was unplugged, a system hang occurred, or another failure caused the system to stop responding. | Failover | Assuming a two-node cluster pair, failing a single node results in a cluster failover. Service is temporarily interrupted for cluster resources that are moved from the failed node to the remaining live node. |
| Server failure (Site B). May mean that a power cord was unplugged, a system hang occurred, or another failure caused the system to stop responding. | No interruption of service | Failure of the passive site (Site B) does not interrupt service to the active site (Site A). |
| Partial SAN network failure. May mean that the SAN Fibre Channel cables were disconnected to the Site A or Site B storage. | No interruption of service | Assuming that each of the cluster nodes has some type of Dynamic Multi-Pathing (DMP) solution, removing one SAN Fibre Channel cable from a single cluster node should not affect any cluster resources running on that node, because the underlying DMP solution seamlessly handles the SAN path failover. |
| Private IP heartbeat network failure. May mean that the private NICs or the connecting network cables failed. | No interruption of service | With the standard two-NIC configuration for a cluster node (one NIC for the public cluster network and one NIC for the private heartbeat network), disabling the NIC for the private heartbeat network should not affect the cluster software or the cluster resources, because the cluster software simply routes the heartbeat packets through the public network. |
| Public IP network failure. May mean that the public NIC or the LAN network has failed. | Failover | When the public NIC on the active node or the public LAN fails, clients cannot access the active node, and failover occurs. |
| Public and private IP or network failure. May mean that the LAN network, including both the private and public NIC connections, has failed. | | The site that owned the quorum resource right before the "network partition" remains the owner of the quorum resource and is the only surviving cluster node. The cluster software running on the other cluster node self-terminates because it has lost the cluster arbitration for the quorum resource. |
| Loss of network connection (SAN and LAN), failing both the heartbeat and the connection to storage. May mean that all network and SAN connections are severed, for example, if a single pipe is used between buildings for both Ethernet and storage. | | The node or site that owned the quorum resource right before the "network partition" remains the owner of the quorum resource and is the only surviving cluster node. The cluster software running on the other cluster node self-terminates because it has lost the cluster arbitration for the quorum resource. By default, the Microsoft clustering service (clussvc) tries to auto-start every minute, so after LAN and SAN communication is re-established, clussvc auto-starts and the node can rejoin the existing cluster. (See the first example after this table.) |
| Storage array failure on Site A or on Site B. May mean that a power cord was unplugged, or a storage array failure caused the array to stop responding. | No interruption of service | The campus cluster is divided equally between the two sites, with one array at each site. Completely failing one storage array should have no effect on the cluster or on any cluster resources that are currently online. However, you cannot move any cluster resources between nodes after this storage failure, because neither node is able to obtain a majority of the disks within the cluster disk group. |
| Site A failure (power). Means that all access to Site A, including the server and storage, is lost. | Manual failover | If the failed site contains the cluster node that owned the quorum resource, the overall cluster is offline and cannot be brought online on the remaining live site without manual intervention. (See the second example after this table.) |
| Site B failure (power). Means that all access to Site B, including the server and storage, is lost. | No interruption of service | If the failed site did not contain the cluster node that owned the quorum resource, the cluster remains alive with whatever cluster resources were online on the surviving node right before the site failure. |
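The following sketch shows one way to confirm that the cluster service has restarted and that a node has rejoined the cluster after LAN and SAN communication is re-established (the "Loss of network connection" scenario above). It assumes that the Failover Clustering feature and its PowerShell module are installed on the node; the commands only inspect and start the local cluster service.

```powershell
# Check whether the Microsoft cluster service (clussvc) is running on this node.
Get-Service ClusSvc

# If it has not auto-started yet, start it manually.
Start-Service ClusSvc

# Verify that the node has rejoined the cluster and that its state is Up.
Get-ClusterNode
```

If the node does not rejoin, review the System event log for cluster service errors before retrying.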
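For a complete failure of the site that owns the quorum resource, manual intervention generally means forcing the cluster to start on the surviving site without quorum. The following is a minimal sketch only, not a complete recovery procedure; the node name NODE-B and group name AppGroup are placeholders, and the exact steps depend on your Windows Server version and quorum model.

```powershell
# On the surviving node, force the cluster service to start without quorum.
Start-ClusterNode -Name NODE-B -FixQuorum

# List the cluster groups, then bring the application group online on the
# surviving node.
Get-ClusterGroup
Start-ClusterGroup -Name "AppGroup"
```

After the failed site is restored, verify the quorum configuration and the state of the storage before allowing resources to fail back.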