Cluster Server 7.4.1 Administrator's Guide - Linux
- Section I. Clustering concepts and terminology
- Introducing Cluster Server
- About Cluster Server
- About cluster control guidelines
- About the physical components of VCS
- Logical components of VCS
- About resources and resource dependencies
- Categories of resources
- About resource types
- About service groups
- Types of service groups
- About the ClusterService group
- About the cluster UUID
- About agents in VCS
- About agent functions
- About resource monitoring
- Agent classifications
- VCS agent framework
- About cluster control, communications, and membership
- About security services
- Components for administering VCS
- Putting the pieces together
- About cluster topologies
- VCS configuration concepts
- Introducing Cluster Server
- Section II. Administration - Putting VCS to work
- About the VCS user privilege model
- Administering the cluster from the command line
- About administering VCS from the command line
- About installing a VCS license
- Administering LLT
- Administering the AMF kernel driver
- Starting VCS
- Stopping VCS
- Stopping the VCS engine and related processes
- Logging on to VCS
- About managing VCS configuration files
- About managing VCS users from the command line
- About querying VCS
- About administering service groups
- Adding and deleting service groups
- Modifying service group attributes
- Bringing service groups online
- Taking service groups offline
- Switching service groups
- Migrating service groups
- Freezing and unfreezing service groups
- Enabling and disabling service groups
- Enabling and disabling priority based failover for a service group
- Clearing faulted resources in a service group
- Flushing service groups
- Linking and unlinking service groups
- Administering agents
- About administering resources
- About adding resources
- Adding resources
- Deleting resources
- Adding, deleting, and modifying resource attributes
- Defining attributes as local
- Defining attributes as global
- Enabling and disabling intelligent resource monitoring for agents manually
- Enabling and disabling IMF for agents by using script
- Linking and unlinking resources
- Bringing resources online
- Taking resources offline
- Probing a resource
- Clearing a resource
- About administering resource types
- Administering systems
- About administering clusters
- Configuring and unconfiguring the cluster UUID value
- Retrieving version information
- Adding and removing systems
- Changing ports for VCS
- Setting cluster attributes from the command line
- About initializing cluster attributes in the configuration file
- Enabling and disabling secure mode for the cluster
- Migrating from secure mode to secure mode with FIPS
- Using the -wait option in scripts that use VCS commands
- Running HA fire drills
- About administering simulated clusters from the command line
- Configuring applications and resources in VCS
- Configuring resources and applications
- VCS bundled agents for UNIX
- Configuring NFS service groups
- About NFS
- Configuring NFS service groups
- Sample configurations
- Sample configuration for a single NFS environment without lock recovery
- Sample configuration for a single NFS environment with lock recovery
- Sample configuration for a single NFSv4 environment
- Sample configuration for a multiple NFSv4 environment
- Sample configuration for a multiple NFS environment without lock recovery
- Sample configuration for a multiple NFS environment with lock recovery
- Sample configuration for configuring NFS with separate storage
- Sample configuration when configuring all NFS services in a parallel service group
- About configuring the RemoteGroup agent
- About configuring Samba service groups
- Configuring the Coordination Point agent
- About testing resource failover by using HA fire drills
- Predicting VCS behavior using VCS Simulator
- Section III. VCS communication and operations
- About communications, membership, and data protection in the cluster
- About cluster communications
- About cluster membership
- About membership arbitration
- About membership arbitration components
- About server-based I/O fencing
- About majority-based fencing
- About making CP server highly available
- About the CP server database
- Recommended CP server configurations
- About the CP server service group
- About the CP server user types and privileges
- About secure communication between the VCS cluster and CP server
- About data protection
- About I/O fencing configuration files
- Examples of VCS operation with I/O fencing
- About cluster membership and data protection without I/O fencing
- Examples of VCS operation without I/O fencing
- Summary of best practices for cluster communications
- Administering I/O fencing
- About administering I/O fencing
- About the vxfentsthdw utility
- General guidelines for using the vxfentsthdw utility
- About the vxfentsthdw command options
- Testing the coordinator disk group using the -c option of vxfentsthdw
- Performing non-destructive testing on the disks using the -r option
- Testing the shared disks using the vxfentsthdw -m option
- Testing the shared disks listed in a file using the vxfentsthdw -f option
- Testing all the disks in a disk group using the vxfentsthdw -g option
- Testing a disk with existing keys
- Testing disks with the vxfentsthdw -o option
- About the vxfenadm utility
- About the vxfenclearpre utility
- About the vxfenswap utility
- About administering the coordination point server
- CP server operations (cpsadm)
- Cloning a CP server
- Adding and removing VCS cluster entries from the CP server database
- Adding and removing a VCS cluster node from the CP server database
- Adding or removing CP server users
- Listing the CP server users
- Listing the nodes in all the VCS clusters
- Listing the membership of nodes in the VCS cluster
- Preempting a node
- Registering and unregistering a node
- Enable and disable access for a user to a VCS cluster
- Starting and stopping CP server outside VCS control
- Checking the connectivity of CP servers
- Adding and removing virtual IP addresses and ports for CP servers at run-time
- Taking a CP server database snapshot
- Replacing coordination points for server-based fencing in an online cluster
- Refreshing registration keys on the coordination points for server-based fencing
- About configuring a CP server to support IPv6 or dual stack
- Deployment and migration scenarios for CP server
- About migrating between disk-based and server-based fencing configurations
- Migrating from disk-based to server-based fencing in an online cluster
- Migrating from server-based to disk-based fencing in an online cluster
- Migrating between fencing configurations using response files
- Sample response file to migrate from disk-based to server-based fencing
- Sample response file to migrate from server-based fencing to disk-based fencing
- Sample response file to migrate from single CP server-based fencing to server-based fencing
- Response file variables to migrate between fencing configurations
- Enabling or disabling the preferred fencing policy
- About I/O fencing log files
- Controlling VCS behavior
- VCS behavior on resource faults
- About controlling VCS behavior at the service group level
- About the AutoRestart attribute
- About controlling failover on service group or system faults
- About defining failover policies
- About AdaptiveHA
- About system zones
- About sites
- Load-based autostart
- About freezing service groups
- About controlling Clean behavior on resource faults
- Clearing resources in the ADMIN_WAIT state
- About controlling fault propagation
- Customized behavior diagrams
- About preventing concurrency violation
- VCS behavior for resources that support the intentional offline functionality
- VCS behavior when a service group is restarted
- About controlling VCS behavior at the resource level
- Changing agent file paths and binaries
- VCS behavior on loss of storage connectivity
- Service group workload management
- Sample configurations depicting workload management
- The role of service group dependencies
- About communications, membership, and data protection in the cluster
- Section IV. Administration - Beyond the basics
- VCS event notification
- VCS event triggers
- About VCS event triggers
- Using event triggers
- List of event triggers
- About the dumptunables trigger
- About the globalcounter_not_updated trigger
- About the injeopardy event trigger
- About the loadwarning event trigger
- About the nofailover event trigger
- About the postoffline event trigger
- About the postonline event trigger
- About the preonline event trigger
- About the resadminwait event trigger
- About the resfault event trigger
- About the resnotoff event trigger
- About the resrestart event trigger
- About the resstatechange event trigger
- About the sysoffline event trigger
- About the sysup trigger
- About the sysjoin trigger
- About the unable_to_restart_agent event trigger
- About the unable_to_restart_had event trigger
- About the violation event trigger
- Virtual Business Services
- Section V. Veritas High Availability Configuration wizard
- Introducing the Veritas High Availability Configuration wizard
- Administering application monitoring from the Veritas High Availability view
- Administering application monitoring from the Veritas High Availability view
- Understanding the Veritas High Availability view
- To view the status of configured applications
- To configure or unconfigure application monitoring
- To start or stop applications
- To suspend or resume application monitoring
- To switch an application to another system
- To add or remove a failover system
- To clear Fault state
- To resolve a held-up operation
- To determine application state
- To remove all monitoring configurations
- To remove VCS cluster configurations
- Administering application monitoring settings
- Administering application monitoring from the Veritas High Availability view
- Section VI. Cluster configurations for disaster recovery
- Connecting clusters–Creating global clusters
- How VCS global clusters work
- VCS global clusters: The building blocks
- Visualization of remote cluster objects
- About global service groups
- About global cluster management
- About serialization - The Authority attribute
- About resiliency and "Right of way"
- VCS agents to manage wide-area failover
- About the Steward process: Split-brain in two-cluster global clusters
- Secure communication in global clusters
- Prerequisites for global clusters
- About planning to set up global clusters
- Setting up a global cluster
- About IPv6 support with global clusters
- About cluster faults
- About setting up a disaster recovery fire drill
- Multi-tiered application support using the RemoteGroup agent in a global environment
- Test scenario for a multi-tiered environment
- Administering global clusters from the command line
- About administering global clusters from the command line
- About global querying in a global cluster setup
- Administering global service groups in a global cluster setup
- Administering resources in a global cluster setup
- Administering clusters in global cluster setup
- Administering heartbeats in a global cluster setup
- Setting up replicated data clusters
- Setting up campus clusters
- Connecting clusters–Creating global clusters
- Section VII. Troubleshooting and performance
- VCS performance considerations
- How cluster components affect performance
- How cluster operations affect performance
- VCS performance consideration when booting a cluster system
- VCS performance consideration when a resource comes online
- VCS performance consideration when a resource goes offline
- VCS performance consideration when a service group comes online
- VCS performance consideration when a service group goes offline
- VCS performance consideration when a resource fails
- VCS performance consideration when a system fails
- VCS performance consideration when a network link fails
- VCS performance consideration when a system panics
- VCS performance consideration when a service group switches over
- VCS performance consideration when a service group fails over
- About scheduling class and priority configuration
- VCS agent statistics
- About VCS tunable parameters
- Troubleshooting and recovery for VCS
- VCS message logging
- Log unification of VCS agent's entry points
- Enhancing First Failure Data Capture (FFDC) to troubleshoot VCS resource's unexpected behavior
- GAB message logging
- Enabling debug logs for agents
- Enabling debug logs for IMF
- Enabling debug logs for the VCS engine
- Enable VCS logging for VxAT
- About debug log tags usage
- Gathering VCS information for support analysis
- Gathering LLT and GAB information for support analysis
- Gathering IMF information for support analysis
- Message catalogs
- Troubleshooting the VCS engine
- Troubleshooting Low Latency Transport (LLT)
- Troubleshooting Group Membership Services/Atomic Broadcast (GAB)
- Troubleshooting VCS startup
- Troubleshooting issues with systemd unit service files
- If a unit service has failed and the corresponding module is still loaded, systemd cannot unload it and so its package cannot be removed
- If a unit service is active and the corresponding process is stopped outside of systemd, the service cannot be started again using 'systemctl start'
- If a unit service takes longer than the default timeout to stop or start the corresponding service, it goes into the Failed state
- Troubleshooting Intelligent Monitoring Framework (IMF)
- Troubleshooting service groups
- VCS does not automatically start service group
- System is not in RUNNING state
- Service group not configured to run on the system
- Service group not configured to autostart
- Service group is frozen
- Failover service group is online on another system
- A critical resource faulted
- Service group autodisabled
- Service group is waiting for the resource to be brought online/taken offline
- Service group is waiting for a dependency to be met.
- Service group not fully probed.
- Service group does not fail over to the forecasted system
- Service group does not fail over to the BiggestAvailable system even if FailOverPolicy is set to BiggestAvailable
- Restoring metering database from backup taken by VCS
- Initialization of metering database fails
- Error message appears during service group failover or switch
- Troubleshooting resources
- Troubleshooting sites
- Troubleshooting I/O fencing
- Node is unable to join cluster while another node is being ejected
- The vxfentsthdw utility fails when SCSI TEST UNIT READY command fails
- Manually removing existing keys from SCSI-3 disks
- System panics to prevent potential data corruption
- Cluster ID on the I/O fencing key of coordinator disk does not match the local cluster's ID
- Fencing startup reports preexisting split-brain
- Registered keys are lost on the coordinator disks
- Replacing defective disks when the cluster is offline
- The vxfenswap utility exits if rcp or scp commands are not functional
- Troubleshooting CP server
- Troubleshooting server-based fencing on the VCS cluster nodes
- Issues during online migration of coordination points
- Troubleshooting notification
- Troubleshooting and recovery for global clusters
- Troubleshooting the steward process
- Troubleshooting licensing
- Validating license keys
- Licensing error messages
- [Licensing] Insufficient memory to perform operation
- [Licensing] No valid VCS license keys were found
- [Licensing] Unable to find a valid base VCS license key
- [Licensing] License key cannot be used on this OS platform
- [Licensing] VCS evaluation period has expired
- [Licensing] License key can not be used on this system
- [Licensing] Unable to initialize the licensing framework
- [Licensing] QuickStart is not supported in this release
- [Licensing] Your evaluation period for the feature has expired. This feature will not be enabled the next time VCS starts
- Troubleshooting secure configurations
- Troubleshooting wizard-based configuration issues
- Troubleshooting issues with the Veritas High Availability view
- VCS message logging
- VCS performance considerations
- Section VIII. Appendixes
To add or remove a failover system
Each row in the application table displays the status of an application on systems that are part of a VCS cluster. The displayed system/s either form a single-system Cluster Server (VCS) cluster with application restart configured as a high-availability measure, or a multi-system VCS cluster with application failover configured. In the displayed cluster, you can add a new system as a failover system for the configured application.
The system must fulfill the following conditions:
The system is not part of any other VCS cluster.
The system has at least two network adapters.
The host name of the system must be resolvable through the DNS server or, locally, using /etc/hosts file entries.
The required ports are not blocked by a firewall.
The application is installed identically on all the systems, including the proposed new system.
To add a failover system, perform the following steps:
Note:
The following procedure describes generic steps to add a failover system. The wizard automatically populates values for initially configured systems in some fields. These values are not editable.
To add a failover system
- In the appropriate row of the application table, click More > Add Failover System.
- Review the instructions on the welcome page of the Veritas High Availability Configuration Wizard, and click Next.
- If you want to add a system from the Cluster systems list to the Application failover targets list, on the Configuration Inputs panel, select the system in the Cluster systems list. Use the Edit icon to specify an administrative user account on the system. You can then move the required system from the Cluster system list to the Application failover targets list. Use the up and down arrow keys to set the order of systems in which VCS agent must failover applications.
If you want to specify a failover system that is not an existing cluster node, on the Configuration Inputs panel, click Add System, and in the Add System dialog box, specify the following details:
System Name or IP address
Specify the name or IP address of the system that you want to add to the VCS cluster.
User name
Specify the user name with administrative privileges on the system.
If you want to specify the same user account on all systems that you want to add, check the Use the specified user account on all systems box.
Password
Specify the password for the account you specified.
Use the specified user account on all systems
Click this check box to use the specified user credentials on all the cluster systems.
The wizard validates the details, and the system then appears in the Application failover target list.
- If you are adding a failover system from the existing VCS cluster, the Network Details panel does not appear.
If you are adding a new failover system to the existing cluster, on the Network Details panel, review the networking parameters used by existing failover systems. Appropriately modify the following parameters for the new failover system.
Note:
The wizard automatically populates the networking protocol (UDP or Ethernet) used by the existing failover systems for Low Latency Transport communication. You cannot modify these settings.
To configure links over ethernet, select the adapter for each network communication link. You must select a different network adapter for each communication link.
To configure links over UDP, specify the required details for each communication link.
Network Adapter
Select a network adapter for the communication links.
You must select a different network adapter for each communication link.
Veritas recommends that one of the network adapters must be a public adapter and the VCS cluster communication link using this adapter is assigned a low priority.
Note:
Do not select the teamed network adapter or the independently listed adapters that are a part of teamed NIC.
IP Address
Select the IP address to be used for cluster communication over the specified UDP port.
Port
Specify a unique port number for each link. You can use ports in the range 49152 to 65535.
The specified port for a link is used for all the cluster systems on that link.
Subnet mask
Displays the subnet mask to which the specified IP belongs.
- If a virtual IP is not configured as part of your application monitoring configuration, the Virtual Network Details page is not displayed. Else, on the Virtual Network Details panel, review the following networking parameters that the failover system must use, and specify the NIC:
Virtual IP address
Specifies a unique virtual IP address.
Subnet mask
Specifies the subnet mask to which the IP address belongs.
NIC
For each newly added system, specify the network adaptor that must host the specified virtual IP.
- On the Configuration Summary panel, review the VCS cluster configuration summary, and then click Next to proceed with the configuration.
- On the Implementation panel, the wizard adds the specified system to the VCS cluster, if it is not already a part. It then adds the system to the list of failover targets. The wizard displays a progress report of each task.
If the wizard displays an error, click View Logs to review the error description, troubleshoot the error, and re-run the wizard from the Veritas High Availability view.
Click Next.
- On the Finish panel, click Finish. This completes the procedure for adding a failover system. You can view the system in the appropriate row of the application table.
Similarly you can also remove a system from the list of application failover targets.
Note:
You cannot remove a failover system if an application is online or partially online on the system.
To remove a failover system
- In the appropriate row of the application table, click More > Remove Failover System.
- On the Remove Failover System panel, click the system that you want to remove from the monitoring configuration, and then click OK.
Note:
This procedure only removes the system from the list of failover target systems, not from the VCS cluster. To remove a system from the cluster, use VCS commands. For details, see the VCS Administrator's Guide.