InfoScale™ 9.0 Cluster Server Administrator's Guide - AIX
- Section I. Clustering concepts and terminology- Introducing Cluster Server- About Cluster Server
- About cluster control guidelines
- About the physical components of VCS
- Logical components of VCS- About resources and resource dependencies
- Categories of resources
- About resource types
- About service groups
- Types of service groups
- About the ClusterService group
- About the cluster UUID
- About agents in VCS
- About agent functions
- About resource monitoring
- Agent classifications
- VCS agent framework
- About cluster control, communications, and membership
- About security services
- Components for administering VCS
 
- Putting the pieces together
 
- About cluster topologies
- VCS configuration concepts
 
- Introducing Cluster Server
- Section II. Administration - Putting VCS to work- About the VCS user privilege model
- Administering the cluster from the command line- About administering VCS from the command line
- About installing a VCS license
- Administering LLT
- Administering the AMF kernel driver
- Starting VCS
- Stopping VCS
- Stopping VCS without evacuating service groups
- Stopping the VCS engine and related processes
- Logging on to VCS
- About managing VCS configuration files
- About managing VCS users from the command line
- About querying VCS
- About administering service groups- Adding and deleting service groups
- Modifying service group attributes
- Bringing service groups online
- Taking service groups offline
- Switching service groups
- Migrating service groups
- Freezing and unfreezing service groups
- Enabling and disabling service groups
- Enabling and disabling priority based failover for a service group
- Clearing faulted resources in a service group
- Flushing service groups
- Linking and unlinking service groups
 
- Administering agents
- About administering resources- About adding resources
- Adding resources
- Deleting resources
- Adding, deleting, and modifying resource attributes
- Defining attributes as local
- Defining attributes as global
- Enabling and disabling intelligent resource monitoring for agents manually
- Enabling and disabling IMF for agents by using script
- Linking and unlinking resources
- Bringing resources online
- Taking resources offline
- Probing a resource
- Clearing a resource
 
- About administering resource types
- Administering systems
- About administering clusters- Configuring and unconfiguring the cluster UUID value
- Retrieving version information
- Adding and removing systems
- Changing ports for VCS
- Setting cluster attributes from the command line
- About initializing cluster attributes in the configuration file
- Enabling and disabling secure mode for the cluster
- Migrating from secure mode to secure mode with FIPS
 
- Using the -wait option in scripts that use VCS commands
- Running HA fire drills
 
- Configuring applications and resources in VCS- Configuring resources and applications
- VCS bundled agents for UNIX
- Configuring NFS service groups- About NFS
- Configuring NFS service groups
- Sample configurations- Sample configuration for a single NFS environment without lock recovery
- Sample configuration for a single NFS environment with lock recovery
- Sample configuration for a single NFSv4 environment
- Sample configuration for a multiple NFSv4 environment
- Sample configuration for a multiple NFS environment without lock recovery
- Sample configuration for a multiple NFS environment with lock recovery
- Sample configuration for configuring NFS with separate storage
- Sample configuration when configuring all NFS services in a parallel service group
 
 
- About configuring the RemoteGroup agent
- About configuring Samba service groups
- Configuring the Coordination Point agent
- About migration of data from LVM volumes to VxVM volumes
- About testing resource failover by using HA fire drills
 
 
- Section III. VCS communication and operations- About communications, membership, and data protection in the cluster- About cluster communications
- About cluster membership
- About membership arbitration- About membership arbitration components
- About server-based I/O fencing
- About majority-based fencing
- About making CP server highly available
- About the CP server database
- Recommended CP server configurations
- About the CP server service group
- About the CP server user types and privileges
- About secure communication between the VCS cluster and CP server
 
- About data protection
- About I/O fencing configuration files
- Examples of VCS operation with I/O fencing
- About cluster membership and data protection without I/O fencing
- Examples of VCS operation without I/O fencing
- Summary of best practices for cluster communications
 
- Administering I/O fencing- About administering I/O fencing
- About the vxfentsthdw utility- General guidelines for using the vxfentsthdw utility
- About the vxfentsthdw command options
- Testing the coordinator disk group using the -c option of vxfentsthdw
- Performing non-destructive testing on the disks using the -r option
- Testing the shared disks using the vxfentsthdw -m option
- Testing the shared disks listed in a file using the vxfentsthdw -f option
- Testing all the disks in a disk group using the vxfentsthdw -g option
- Testing a disk with existing keys
- Testing disks with the vxfentsthdw -o option
 
- About the vxfenadm utility
- About the vxfenclearpre utility
- About the vxfenswap utility
- About administering the coordination point server- CP server operations (cpsadm)
- Cloning a CP server
- Adding and removing VCS cluster entries from the CP server database
- Adding and removing a VCS cluster node from the CP server database
- Adding or removing CP server users
- Listing the CP server users
- Listing the nodes in all the VCS clusters
- Listing the membership of nodes in the VCS cluster
- Preempting a node
- Registering and unregistering a node
- Enable and disable access for a user to a VCS cluster
- Starting and stopping CP server outside VCS control
- Checking the connectivity of CP servers
- Adding and removing virtual IP addresses and ports for CP servers at run-time
- Taking a CP server database snapshot
- Replacing coordination points for server-based fencing in an online cluster
- Refreshing registration keys on the coordination points for server-based fencing
- About configuring a CP server to support IPv6 or dual stack
- Deployment and migration scenarios for CP server
 
- About migrating between disk-based and server-based fencing configurations- Migrating from disk-based to server-based fencing in an online cluster
- Migrating from server-based to disk-based fencing in an online cluster
- Migrating between fencing configurations using response files- Sample response file to migrate from disk-based to server-based fencing
- Sample response file to migrate from server-based fencing to disk-based fencing
- Sample response file to migrate from single CP server-based fencing to server-based fencing
- Response file variables to migrate between fencing configurations
 
 
- Enabling or disabling the preferred fencing policy
- About I/O fencing log files
 
- Controlling VCS behavior- VCS behavior on resource faults
- About controlling VCS behavior at the service group level- About the AutoRestart attribute
- About controlling failover on service group or system faults
- About defining failover policies
- About AdaptiveHA
- About system zones
- About sites
- Load-based autostart
- About freezing service groups
- About controlling Clean behavior on resource faults
- Clearing resources in the ADMIN_WAIT state
- About controlling fault propagation
- Customized behavior diagrams
- About preventing concurrency violation
- VCS behavior for resources that support the intentional offline functionality
- VCS behavior when a service group is restarted
 
- About controlling VCS behavior at the resource level
- Changing agent file paths and binaries
- VCS behavior on loss of storage connectivity
- Service group workload management
- Sample configurations depicting workload management
 
- The role of service group dependencies
 
- About communications, membership, and data protection in the cluster
- Section IV. Administration - Beyond the basics- VCS event notification
- VCS event triggers- About VCS event triggers
- Using event triggers
- List of event triggers- About the dumptunables trigger
- About the globalcounter_not_updated trigger
- About the injeopardy event trigger
- About the loadwarning event trigger
- About the multinicb event trigger
- About the nofailover event trigger
- About the postoffline event trigger
- About the postonline event trigger
- About the preonline event trigger
- About the resadminwait event trigger
- About the resfault event trigger
- About the resnotoff event trigger
- About the resrestart event trigger
- About the resstatechange event trigger
- About the sysoffline event trigger
- About the sysup trigger
- About the sysjoin trigger
- About the unable_to_restart_agent event trigger
- About the unable_to_restart_had event trigger
- About the violation event trigger
 
 
- Virtual Business Services
 
- Section V. Cluster configurations for disaster recovery- Connecting clusters–Creating global clusters- How VCS global clusters work
- VCS global clusters: The building blocks- Visualization of remote cluster objects
- About global service groups
- About global cluster management
- About serialization - The Authority attribute
- About resiliency and "Right of way"
- VCS agents to manage wide-area failover
- About the Steward process: Split-brain in two-cluster global clusters
- Secure communication in global clusters
 
- Prerequisites for global clusters
- About planning to set up global clusters
- Setting up a global cluster- Configuring application and replication for global cluster setup
- Configuring clusters for global cluster setup- Configuring global cluster components at the primary site
- Installing and configuring VCS at the secondary site
- Securing communication between the wide-area connectors
- Gcoconfig utility support
- Configuring remote cluster objects
- Configuring additional heartbeat links (optional)
- Configuring the Steward process (optional)
 
- Configuring service groups for global cluster setup
- Configuring a service group as a global service group
 
- About IPv6 support with global clusters
- About cluster faults
- About setting up a disaster recovery fire drill
- Multi-tiered application support using the RemoteGroup agent in a global environment
- Test scenario for a multi-tiered environment
 
- Administering global clusters from the command line- About administering global clusters from the command line
- About global querying in a global cluster setup
- Administering global service groups in a global cluster setup
- Administering resources in a global cluster setup
- Administering clusters in global cluster setup
- Administering heartbeats in a global cluster setup
 
- Setting up replicated data clusters
- Setting up campus clusters
 
- Connecting clusters–Creating global clusters
- Section VI. Troubleshooting and performance- VCS performance considerations- How cluster components affect performance
- How cluster operations affect performance- VCS performance consideration when booting a cluster system
- VCS performance consideration when a resource comes online
- VCS performance consideration when a resource goes offline
- VCS performance consideration when a service group comes online
- VCS performance consideration when a service group goes offline
- VCS performance consideration when a resource fails
- VCS performance consideration when a system fails
- VCS performance consideration when a network link fails
- VCS performance consideration when a system panics
- VCS performance consideration when a service group switches over
- VCS performance consideration when a service group fails over
 
- About scheduling class and priority configuration
- CPU binding of HAD
- VCS agent statistics
- About VCS tunable parameters
 
- Troubleshooting and recovery for VCS- VCS message logging- Log unification of VCS agent's entry points
- Enhancing First Failure Data Capture (FFDC) to troubleshoot VCS resource's unexpected behavior
- GAB message logging
- Enabling debug logs for agents
- Enabling debug logs for IMF
- Enabling debug logs for the VCS engine
- Enabling debug logs for VxAT
- About debug log tags usage
- Gathering VCS information for support analysis
- Gathering LLT and GAB information for support analysis
- Gathering IMF information for support analysis
- Message catalogs
 
- Troubleshooting the VCS engine
- Troubleshooting Low Latency Transport (LLT)
- Troubleshooting Group Membership Services/Atomic Broadcast (GAB)
- Troubleshooting VCS startup
- Troubleshooting Intelligent Monitoring Framework (IMF)
- Troubleshooting service groups- VCS does not automatically start service group
- System is not in RUNNING state
- Service group not configured to run on the system
- Service group not configured to autostart
- Service group is frozen
- Failover service group is online on another system
- A critical resource faulted
- Service group autodisabled
- Service group is waiting for the resource to be brought online/taken offline
- Service group is waiting for a dependency to be met.
- Service group not fully probed.
- Service group does not fail over to the forecasted system
- Service group does not fail over to the BiggestAvailable system even if FailOverPolicy is set to BiggestAvailable
- Restoring metering database from backup taken by VCS
- Initialization of metering database fails
- Error message appears during service group failover or switch
 
- Troubleshooting resources
- Troubleshooting sites
- Troubleshooting I/O fencing- Node is unable to join cluster while another node is being ejected
- The vxfentsthdw utility fails when SCSI TEST UNIT READY command fails
- Manually removing existing keys from SCSI-3 disks
- System panics to prevent potential data corruption
- Cluster ID on the I/O fencing key of coordinator disk does not match the local cluster's ID
- Fencing startup reports preexisting split-brain
- Registered keys are lost on the coordinator disks
- Replacing defective disks when the cluster is offline
- The vxfenswap utility exits if rcp or scp commands are not functional
- Troubleshooting CP server
- Troubleshooting server-based fencing on the VCS cluster nodes
- Issues during online migration of coordination points
 
- Troubleshooting notification
- Troubleshooting and recovery for global clusters
- Troubleshooting the steward process
- Troubleshooting licensing- Validating license keys
- Licensing error messages- [Licensing] Insufficient memory to perform operation
- [Licensing] No valid VCS license keys were found
- [Licensing] Unable to find a valid base VCS license key
- [Licensing] License key cannot be used on this OS platform
- [Licensing] VCS evaluation period has expired
- [Licensing] License key can not be used on this system
- [Licensing] Unable to initialize the licensing framework
- [Licensing] QuickStart is not supported in this release
- [Licensing] Your evaluation period for the feature has expired. This feature will not be enabled the next time VCS starts
 
 
- Troubleshooting secure configurations
 
- VCS message logging
 
- VCS performance considerations
- Section VII. Appendixes
Types of alerts
VCS generates the following types of alerts.
- CFAULT - Indicates that a cluster has faulted 
- GNOFAILA - Indicates that a global group is unable to fail over within the cluster where it was online. This alert is displayed if the ClusterFailOverPolicy attribute is set to Manual and the wide-area connector (wac) is properly configured and running at the time of the fault. 
- GNOFAIL - Indicates that a global group is unable to fail over to any system within the cluster or in a remote cluster. - Some reasons why a global group may not be able to fail over to a remote cluster: - The ClusterFailOverPolicy is set to either Auto or Connected and VCS is unable to determine a valid remote cluster to which to automatically fail the group over. 
- The ClusterFailOverPolicy attribute is set to Connected and the cluster in which the group has faulted cannot communicate with one ore more remote clusters in the group's ClusterList. 
- The wide-area connector (wac) is not online or is incorrectly configured in the cluster in which the group has faulted