InfoScale™ 9.0 Cluster Server Administrator's Guide - AIX
System attributes
Table: System attributes lists the system attributes.
System Attributes | Definition |
|---|---|
AgentsStopped (system use only) | This attribute is set to 1 on a system when all agents running on the system are stopped.
|
AvailableCapacity (system use only) | Indicates the available capacity of the system. The function of this attribute depends on the value of the cluster-level attribute Statistics. If the value of the Statistics is:
You cannot configure this attribute in the main.cf file.
|
Capacity (user-defined) | Represents total capacity of a system. The possible values are:
|
ConfigBlockCount (system use only) | Number of 512-byte blocks in configuration when the system joined the cluster.
|
ConfigCheckSum (system use only) | Sixteen-bit checksum of configuration identifying when the system joined the cluster.
|
ConfigDiskState (system use only) | State of configuration on the disk when the system joined the cluster.
|
ConfigFile (system use only) | Directory containing the configuration files.
|
ConfigInfoCnt (system use only) | The count of outstanding CONFIG_INFO messages the local node expects from a new membership message. This attribute is non-zero for the brief period during which new membership is processed. When the value returns to 0, the state of all nodes in the cluster is determined.
|
ConfigModDate (system use only) | Last modification date of configuration when the system joined the cluster.
|
CPUBinding | Binds HAD to a particular logical processor ID depending on the value of BindTo and CPUNumber. Interrupts are disabled on the logical processor that HAD binds to.
CPUNumber specifies the logical processor ID to which HAD binds. CPUNumber is used only when BindTo is specified as CPUNUM. BindTo can take one of the following values:
|
CPUThresholdLevel (user-defined) | Determines the threshold values for CPU utilization based on which various levels of logs are generated. The notification levels are Critical, Warning, Note, and Info, and the logs are stored in the file engine_A.log. If the Warning level is crossed, a notification is generated. The values are configurable at a system level in the cluster.
|
CPUUsage (system use only) | This attribute is deprecated. VCS monitors system resources on startup.
|
CPUUsageMonitoring | This attribute is deprecated. VCS monitors system resources on startup.
|
CurrentLimits (system use only) | System-maintained calculation of current value of Limits. CurrentLimits = Limits - (additive value of all service group Prerequisites).
|
DiskHbStatus (system use only) | Deprecated attribute. Indicates status of communication disks on any system.
|
DynamicLoad (user-defined) | System-maintained value of current dynamic load. The value is set external to VCS with the hasys -load command. When you specify the dynamic system load, VCS does not use the static group load.
|
EngineRestarted (system use only) | Indicates whether the VCS engine (HAD) was restarted by the hashadow process on a node in the cluster. The value 1 indicates that the engine was restarted; 0 indicates it was not restarted.
|
EngineVersion (system use only) | Specifies the major, minor, maintenance-patch, and point-patch version of VCS. The value of the EngineVersion attribute is in hexadecimal format. To retrieve version information: Major Version: (EngineVersion >> 24) & 0xff; Minor Version: (EngineVersion >> 16) & 0xff; Maint Patch: (EngineVersion >> 8) & 0xff; Point Patch: EngineVersion & 0xff.
|
FencingWeight (user-defined) | Indicates the system priority for preferred fencing. This value is relative to other systems in the cluster and does not reflect any real value associated with a particular system. If the cluster-level attribute value for PreferredFencingPolicy is set to System, VCS uses this FencingWeight attribute to determine the node weight to ascertain the surviving subcluster during I/O fencing race.
|
Frozen (system use only) | Indicates if service groups can be brought online on the system. Groups cannot be brought online if the attribute value is 1.
|
GUIIPAddr (user-defined) | Determines the local IP address that VCS uses to accept connections. Incoming connections over other IP addresses are dropped. If GUIIPAddr is not set, the default behavior is to accept external connections over all configured local IP addresses.
|
HostAvailableForecast (system use only) | Indicates the forecasted available capacities of the systems in a cluster based on the past metered AvailableCapacity. The HostMonitor agent auto-populates values for this attribute, if the cluster attribute Statistics is set to Enabled. It has all the keys specified in HostMeters, such as CPU, Mem, and Swap. The values for keys are set in corresponding units as specified in the Cluster attribute MeterUnit. You cannot configure this attribute in main.cf.
|
HostMonitor (system use only) | List of host resources that the HostMonitor agent monitors. The values of keys such as Mem and Swap are measured in MB or GB, and CPU is measured in MHz or GHz.
|
HostUtilization (system use only) | Indicates the percentage usage of the resources on the host as computed by the HostMonitor agent. This attribute populates all parameters specified in the cluster attribute HostMeters if Statistics is set to MeterHostOnly or Enabled.
|
LicenseType (system use only) | Indicates the license type of the base VCS key used by the system. Possible values are: 0 - DEMO; 1 - PERMANENT; 2 - PERMANENT_NODE_LOCK; 3 - DEMO_NODE_LOCK; 4 - NFR; 5 - DEMO_EXTENSION; 6 - NFR_NODE_LOCK; 7 - DEMO_EXTENSION_NODE_LOCK.
|
Limits (user-defined) | An unordered set of name=value pairs denoting specific resources available on a system. Names are arbitrary and are set by the administrator for any value. Names are not obtained from the system. The format for Limits is: Limits = { Name=Value, Name2=Value2}.
|
LinkHbStatus (system use only) | Indicates the status of private network links on any system. For example: LinkHbStatus = { nic1 = UP, nic2 = DOWN }. Here, the value UP for nic1 means that at least one peer in the cluster is visible on nic1, and the value DOWN for nic2 means that no peer in the cluster is visible on nic2.
|
LLTNodeId (system use only) | Displays the node ID defined in the /etc/llttab file.
|
LoadTimeCounter (system use only) | System-maintained internal counter of how many seconds the system load has been above LoadWarningLevel. This value resets to zero anytime system load drops below the value in LoadWarningLevel. If the cluster-level attribute Statistics is enabled, any change made to LoadTimeCounter does not affect VCS behavior.
|
LoadTimeThreshold (user-defined) | Specifies how long the system load must remain at or above LoadWarningLevel before the LoadWarning trigger is fired. If set to 0, overload calculations are disabled. If the cluster-level attribute Statistics is enabled, any change made to LoadTimeThreshold does not affect VCS behavior.
|
LoadWarningLevel (user-defined) | A percentage of total capacity at which load is considered to have reached a critical limit. If set to 0, overload calculations are disabled. For example, setting LoadWarningLevel = 80 sets the warning level to 80 percent. The value of this attribute can be set from 1 to 100. If set to 1, system load must equal 1 percent of system capacity to begin incrementing the LoadTimeCounter. If set to 100, system load must equal system capacity to increment the LoadTimeCounter. If the cluster-level attribute Statistics is enabled, any change made to LoadWarningLevel does not affect VCS behavior.
|
MemThresholdLevel (user-defined) | Determines the threshold values for memory utilization based on which various levels of logs are generated. The notification levels are Critical, Warning, Note, and Info, and the logs are stored in the file engine_A.log. If the Warning level is crossed, a notification is generated. The values are configurable at a system level in the cluster. For example, the administrator may set the value of MemThresholdLevel as follows:
|
MeterRecord (system use only) | Acts as an internal system attribute with predefined keys. This attribute is updated only when the Cluster attribute AdaptivePolicy is set to Enabled.
Possible keys are:
|
NoAutoDisable (system use only) | When set to 0, this attribute autodisables service groups when the VCS engine is taken down. Groups remain autodisabled until the engine is brought up (regular membership). This attribute's value is updated whenever a node joins (gets into RUNNING state) or leaves the cluster. This attribute cannot be set manually.
|
NodeId (system use only) | System (node) identification specified in the /etc/llttab file.
|
OnGrpCnt (system use only) | Number of groups that are online, or about to go online, on a system.
|
PhysicalServer (system use only) | Indicates the name of the physical system on which the VM is running when VCS is deployed on a VM.
|
ReservedCapacity (system use only) | Indicates the capacity reserved on the system for service groups that are coming online and that have FailOverPolicy set to BiggestAvailable. It has all of the keys specified in HostMeters, such as CPU, Mem, and Swap. The values for keys are set in corresponding units as specified in the Cluster attribute MeterUnit.
ReservedCapacity is updated when the service group completes its online transition and the next forecast cycle has run. You cannot configure this attribute in main.cf.
|
ShutdownTimeout (user-defined) | Determines whether to treat system reboot as a fault for service groups running on the system. On many systems, when a reboot occurs the processes are stopped first, then the system goes down. When the VCS engine is stopped, service groups that include the failed system in their SystemList attributes are autodisabled. However, if the system goes down within the number of seconds designated in ShutdownTimeout, service groups previously online on the failed system are treated as faulted and failed over. Arctera recommends that you set this attribute depending on the average time it takes to shut down the system. If you do not want to treat the system reboot as a fault, set the value for this attribute to 0.
|
SourceFile (user-defined) | File from which the configuration is read. Do not configure this attribute in main.cf. Make sure the path exists on all nodes before running a command that configures this attribute.
|
SupportedProtocol (system use only) | A system-level attribute that displays the protocol numbers supported by the running system node.
|
SwapThresholdLevel (user-defined) | Determines the threshold values for swap space utilization based on which various levels of logs are generated. The notification levels are Critical, Warning, Note, and Info, and the logs are stored in the file engine_A.log. If the Warning level is crossed, a notification is generated. The values are configurable at a system level in the cluster.
|
SysInfo (system use only) | Provides platform-specific information, including the name, version, and release of the operating system, the name of the system on which it is running, and the hardware type.
|
SysName (system use only) | Indicates the system name.
|
SysState (system use only) | Indicates the system state, such as RUNNING, FAULTED, or EXITED.
|
SystemLocation (user-defined) | Indicates the location of the system.
|
SystemOwner (user-defined) | Use this attribute for VCS email notification and logging. VCS sends email notification to the person designated in this attribute when an event occurs related to the system. Note that while VCS logs most events, not all events trigger notifications. Make sure to set the severity level at which you want notifications to SystemOwner or to at least one recipient defined in the SmtpRecipients attribute of the NotifierMngr agent.
|
SystemRecipients (user-defined) | This attribute is used for VCS email notification. VCS sends email notification to persons designated in this attribute when events related to the system occur and when the event's severity level is equal to or greater than the level specified in the attribute. Make sure to set the severity level at which you want notifications to be sent to SystemRecipients or to at least one recipient defined in the SmtpRecipients attribute of the NotifierMngr agent.
|
TFrozen (user-defined) | Indicates whether a service group can be brought online on a node. A service group cannot be brought online if the value of this attribute is 1.
|
TRSE (system use only) | Indicates the time to Regular State Exit, in seconds. The time is calculated as the duration between VCS losing port h membership and VCS losing port a membership of GAB.
|
UpDownState (system use only) | This attribute has four values: Down (0): System is powered off, or GAB and LLT are not running on the system; Up but not in cluster membership (1): GAB and LLT are running but the VCS engine is not; Up and in jeopardy (2): The system is up and part of cluster membership, but only one network link (LLT) remains; Up (3): The system is up and part of cluster membership, and has at least two links to the cluster.
|
UserInt (user-defined) | Stores integer values you want to use. VCS does not interpret the value of this attribute.
|
VCSFeatures (system use only) | Indicates which VCS features are enabled. Possible values are: 0 - No features enabled; 1 - L3+ is enabled; 2 - Global Cluster Option is enabled. Even though the VCSFeatures attribute is an integer attribute, when you query the value with the hasys -value command or the hasys -display command, it displays as the string L10N for value 1 and DR for value 2.
|
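
The arithmetic described for the EngineVersion and CurrentLimits attributes can be sketched in a few lines of Python. This is only an illustration of the formulas given in the table, not part of VCS: the function names, the sample version value `0x09000000`, and the sample limit names are assumptions made for the example, and the sketch assumes that the attribute values have already been read (for example, from `hasys -value` output) and converted to Python integers and dictionaries.

```python
def decode_engine_version(engine_version):
    """Split a packed EngineVersion integer into its four 8-bit fields.

    Follows the shifts and masks given in the EngineVersion row:
    major = bits 24-31, minor = bits 16-23, maintenance patch = bits 8-15,
    point patch = bits 0-7.
    """
    major = (engine_version >> 24) & 0xFF
    minor = (engine_version >> 16) & 0xFF
    maint_patch = (engine_version >> 8) & 0xFF
    point_patch = engine_version & 0xFF
    return major, minor, maint_patch, point_patch


def current_limits(limits, group_prerequisites):
    """Compute CurrentLimits = Limits - (sum of service group Prerequisites).

    limits: dict of name -> value, as in the Limits attribute.
    group_prerequisites: list of dicts, one per service group that counts
    against this system (hypothetical input shape for this sketch).
    """
    remaining = dict(limits)
    for prerequisites in group_prerequisites:
        for name, amount in prerequisites.items():
            remaining[name] = remaining.get(name, 0) - amount
    return remaining


if __name__ == "__main__":
    # Hypothetical packed value: major version 9, all other fields 0.
    print(decode_engine_version(0x09000000))

    # Hypothetical limit names chosen by an administrator.
    system_limits = {"ShrMemSeg": 10, "Processors": 12}
    prerequisites = [{"ShrMemSeg": 4}, {"ShrMemSeg": 2, "Processors": 6}]
    print(current_limits(system_limits, prerequisites))
```

The parentheses around each shift are redundant in Python but make the order of evaluation explicit, which matters when the same expression is ported to a language or shell with different operator precedence.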