InfoScale™ 9.0 Cluster Server Administrator's Guide - Windows
Troubleshooting service groups
This topic cites the most common problems associated with bringing service groups online and taking them offline. Recommended action is also included, where applicable.
System is not in RUNNING state.
Recommended action: Type hasys -display system to verify the system is running.
For more information on system states, see Appendix B, Cluster and system states.
Service group not configured to run on the system.
The SystemList attribute of the group may not contain the name of the system.
Recommended action: Use the output of the command hagrp -display service_group to verify the system name.
Service group not configured to autostart.
If the service group is not starting automatically on the system, the group may not be configured to AutoStart, or may not be configured to AutoStart on that particular system.
Recommended action: Use the output of the command hagrp -display service_group to verify the values of the AutoStart and AutoStartList attributes.
Service group is frozen.
Recommended action: Use the output of the command hagrp -display service_group to verify the values of the Frozen and TFrozen attributes. Use the command hagrp -unfreeze to unfreeze the group. Note that VCS will not take a frozen service group offline.
Service group autodisabled.
When VCS does not know the status of a service group on a particular system, it autodisables the service group on that system. Autodisabling occurs under the following conditions:
When the VCS engine, HAD, is not running on the system.
When not all resources within the service group have been probed on the system.
When a particular system is visible through disk heartbeat only.
Under these conditions, all service groups that include the system in their SystemList attribute are autodisabled. This does not apply to systems that are powered off.
Recommended action: Use the output of the command hagrp -display service_group to verify the value of the AutoDisabled attribute.
Warning:
To bring a group online manually after VCS has autodisabled the group, make sure that the group is not fully or partially active on any system that has the AutoDisabled attribute set to 1 by VCS. Specifically, verify that all resources that may be corrupted by being active on multiple systems are brought down on the designated systems. Then, clear the AutoDisabled attribute for each system:
C:\> hagrp -autoenable service_group -sys system
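When several systems need to be cleared, the command above has to be repeated once per system. The following is a minimal Python sketch of that repetition, not part of the product. It assumes hagrp is on the PATH, and the group name AppSG and the system names SYSTEM1 and SYSTEM2 are hypothetical placeholders. It does not verify that the group's resources are down on those systems; that check remains a manual step and must be completed first.

```python
import subprocess


def autoenable_commands(service_group, systems):
    """Return one 'hagrp -autoenable' argument list per system."""
    return [
        ["hagrp", "-autoenable", service_group, "-sys", system]
        for system in systems
    ]


def run_autoenable(service_group, systems, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in autoenable_commands(service_group, systems):
        print(" ".join(command))
        if not dry_run:
            # Assumes hagrp is installed and on the PATH of this system.
            subprocess.run(command, check=True)


# Hypothetical names; replace them with your own group and systems.
# The default dry run only prints the commands so they can be reviewed.
run_autoenable("AppSG", ["SYSTEM1", "SYSTEM2"])
```

Keeping the dry run as the default lets you review the exact commands before any of them changes the cluster state.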
Failover service group is online on another system.
The group is a failover group and is online or partially online on another system.
Recommended action: Use the output of the command hagrp -display service_group to verify the value of the State attribute. Use the command hagrp -offline to take the group offline on the other system.
Service group is waiting for the resource to be brought online/taken offline.
Recommended action: Review the IState attribute of all resources in the service group to find the resource that is waiting to go online (or waiting to be taken offline). Use the hastatus command to help identify the resource. See the engine and agent logs for information on why the resource cannot be brought online or taken offline.
To clear this state, make sure that the resources waiting to go online/offline do not bring themselves online/offline. Then use the command hagrp -flush to clear the internal state of VCS. After that, you can bring the service group online or take it offline on another system.
A critical resource faulted.
Output of the command hagrp -display service_group indicates that the service group has faulted.
Recommended action: Use the command hares -clear to clear the fault.
Service group is waiting for a dependency to be met.
Recommended action: To see which dependencies have not been met, type hagrp -dep service_group to view service group dependencies, or hares -dep resource to view resource dependencies.
Service group not fully probed.
This occurs if the agent processes have not monitored each resource in the service group. When the VCS engine, HAD, starts, it immediately "probes" to find the initial state of all resources. (It cannot probe if the agent is not returning a value.) A service group must be probed on all systems included in the SystemList attribute before VCS attempts to bring the group online as part of AutoStart. This ensures that even if the service group was online prior to VCS being brought up, VCS will not inadvertently bring the service group online on another system.
Recommended action: Use the output of hagrp -display service_group to see the value of the ProbesPending attribute for the service group on that system. (It should be zero.) To determine which resources are not probed, verify the local Probed attribute for each resource on the specified system. Zero means waiting for probe result, 1 means probed, and 2 means VCS not booted. See the engine and agent logs for information.
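The Probed values described above can be decoded with a small lookup. The following is a minimal Python sketch, assuming you have already noted the Probed value of each resource on the system in question. The resource names App-IP, App-Mount, and App-Service are hypothetical, and the sketch does not query VCS itself.

```python
# Meanings of the Probed attribute values, as listed in this topic.
PROBED_MEANINGS = {
    0: "waiting for probe result",
    1: "probed",
    2: "VCS not booted",
}


def describe_probed(value):
    """Translate a Probed value into its meaning."""
    return PROBED_MEANINGS.get(value, "unknown value")


def unprobed_resources(probed_by_resource):
    """Return the sorted names of resources whose Probed value is not 1."""
    return sorted(
        name for name, value in probed_by_resource.items() if value != 1
    )


# Hypothetical values noted by hand for one system.
observed = {"App-IP": 1, "App-Mount": 0, "App-Service": 1}
for name in unprobed_resources(observed):
    print(f"{name}: {describe_probed(observed[name])}")
```

Any resource that the sketch reports is a candidate for a closer look in the engine and agent logs.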