If a non-critical (Critical = 0) resource in a Veritas Cluster Server (VCS) service group is in FAULTED state when the service group is being brought online, the service group will remain indefinitely in "STARTING|PARTIAL" state.
Once the service group enters this state, users cannot switch it over to another node because the group is in a transitional state. Further attempts to take the service group offline may leave some resources stuck in W_OFFLINE_REVERSE or W_OFFLINE_PROPAGATE state.
# hagrp -state SG1
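To check several nodes at once, the output of the command above can be filtered for the stuck state. The following is a minimal sketch, not a supported tool. It assumes that `hagrp -state` prints one row per system in the form `<group> State <system> |<flags>|` under a header line starting with `#`. The function name `stuck_systems` and the node names in the sample data are hypothetical.

```shell
# Print every system on which a group is reported as STARTING|PARTIAL.
# Reads `hagrp -state <group>` output on stdin.
# ASSUMPTION: rows look like "<group> State <system> |STARTING|PARTIAL|"
# and header lines start with '#'.
stuck_systems() {
    awk '$1 !~ /^#/ && $2 == "State" && $4 ~ /STARTING/ && $4 ~ /PARTIAL/ { print $3 }'
}

# Hypothetical sample data standing in for real command output.
printf '%s\n' \
    '#Group Attribute System Value' \
    'SG1 State node1 |STARTING|PARTIAL|' \
    'SG1 State node2 |OFFLINE|' | stuck_systems
```

On a live cluster the function would be fed directly from the command, for example `hagrp -state SG1 | stuck_systems`. A system that keeps appearing in the output long after the online request was issued is a candidate for the condition described here.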
If online propagation reaches a non-critical resource in FAULTED state, the faulted resource blocks further propagation but does not trigger a service group fault. Consequently, the service group neither finishes coming online nor faults, and it remains indefinitely in STARTING|PARTIAL state.
This is expected behavior (by design) of the VCS engine in versions prior to 6.0. Depending on the configuration, however, it can result in unnecessary interruptions, especially if the user is unaware of the exact cause.
This issue is being tracked under eTrack ID: 2210717.
Symantec Engineering has modified the behavior of the VCS engine (version 6.0 and above) so that it recalculates the service group state if a non-critical resource faults during the online process, and removes the STARTING flag from the group state if the faulted resource was the last one waiting to go online.
This change is also included in the following patch release for VCS 5.1SP1RP3:
- VCS 5.1SP1RP3HF1
(Please contact Symantec Enterprise Support to obtain this patch.)
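The difference between the old and the modified behavior can be illustrated with a toy model. This sketch is not the VCS engine's algorithm. It only encodes the rule stated above, and the function name `group_state`, the mode names `pre6` and `v6`, and the simplified per-resource states are all hypothetical. It also assumes that at least one resource in the group is already online.

```shell
# Toy model only: NOT the VCS engine's actual algorithm.
# Usage: group_state MODE STATE...
#   MODE   pre6 (old behavior) or v6 (modified behavior)
#   STATE  one word per resource: ONLINE, WAITING (still to come online),
#          or FAULTED (a non-critical resource that faulted)
group_state() {
    gs_mode=$1; shift
    gs_waiting=0; gs_faulted=0
    for gs_s in "$@"; do
        case $gs_s in
            WAITING) gs_waiting=$((gs_waiting + 1)) ;;
            FAULTED) gs_faulted=$((gs_faulted + 1)) ;;
        esac
    done
    if [ "$gs_waiting" -eq 0 ] && [ "$gs_faulted" -eq 0 ]; then
        # Every resource came online.
        echo "ONLINE"
    elif [ "$gs_mode" = "v6" ] && [ "$gs_waiting" -eq 0 ]; then
        # State is recalculated after the fault; nothing is left waiting,
        # so the STARTING flag is removed.
        echo "PARTIAL"
    else
        # Resources are still waiting, or (old behavior) the fault of a
        # non-critical resource did not cause a recalculation.
        echo "STARTING|PARTIAL"
    fi
}

group_state pre6 ONLINE ONLINE FAULTED   # prints STARTING|PARTIAL
group_state v6   ONLINE ONLINE FAULTED   # prints PARTIAL
```

In the model, the only difference between the two modes is whether a faulted non-critical resource still counts against the STARTING flag once no other resource is waiting to go online.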
- All OS platforms
- Pre-6.0 versions of Veritas Cluster Server (VCS) (up to VCS 5.1SP1RP3, at the time of writing)