V-16-2-13066 "Agent is calling clean for resource(<resource_name>) because the resource is not up even after online completed"

Article: 100006897
Last Published: 2013-08-16
Product(s): InfoScale & Storage Foundation

Problem

In Veritas Cluster Server (VCS), when an agent brings a resource online, it allows a certain amount of time for the resource to start. If the online procedure completes and the resource has not started within the allotted time, the agent attempts a clean operation and declares the resource faulted.

Error Message

VCS ERROR V-16-2-13066 (<server>) Agent is calling clean for resource(<resource_name>) because the resource is not up even after online completed.
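
When scanning a log for this condition, the server and resource names can be extracted from the message with a regular expression. The following is a minimal sketch in Python, assuming the message follows the format shown above. The function name parse_clean_message and the sample host and resource names are hypothetical.

```python
import re

# Pattern built from the message text shown above. The optional whitespace
# before the resource parenthesis is an assumption, added to tolerate minor
# formatting differences between log sources.
CLEAN_AFTER_ONLINE = re.compile(
    r"V-16-2-13066 \((?P<server>[^)]+)\) "
    r"Agent is calling clean for resource\s*\((?P<resource>[^)]*)\) "
    r"because the resource is not up even after online completed"
)


def parse_clean_message(line):
    """Return (server, resource) if the line carries V-16-2-13066, else None."""
    match = CLEAN_AFTER_ONLINE.search(line)
    if match is None:
        return None
    return match.group("server"), match.group("resource")


# Hypothetical sample line; the host and resource names are made up.
sample = (
    "VCS ERROR V-16-2-13066 (node1) Agent is calling clean for "
    "resource(SQL_Service) because the resource is not up even after "
    "online completed."
)
print(parse_clean_message(sample))
```

Knowing which resource triggered the message makes it easier to line up the VCS fault time against that resource's own startup logs.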

Cause

In some situations, a cluster resource may fault because the service or process that is being monitored simply has not finished starting.

To determine whether this scenario applies, investigate the startup logs of the resource itself. For example, if the resource is a service such as SQL Server, SQL Server logs its startup messages in the Windows event logs. If the logs show that the service was issued a start command but neither started nor failed before VCS declared it faulted, then this scenario applies. In the case of SQL Server, a database recovery may be taking longer than the time allowed. It may also be worth reading the SQL Server error logs to determine whether SQL Server is failing to start.

Solution

In cases where VCS declares the resource faulted before the resource has actually failed, it may help to increase the OnlineRetryLimit attribute for the agent that monitors the resource. OnlineRetryLimit defines the number of monitoring intervals that the cluster agent waits for a resource to start before giving up and declaring the resource faulted. For most agents, the default monitoring interval is 60 seconds. Setting OnlineRetryLimit to "2" causes the agent to allow an additional 120 seconds for a resource to start before declaring a fault.
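
The arithmetic in the preceding paragraph can be sketched as follows. This is a minimal Python illustration, assuming (as the paragraph describes) that each unit of OnlineRetryLimit adds one monitoring interval of waiting. The function name additional_wait_seconds is hypothetical and is not part of VCS.

```python
def additional_wait_seconds(online_retry_limit, monitor_interval_seconds=60):
    """Extra time the agent allows for a resource to start.

    Assumes each unit of OnlineRetryLimit adds one monitoring interval,
    and that the interval is the 60-second default unless stated otherwise.
    """
    if online_retry_limit < 0 or monitor_interval_seconds <= 0:
        raise ValueError("limit must be >= 0 and interval must be > 0")
    return online_retry_limit * monitor_interval_seconds


print(additional_wait_seconds(2))  # 2 intervals x 60 seconds = 120
```

If the agent in question uses a non-default monitoring interval, pass that interval as the second argument to estimate the added wait.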

However, if the resource has a genuine reason for not starting, incorrect cluster tuning using OnlineRetryLimit can delay detection of a legitimate fault.

To increase OnlineRetryLimit, first click the agent for the resource type. Agents can be distinguished from resources by their icons: agents have a dark blue box with an open lid, while resources have a light blue box with a closed lid. To access OnlineRetryLimit, select "Show all attributes."

Note: This change does not require restarting any services or taking any resources offline.
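
The same type-level change can also be made from the VCS command line instead of the console. The Python sketch below only builds the command sequence and does not run anything. It assumes the standard haconf and hatype utilities are available on the cluster node; the helper name retry_limit_commands and the type name MyResourceType are hypothetical placeholders.

```python
def retry_limit_commands(resource_type, limit):
    """Build the VCS CLI calls that set OnlineRetryLimit on a resource type."""
    if limit < 0:
        raise ValueError("OnlineRetryLimit cannot be negative")
    return [
        ["haconf", "-makerw"],  # open the cluster configuration for writing
        ["hatype", "-modify", resource_type, "OnlineRetryLimit", str(limit)],
        ["haconf", "-dump", "-makero"],  # save and return to read-only
    ]


# "MyResourceType" is a placeholder, not a real VCS type name.
for command in retry_limit_commands("MyResourceType", 2):
    print(" ".join(command))
```

Review the printed commands against the resource type actually in use before running them on a cluster node.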

 

 

References

UMI : V-16-2-13066
