Problem
Troubleshooting faulted cluster resources
Solution
Table of contents
Introduction
Location and naming of the VCS logs
The engine log (engine_A.log)
Individual agent logs
Common causes of resource faults
Introduction
When a cluster resource faults, the event is recorded in the Veritas Cluster Server (VCS) engine log and in the individual agent log for that resource type. These logs can be used to determine why the fault occurred.
Location and naming of the VCS logs
The VCS engine and agent logs are normally found under /var/VRTSvcs/log.
The name of the engine log is "engine_A.log". Each agent log is named for its resource type. For example, the agent log for the DiskGroup agent is "DiskGroup_A.log", and the agent log for the IP agent is "IP_A.log".
Figure 1 shows a typical directory listing that includes the engine log, as well as several agent logs.
Figure 1 - A directory listing that shows several agent logs, in addition to the main engine_A.log
# pwd |
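The naming convention described above can be sketched in a few lines of Python. This is only an illustration: the helper names (engine_log_path, agent_log_path) are hypothetical and not part of VCS, and the directory is the default location given in this article, which may differ on a particular installation.

```python
from pathlib import PurePosixPath

# Default log directory named in this article; adjust if your install differs.
VCS_LOG_DIR = PurePosixPath("/var/VRTSvcs/log")


def engine_log_path(log_dir=VCS_LOG_DIR):
    """Return the expected path of the main engine log."""
    return log_dir / "engine_A.log"


def agent_log_path(resource_type, log_dir=VCS_LOG_DIR):
    """Return the expected path of the agent log for a resource type.

    Assumes the "<ResourceType>_A.log" pattern described above.
    """
    return log_dir / f"{resource_type}_A.log"


if __name__ == "__main__":
    print(engine_log_path())
    for rtype in ("DiskGroup", "IP", "Mount"):
        print(agent_log_path(rtype))
```

Some agents do not keep a separate log, so a path built this way is not guaranteed to exist on disk.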
The engine log (engine_A.log)
The main event log for VCS is the engine log (engine_A.log). This is usually the best place to begin troubleshooting why a resource faulted. New events are appended to the end of the file, so the most recent entries are at the bottom.
Note: The default location for the engine log is /var/VRTSvcs/log/engine_A.log.
Most faults can be found by searching the engine log for the word "clean" (Figure 2). When a resource faults, its agent attempts to remove any lingering remnants of the faulted resource by running a procedure known as the "clean entry point." When this happens, messages containing the phrase "Agent is calling clean for resource(resource_name)" are recorded in the engine log. This makes "clean" a useful search string for quickly scanning the engine log.
Figure 2 - Searching the engine log for the word "clean" reveals a faulted resource event
2013/09/03 15:49:22 VCS ERROR V-16-2-13067 (server101) Agent is calling clean for resource(Test_vol) because the resource became OFFLINE unexpectedly, on its own.
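The search described above can also be scripted. The following is a minimal sketch, not a Veritas tool: the function name find_clean_events is hypothetical, and the regular expressions are modeled only on the sample line in Figure 2, so messages with a different layout may not match.

```python
import re

# Pattern modeled on the Figure 2 sample line (an assumption about the layout):
# "<date> <time> VCS <severity> <code> (<host>) <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"VCS (?P<severity>\w+) (?P<code>V-\d+-\d+-\d+) "
    r"\((?P<host>[^)]+)\) (?P<message>.*)$"
)
RESOURCE_RE = re.compile(r"calling clean for resource\((?P<resource>[^)]+)\)")

# The sample line shown in Figure 2.
SAMPLE = (
    "2013/09/03 15:49:22 VCS ERROR V-16-2-13067 (server101) Agent is calling "
    "clean for resource(Test_vol) because the resource became OFFLINE "
    "unexpectedly, on its own."
)


def find_clean_events(lines):
    """Return one dict per log line that mentions "clean" and matches LINE_RE."""
    events = []
    for line in lines:
        if "clean" not in line.lower():
            continue
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # layout differs from the Figure 2 sample; skip it
        event = match.groupdict()
        resource = RESOURCE_RE.search(event["message"])
        event["resource"] = resource.group("resource") if resource else None
        events.append(event)
    return events


if __name__ == "__main__":
    for event in find_clean_events([SAMPLE]):
        print(event["timestamp"], event["host"], event["code"], event["resource"])
```

To scan a real log, pass an open file instead of the sample list, for example find_clean_events(open("/var/VRTSvcs/log/engine_A.log")). The timestamp and resource name of each event are the values to carry over when checking the agent logs.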
Individual agent logs
In addition to the main engine log, most cluster resource types have their own logs. These often contain information that is not found in the engine log. Check the agent logs to find more detailed information about a fault (Figure 3). Use the same timestamps that are reported in the engine log to cross-reference the two logs.
Figure 3 shows an excerpt from Mount_A.log, the log created by the agent that controls Mount resources. Notice that when the Mount resource faults, the agent logs messages that contain the word "clean." In this case, the reported error is that "the resource became OFFLINE unexpectedly, on its own." Possible causes of this particular error message are addressed in https://www.veritas.com/docs/000016134.
Note: Some agents do not have separate logs and simply write their events to the main engine log.
Figure 3 - An example of the agent log for the Mount resource type
# |
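Cross-referencing the two logs by timestamp can be sketched as follows. This is an illustrative helper, not part of VCS: the name lines_near is hypothetical, and it assumes that agent log lines begin with the same "YYYY/MM/DD HH:MM:SS" timestamp layout seen in the engine log sample in Figure 2.

```python
from datetime import datetime, timedelta

# Timestamp layout of the engine log sample in Figure 2; assumed here to be
# shared by the agent logs.
TS_FORMAT = "%Y/%m/%d %H:%M:%S"
TS_LENGTH = 19  # number of characters in "YYYY/MM/DD HH:MM:SS"


def lines_near(lines, when, window_seconds=60):
    """Return the lines whose leading timestamp is within the window of `when`.

    `when` is a timestamp string copied from the engine log.
    """
    target = datetime.strptime(when, TS_FORMAT)
    window = timedelta(seconds=window_seconds)
    nearby = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue  # no leading timestamp (for example, a continuation line)
        if abs(stamp - target) <= window:
            nearby.append(line.rstrip("\n"))
    return nearby
```

For example, lines_near(open("/var/VRTSvcs/log/Mount_A.log"), "2013/09/03 15:49:22") would return the Mount agent entries logged within one minute of the fault reported in the engine log. A symmetric window is used because the agent may log relevant messages shortly before the engine records the clean call.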
Common causes of resource faults
Table 1 - Common causes of resource faults
Error Code | Abstract | Article link |
V-16-2-13067 | VCS ERROR V-16-2-13067 (hostname) Agent is calling clean for resource(resource_name) because the resource became OFFLINE unexpectedly, on its own. | https://www.veritas.com/support/en_US/article.100008146 |
V-16-2-13073 | (hostname) Resource(resource_name) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number n of n) the resource. | https://www.veritas.com/support/en_US/article.100008146 |
V-16-2-13027 | (hostname) Resource(resource_name) - monitor procedure did not complete within the expected time. | https://isearch.veritas.com/internal-search/en_US/article.100001408.html |
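As a rough aid for triage, the error codes in Table 1 can be counted in a log file. The sketch below is an assumption-laden example rather than a Veritas utility: the function name tally_known_codes is hypothetical, the short descriptions are paraphrased from the table, and only the three codes listed in the table are recognized.

```python
import re
from collections import Counter

# Codes taken from Table 1; descriptions are paraphrased from its Abstract column.
KNOWN_CODES = {
    "V-16-2-13067": "Agent is calling clean; resource became OFFLINE unexpectedly",
    "V-16-2-13073": "Resource became OFFLINE unexpectedly; agent is restarting it",
    "V-16-2-13027": "Monitor procedure did not complete within the expected time",
}
CODE_RE = re.compile(r"\bV-\d+-\d+-\d+\b")


def tally_known_codes(lines):
    """Count how often each Table 1 error code appears in the given lines."""
    counts = Counter()
    for line in lines:
        for code in CODE_RE.findall(line):
            if code in KNOWN_CODES:
                counts[code] += 1
    return counts


def describe(counts):
    """Format the tally as readable lines, most frequent code first."""
    return [
        f"{code} x{count}: {KNOWN_CODES[code]}"
        for code, count in counts.most_common()
    ]
```

Calling describe(tally_known_codes(open("/var/VRTSvcs/log/engine_A.log"))) gives a quick picture of which of the faults in the table occur most often, which helps decide which linked article to read first.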