Troubleshooting faulted cluster resources

Article: 100010472
Last Published: 2013-09-11
Product(s): InfoScale & Storage Foundation

Problem

Troubleshooting faulted cluster resources

Solution

 

Table of contents

Introduction
Location and naming of the VCS logs
The engine log (engine_A.log)
Individual agent logs
Common causes of resource faults




Introduction



When a cluster resource faults, the event is recorded in the VCS (Cluster Server) engine log and in the agent log for that resource type. These logs can be used to determine why the fault occurred.





Location and naming of the VCS logs



The VCS engine and agent logs are normally found under /var/VRTSvcs/log.

The engine log is named "engine_A.log". Each agent log is named for its resource type. For example, the agent log for the DiskGroup agent is "DiskGroup_A.log", and the agent log for the IP agent is "IP_A.log".

Figure 1 has a typical directory listing that shows the engine log, as well as several agent logs.



Figure 1 - A directory listing that shows several agent logs, in addition to the main engine_A.log


# pwd
/var/VRTSvcs/log

# ls
CmdServer-log_A.log  engine_A.log        HostMonitor_A.log  Mount_A.log        vxfen
CoordPoint_A.log     hashadow-err_A.log  imfd_A.log         SambaServer_A.log
DiskGroup_A.log      hastart.log         IP_A.log           tmp
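
The naming convention above can be sketched as a small shell helper. This is only an illustration: agent_log_path and the VCS_LOG_DIR variable are hypothetical names, not VCS commands or settings, and the sketch assumes the default log directory and that the agent writes its own log.

```shell
# Assumption: logs are in the default directory unless VCS_LOG_DIR is set.
VCS_LOG_DIR="${VCS_LOG_DIR:-/var/VRTSvcs/log}"

# agent_log_path (hypothetical helper): print the expected log path
# for a resource type, following the <ResourceType>_A.log convention.
agent_log_path() {
    printf '%s/%s_A.log\n' "$VCS_LOG_DIR" "$1"
}

# Print the expected log paths for the two agents named in the text.
agent_log_path DiskGroup
agent_log_path IP
```

The helper only builds a path. It does not check that the file exists, because some agents write to the engine log instead of a separate file.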
 
 

 




The engine log (engine_A.log)


The main event log for VCS is the engine log (engine_A.log). This is usually the best place to begin troubleshooting why a resource faulted. New events are appended to the end of the file, so the most recent entries are at the bottom.

Note: The default location for the engine log is /var/VRTSvcs/log/engine_A.log.

Most faults can be found by searching the engine log for the word "clean" (Figure 2). When a resource faults, the resource agent attempts to remove any lingering remnants of the faulted resource by running a procedure known as the "clean entry point." When this happens, messages containing the phrase "Agent is calling clean for resource(resource name)" are recorded in the engine log. This makes "clean" a useful search string for locating fault events quickly.



Figure 2 - Searching the engine log for the word "clean" reveals a faulted resource event


2013/09/03 15:49:22 VCS ERROR V-16-2-13067 (server101) Agent is calling  clean  for resource(Test_vol) because the resource became OFFLINE unexpectedly, on its own.
2013/09/03 15:49:23 VCS INFO V-16-2-13068 (server101) Resource(vol1_MNT) -  clean  completed successfully.
2013/09/03 15:52:31 VCS ERROR V-16-2-13066 (server102) Agent is calling  clean  for resource(locks_MNT) because the resource is not up even after online completed.
2013/09/03 15:52:32 VCS INFO V-16-2-13068 (server102) Resource(locks_MNT) -  clean  completed successfully.
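
A search like the one in Figure 2 can be run with grep. The following is a minimal sketch, assuming the default engine log location. find_clean_calls and the ENGINE_LOG variable are hypothetical names used only for this example. The pattern allows one or more spaces around "clean" so that it matches the spacing shown in Figure 2.

```shell
# Assumption: the engine log is in its default location unless ENGINE_LOG is set.
ENGINE_LOG="${ENGINE_LOG:-/var/VRTSvcs/log/engine_A.log}"

# find_clean_calls (hypothetical helper): print every line in which an
# agent reports that it is calling the clean entry point for a resource.
# $1: log file to search (defaults to the engine log).
find_clean_calls() {
    grep -E 'Agent is calling +clean +for resource' "${1:-$ENGINE_LOG}"
}

# Run the search only if the engine log is readable on this system.
if [ -r "$ENGINE_LOG" ]; then
    find_clean_calls "$ENGINE_LOG" || echo "No clean calls found in $ENGINE_LOG"
fi
```

Because recent events are at the bottom of the log, piping the output through "tail" shows the latest faults first when the log is long.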
 


 



Individual agent logs



In addition to the main engine log, most cluster resource types have their own logs. These often contain information that is not found in the engine log. Check the agent logs to find more detailed information about a fault (Figure 3). Use the same timestamps that are reported in the engine log to cross-reference the two logs.

Figure 3 shows an excerpt from Mount_A.log, the log that is written by the agent that controls Mount resources. Notice that when a Mount resource faults, the agent logs messages that contain the word "clean." In this case, the reported error is that "the resource became OFFLINE unexpectedly, on its own." Possible causes of this particular error message are addressed in https://www.veritas.com/docs/000016134.

Note: Some agents do not have separate logs and simply write their events to the main engine log.



Figure 3 - An example of the agent log for the Mount resource type


#
# Log Name:     Mount
# System:       server101
# SysInfo:      Linux:server101,#1 SMP Wed Jun 13 18:24:36 EDT 2012,2.6.32-279.el6.x86_64,x86_64
# Created:      2013/01/03 06:43:34
#

2013/01/03 06:43:34 VCS INFO V-16-10031-20507 Mount:Mount:imf_init:successfully initialized the VxAMF Mount Module
2013/01/03 06:43:34 VCS INFO V-16-2-13805 Thread(4152198864) (imf_init) entry point completed with return status (0)
2013/01/03 07:01:14 VCS NOTICE V-16-10031-20704 Mount:Mount:imf_getnotification:Received notification for vxamf-group locks_MNT
2013/01/03 07:01:27 VCS NOTICE V-16-10031-20704 Mount:Mount:imf_getnotification:Received notification for vxamf-group vol1_MNT
2013/01/03 07:01:50 VCS NOTICE V-16-10031-20704 Mount:Mount:imf_getnotification:Received notification for vxamf-group public_MNT
2013/01/03 07:08:15 VCS NOTICE V-16-10031-20704 Mount:Mount:imf_getnotification:Received notification for vxamf-group public_MNT
2013/01/03 09:39:13 VCS NOTICE V-16-10031-20704 Mount:Mount:imf_getnotification:Received notification for vxamf-group locks_MNT
2013/01/03 09:39:25 VCS NOTICE V-16-10031-20704 Mount:Mount:imf_getnotification:Received notification for vxamf-group vol1_MNT
2013/09/03 12:40:27 VCS INFO V-16-10031-20507 Mount:Mount:imf_init:successfully initialized the VxAMF Mount Module
2013/09/03 12:40:27 VCS INFO V-16-2-13805 Thread(4151568080) (imf_init) entry point completed with return status (0)
2013/09/03 15:49:22 VCS NOTICE V-16-10031-20704 Mount:Mount:imf_getnotification:Received notification for vxamf-group vol1_MNT
2013/09/03 15:49:22 VCS ERROR V-16-2-13067 Thread(4143958896) Agent is calling  clean  for resource(Test_Vol) because the resource became OFFLINE unexpectedly, on its own.
2013/09/03 15:49:23 VCS ERROR V-16-2-13068 Thread(4143958896) Resource(Test_Vol) -  clean  completed successfully.
2013/09/03 15:49:38 VCS NOTICE V-16-10031-20704 Mount:Mount:imf_getnotification:Received notification for vxamf-group locks_MNT
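
One way to cross-reference the engine log and an agent log is to search both files for the same timestamp. The sketch below assumes the default log directory and uses Mount_A.log as the agent log. events_at and the LOG_DIR variable are hypothetical names used only for this example.

```shell
# Assumption: logs are in the default directory unless LOG_DIR is set.
LOG_DIR="${LOG_DIR:-/var/VRTSvcs/log}"

# events_at (hypothetical helper): print every line that contains the given
# timestamp string from each of the listed log files.
# $1: timestamp as it appears in the logs, e.g. "2013/09/03 15:49:22"
# remaining arguments: log files to search
events_at() {
    stamp=$1
    shift
    # /dev/null is added so that grep always prefixes each match with its
    # file name, even when only one log file is given.
    grep -F -- "$stamp" "$@" /dev/null
}

# Run the lookup only if both logs are readable on this system.
if [ -r "$LOG_DIR/engine_A.log" ] && [ -r "$LOG_DIR/Mount_A.log" ]; then
    events_at "2013/09/03 15:49:22" "$LOG_DIR/engine_A.log" "$LOG_DIR/Mount_A.log" \
        || echo "No entries found for that timestamp"
fi
```

Passing a shorter string, such as "2013/09/03 15:49", widens the search to the whole minute, which helps when the two logs record the same event a second apart.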

 


 



Common causes of resource faults




Table 1 - Common causes of resource faults

Error Code | Abstract | Article link
V-16-2-13067 | VCS ERROR V-16-2-13067 (hostname) Agent is calling clean for resource(resource_name) because the resource became OFFLINE unexpectedly, on its own. | https://www.veritas.com/support/en_US/article.100008146
V-16-2-13073 | (hostname) Resource(resource_name) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number n of n) the resource. | https://www.veritas.com/support/en_US/article.100008146
V-16-2-13027 | (hostname) Resource(resource_name) - monitor procedure did not complete within the expected time. | https://isearch.veritas.com/internal-search/en_US/article.100001408.html
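
To see which of these error codes appear in a log, and how often, each code can be counted with grep. This is a minimal sketch, assuming the default engine log location. count_fault_codes and the ENGINE_LOG variable are hypothetical names used only for this example.

```shell
# Assumption: the engine log is in its default location unless ENGINE_LOG is set.
ENGINE_LOG="${ENGINE_LOG:-/var/VRTSvcs/log/engine_A.log}"

# count_fault_codes (hypothetical helper): for each error code listed in
# Table 1, print the code followed by the number of log lines that contain it.
# $1: log file to search (defaults to the engine log).
count_fault_codes() {
    for code in V-16-2-13067 V-16-2-13073 V-16-2-13027; do
        # grep -c prints 0 when the code does not appear in the file.
        printf '%s %s\n' "$code" "$(grep -c -F -- "$code" "${1:-$ENGINE_LOG}")"
    done
}

# Run the count only if the engine log is readable on this system.
if [ -r "$ENGINE_LOG" ]; then
    count_fault_codes "$ENGINE_LOG"
fi
```

A code with a nonzero count points to the matching row in Table 1 and its linked article.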

 
