Best practice for tracking down the cause of a faulting VCS Notifier agent.

Problem

 [ ISSUE ]
The Notifier resource (type NotifierMngr) in the ClusterService group faults repeatedly on both cluster nodes.

Error Message

[ ERROR MESSAGES ]
2011/01/10 11:34:01 VCS NOTICE V-16-1-10301 Initiating Online of Resource Notifier (Owner: unknown, Group: ClusterService) on System symc-linux1

2011/01/10 11:34:01 VCS INFO V-16-1-10298 Resource Notifier (Owner: unknown, Group: ClusterService) is online on symc-linux1 (VCS initiated)
2011/01/10 11:34:02 VCS INFO V-16-1-10304 Resource Notifier (Owner: unknown, Group: ClusterService) is offline on symc-linux2 (First probe)
2011/01/10 11:35:02 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:35:02 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:35:02 VCS ERROR V-16-2-13073 (symc-linux1) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 3) the resource.
2011/01/10 11:35:02 VCS NOTICE V-16-2-13076 (symc-linux1) Agent has successfully restarted resource(Notifier).
2011/01/10 11:36:03 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:36:03 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:36:03 VCS ERROR V-16-2-13073 (symc-linux1) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 2 of 3) the resource.
2011/01/10 11:36:03 VCS NOTICE V-16-2-13076 (symc-linux1) Agent has successfully restarted resource(Notifier).
2011/01/10 11:37:03 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:37:03 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:37:03 VCS ERROR V-16-2-13073 (symc-linux1) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 3 of 3) the resource.
2011/01/10 11:37:03 VCS NOTICE V-16-2-13076 (symc-linux1) Agent has successfully restarted resource(Notifier).
2011/01/10 11:38:03 VCS ERROR V-16-2-13067 (symc-linux1) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:38:03 VCS INFO V-16-2-13068 (symc-linux1) Resource(Notifier) - clean completed successfully.
2011/01/10 11:38:03 VCS INFO V-16-1-10307 Resource Notifier (Owner: unknown, Group: ClusterService) is offline on symc-linux1 (Not initiated by VCS)
2011/01/10 11:38:03 VCS NOTICE V-16-1-10301 Initiating Online of Resource Notifier (Owner: unknown, Group: ClusterService) on System symc-linux2
2011/01/10 11:38:03 VCS INFO V-16-1-10298 Resource Notifier (Owner: unknown, Group: ClusterService) is online on symc-linux2 (VCS initiated)
2011/01/10 11:39:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:39:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:39:03 VCS ERROR V-16-2-13073 (symc-linux2) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 3) the resource.
2011/01/10 11:39:03 VCS NOTICE V-16-2-13076 (symc-linux2) Agent has successfully restarted resource(Notifier).
2011/01/10 11:40:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:40:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:40:03 VCS ERROR V-16-2-13073 (symc-linux2) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 2 of 3) the resource.
2011/01/10 11:40:03 VCS NOTICE V-16-2-13076 (symc-linux2) Agent has successfully restarted resource(Notifier).
2011/01/10 11:41:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:41:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:41:03 VCS ERROR V-16-2-13073 (symc-linux2) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 3 of 3) the resource.
2011/01/10 11:41:03 VCS NOTICE V-16-2-13076 (symc-linux2) Agent has successfully restarted resource(Notifier).
2011/01/10 11:42:03 VCS ERROR V-16-2-13067 (symc-linux2) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
2011/01/10 11:42:03 VCS INFO V-16-2-13068 (symc-linux2) Resource(Notifier) - clean completed successfully.
2011/01/10 11:42:03 VCS INFO V-16-1-10307 Resource Notifier (Owner: unknown, Group: ClusterService) is offline on symc-linux2 (Not initiated by VCS)
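The restart pattern in the engine log above can be summarized with a small script. The following is a minimal Python sketch, not a VCS tool: the function name `count_restart_attempts` and the regular expression are assumptions derived only from the V-16-2-13073 message format shown in this excerpt.

```python
import re
from collections import Counter

# Matches the V-16-2-13073 message format shown in the log excerpt above.
RESTART_RE = re.compile(
    r"V-16-2-13073 \((?P<system>[^)]+)\) Resource\((?P<resource>[^)]+)\) "
    r"became OFFLINE unexpectedly on its own\. "
    r"Agent is restarting \(attempt number (?P<attempt>\d+) of (?P<limit>\d+)\)"
)

def count_restart_attempts(lines):
    """Return {(system, resource): number of restart attempts logged}."""
    counts = Counter()
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            counts[(match["system"], match["resource"])] += 1
    return dict(counts)

# Three lines copied from the excerpt above, used here as sample input.
sample = [
    "2011/01/10 11:35:02 VCS ERROR V-16-2-13073 (symc-linux1) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 3) the resource.",
    "2011/01/10 11:36:03 VCS ERROR V-16-2-13073 (symc-linux1) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 2 of 3) the resource.",
    "2011/01/10 11:39:03 VCS ERROR V-16-2-13073 (symc-linux2) Resource(Notifier) became OFFLINE unexpectedly on its own. Agent is restarting (attempt number 1 of 3) the resource.",
]
print(count_restart_attempts(sample))
```

Run against the full excerpt, a count of three per system would match what the log shows: each node uses up all three restart attempts before the resource is reported offline and the next node is tried.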

 

[ SUMMARY STATUS OF VCS ]

-- SYSTEM STATE
-- System               State                Frozen             

A  symc-linux1         RUNNING              0                   
A  symc-linux2         RUNNING              0            
       
-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State         
B  ClusterService  symc-linux1         Y          N               OFFLINE|FAULTED      <<<<<<
B  ClusterService  symc-linux2         Y          N               OFFLINE|FAULTED      <<<<<<

B  asic-queueing   symc-linux1         Y          N               ONLINE        
B  asic-queueing   symc-linux2         Y          N               OFFLINE       
B  fop-servlet     symc-linux1         Y          N               ONLINE        
B  fop-servlet     symc-linux2         Y          N               OFFLINE       
B  network         symc-linux1         Y          N               ONLINE        
B  network         symc-linux2         Y          N               ONLINE        
B  nfs-share       symc-linux1         Y          N               OFFLINE       
B  nfs-share       symc-linux2         Y          N               ONLINE        
B  webservices     symc-linux1         Y          N               ONLINE        
B  webservices     symc-linux2         Y          N               OFFLINE       
 
-- RESOURCES FAILED
-- Group           Type                 Resource             System             
C  ClusterService  NotifierMngr         Notifier             symc-linux1       
C  ClusterService  NotifierMngr         Notifier             symc-linux2     
  
-- RESOURCES NOT PROBED
-- Group           Type                 Resource             System             
D  ClusterService  NIC                  csgnic               symc-linux1       
D  ClusterService  NIC                  csgnic               symc-linux2 

Cause

[ CONFIGURATION AND LOGS ]

1) /var/VRTSvcs/log/notifier_A.log
-------------------------------------------------------------------------
2010/12/15 16:24:41 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:26:14 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:27:18 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:28:32 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:29:56 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:31:30 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/15 16:36:54 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:32:31 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:33:02 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:33:43 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:34:34 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:35:35 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:36:46 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:40:50 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:41:21 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:42:02 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:42:53 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:43:55 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 11:45:06 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
2010/12/16 16:08:30 VCS WARNING V-16-1-10630 IpmHandle::send _write_errno is 6. Client (unknown) Pid (-1)
 

2)  /var/VRTSvcs/log/Notifier_A.log
-------------------------------------------------------------------------
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) name(Notifier) op(1607)
        VCSAgTimer.C:check_timers[297]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Resetting periodic timer for resource Notifier op 1607 to expire at 1485   <<<<<< Set the timer
        VCSAgTimer.C:_reset_periodic_timer[999]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Adding timer for Notifier with tmo 1485                                                  <<<<<<
        VCSAgTimer.C:_add[723]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Timer id is 28
        VCSAgTimer.C:_add[739]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Appending command minor code 1607 for resource Notifier
        VCSAgRes.C:append_cmd[340]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4154198928) Scheduled resource Notifier
        VCSAgSched.C:put_req[173]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Picked Res(Notifier) from Scheduler
        VCSAgSched.C:_dequeue[64]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Resource (Notifier) received cmd minor code (1607)
        VCSAgRes.C:process_cmd[4727]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) Resource Notifier transitioning from Online to Monitoring
        VCSAgRes.C:internal_state[4083]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) The values of ArgList attributes are given below
        VCSAgRes.C:call_entry_point[986]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[0] is (14141)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[1] is (30)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[2] is (14144)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[3] is (162)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[4] is (public)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[5] is (2)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[6] is (172.16.141.15)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[7] is (Warning)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[8] is (mailgatensw.ffx.jfh.com.au)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[9] is (0)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[10] is (10)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[11] is ()
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[12] is ()
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[13] is (2)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[14] is (admin@symc.com)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) arg[15] is (Warning)
        VCSAgRes.C:call_entry_point[991]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) No OS encoded ArgList attributes
        VCSAgRes.C:call_entry_point[1028]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Adding timer for Notifier with tmo 1485
        VCSAgTimer.C:_add[723]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Timer id is 32
        VCSAgTimer.C:_add[739]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Calling monitor for resource Notifier
        VCSAgType.C:call_monitor[1268]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) agent ep version is 1
        VCSAgType.C:_is_script_ep[4948]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) Resource(Notifier) - monitor entry point exited with a confidence value 0.   <<<<<<< The monitor returned within the same second with confidence 0, that is, it did not find the resource online.
        VCSAgType.C:call_monitor[1368]
2011/01/10 11:38:03 VCS DBG_AGINFO V-16-50-0 Thread(4151311248) Notifier reported state (Offline) & conf_level (0)   <<<<<<< The agent therefore reports the resource as Offline.
        VCSAgRes.C:call_entry_point[1324]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1608)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Removing thread_id 4151311248
        VCSAgThreadTbl.C:remove[221]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1605)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1621)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Res(Notifier) - ToleranceCount (1) ToleranceLimit(0)
        VCSAgRes.C:tolerance_limit_reached[5262]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) ToleranceLimit reached
        VCSAgRes.C:tolerance_limit_reached[5268]
2011/01/10 11:38:03 VCS DBG_AGDEBUG V-16-50-0 Thread(4151311248) Canceling timer for (Notifier) op(1607)
        VCSAgTimer.C:_cancel[808]
2011/01/10 11:38:03 VCS ERROR V-16-2-13067 Thread(4151311248) Agent is calling clean for resource(Notifier) because the resource became OFFLINE unexpectedly, on its own.
..
<snip>
..
 
[ Comment ] The debug log confirms that the Notifier resource was reported offline, but it gives no reason why.
 
 
3) Review the Notifier configuration.
 
## main.cf
group ClusterService (
        SystemList = { symc-linux2 = 0, symc-linux1 = 1 }
        AutoStartList = { symc-linux1 }
        )
 
        NIC csgnic (
                Enabled = 0
                Device @symc-linux2 = bond0
                Device @symc-linux1 = bond0
                )
 
        NotifierMngr Notifier (
                SnmpConsoles = { "192.168.1.123" = Warning }
                SmtpServer = "mailgate.test.symantec.com"
                SmtpRecipients = { "admin@symc.com" = Warning }
                )
 
        Notifier requires csgnic
 
 
4) According to /etc/VRTSvcs/conf/config/main.cmd, the SmtpServerVrfyOff attribute was added to the NotifierMngr type at some point in the past.
 
$ egrep -i SmtpServerVrfyOff main.cmd
hatype -modify NotifierMngr ArgList EngineListeningPort MessagesQueue NotifierListeningPort SnmpdTrapPort SnmpCommunity SnmpConsoles SmtpServer SmtpServerVrfyOff SmtpServerTimeout SmtpReturnPath SmtpFromPath SmtpRecipients
haattr -add NotifierMngr SmtpServerVrfyOff -boolean 0
hares -modify Notifier SmtpServerVrfyOff 0
 
[ Comment ] Check the current value of this attribute in types.cf:

$ egrep -i SmtpServerVrfyOff types.cf
        static str ArgList[] = { EngineListeningPort, MessagesQueue, NotifierListeningPort, SnmpdTrapPort, SnmpCommunity, SnmpConsoles, SmtpServer, SmtpServerVrfyOff, SmtpServerTimeout, SmtpReturnPath, SmtpFromPath, SmtpRecipients }
        boolean SmtpServerVrfyOff = 0
 
[ Comment ] According to the Admin Guide:
Set this value to 1 if your mail server does not support the SMTP VRFY command.
If you set this value to 1, the notifier does not send an SMTP VRFY request to the mail server specified in the SmtpServer attribute while sending emails.
 
Type and dimension: boolean-scalar Default: 0
 
Therefore, with SmtpServerVrfyOff = 0, the notifier sends an SMTP VRFY request to the mail server specified in the SmtpServer attribute when sending email.
The open question is whether the SMTP server accepts connections from the VCS notifier and supports the SMTP VRFY command.
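One way to check whether a mail server answers VRFY is to send the command by hand. The following is a minimal Python sketch using the standard `smtplib` module; it is not how the notifier itself performs the check. The function names and the mapping of reply codes to verdicts are assumptions for troubleshooting purposes, based on the reply codes defined in RFC 5321.

```python
import smtplib

def classify_vrfy_reply(code):
    """Interpret an SMTP VRFY reply code for troubleshooting (assumed mapping)."""
    if code in (250, 251, 252):
        # The server processed VRFY; 252 means it will not confirm the user
        # but accepts mail for delivery.
        return "supported"
    if code in (500, 502):
        # Command unrecognized or not implemented.
        return "unsupported"
    # For example 550 (mailbox not recognized) or a 4xx temporary failure.
    return "inconclusive"

def probe_vrfy(server, address, port=25, timeout=10):
    """Connect to `server`, send VRFY for `address`, return (code, verdict)."""
    with smtplib.SMTP(server, port, timeout=timeout) as smtp:
        smtp.ehlo_or_helo_if_needed()
        code, _message = smtp.verify(address)
    return code, classify_vrfy_reply(code)

# Example call (hypothetical host and address; substitute the SmtpServer
# value and one SmtpRecipients entry from main.cf):
# print(probe_vrfy("mail.example.com", "admin@example.com"))
```

An "unsupported" verdict would suggest testing with SmtpServerVrfyOff set to 1; an "inconclusive" verdict needs a closer look at the actual reply text.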

Solution

 

[ WHAT TO DO ]
 
1) Review the technote below to confirm that the SMTP server is suitable for use with the Notifier.
 
 
2) Try running the notifier manually with a command line such as the following:
/opt/VRTSvcs/bin/notifier -s m=north -s m=south,p=2000,l=Error,c=your_company -t m=north,e="abc@your_company.com",l=SevereError
 
In this example, notifier:
- Sends all level SNMP traps to north at the default SNMP port and community value public.
- Sends Error and SevereError traps to south at port 2000 and community value your_company.
- Sends SevereError email messages to north as SMTP server at default port and to email recipient abc@your_company.com.
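The `-s` and `-t` values in the example are comma-separated `key=value` lists. The following minimal Python sketch splits such a value into its fields, which can help when checking a long command line. The function name `parse_notifier_target` is hypothetical, and the reading of the keys (m = host, p = port, l = severity level, c = community, e = email recipient) is inferred from the example above rather than taken from the notifier manual.

```python
def parse_notifier_target(spec):
    """Split a notifier -s/-t value such as 'm=south,p=2000,l=Error' into a dict."""
    fields = {}
    for item in spec.split(","):
        key, _, value = item.partition("=")
        # Strip optional shell-style double quotes around the value.
        fields[key.strip()] = value.strip().strip('"')
    return fields

snmp_south = parse_notifier_target("m=south,p=2000,l=Error,c=your_company")
smtp_north = parse_notifier_target('m=north,e="abc@your_company.com",l=SevereError')
print(snmp_south)
print(smtp_north)
```

Keys that are absent from a value, such as `p` and `c` for the first `-s m=north` target, fall back to the defaults described in the list above.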
 
 
3) If the cause is still unclear, collect strace output from the notifier process.
- Trace the notifier process while the resource is failing, so that the trace shows whether it tried to open the SMTP connection.
#strace -f -v -p PID -o notifier_strace__`hostname`_`date '+%d.%m.%y'`.out -s 512
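Once the trace file exists, it can be searched for connection attempts to the SMTP port. The following is a minimal Python sketch that assumes the usual strace rendering of IPv4 `connect()` calls. The function name `find_smtp_connects` is hypothetical, and the two sample lines are made up for illustration; they are not taken from this incident.

```python
import re

# Typical strace rendering of an IPv4 connect() call (assumed format).
CONNECT_RE = re.compile(
    r'connect\(\d+, \{sa_family=AF_INET, sin_port=htons\((?P<port>\d+)\), '
    r'sin_addr=inet_addr\("(?P<addr>[^"]+)"\)\}, \d+\)\s*=\s*(?P<result>.+)$'
)

def find_smtp_connects(lines, port=25):
    """Return (address, result) for each connect() call to the given port."""
    hits = []
    for line in lines:
        match = CONNECT_RE.search(line)
        if match and int(match["port"]) == port:
            hits.append((match["addr"], match["result"].strip()))
    return hits

# Illustrative lines only (documentation addresses, hypothetical PID).
sample_trace = [
    '4242 connect(6, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.0.2.1")}, 16) = 0',
    '4242 connect(7, {sa_family=AF_INET, sin_port=htons(25), sin_addr=inet_addr("192.0.2.10")}, 16) = -1 ECONNREFUSED (Connection refused)',
]
print(find_smtp_connects(sample_trace))
```

No match at all would suggest the notifier never tried to reach the mail server. A non-blocking socket can show `-1 EINPROGRESS` here even when the connection later succeeds, so the lines that follow the `connect()` call are worth reading as well.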
 
 
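4) As a last workaround, check whether setting SmtpServerVrfyOff to 1, which stops the notifier from sending the SMTP VRFY request, makes a difference. The resource is named Notifier in the main.cf shown above.
#haconf -makerw
#hares -modify Notifier SmtpServerVrfyOff 1
#haconf -dump -makero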

Applies To

[ CONFIGURATION ]
- Two nodes in VCS configuration

[ VERSION OF OS/PACKAGE ]
1.
Linux symc-linux1 2.6.18-194.32.1.el5 #1 SMP Mon Dec 20 10:52:42 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
Linux symc-linux2 2.6.18-194.32.1.el5 #1 SMP Mon Dec 20 10:52:42 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
 
2. SFHA5.0MP4 
