VCS HAD daemon does not start and dumps core

Article: 100007695
Last Published: 2013-07-29
Product(s): InfoScale & Storage Foundation

Problem

The VCS HAD daemon does not start, or keeps crashing, on HP-UX 11.31 (IA64) systems with PHKL_41700 installed.

Error Message

The syslog file contains messages like the following when the HAD daemon crashes:

Nov 28 13:39:27 foo500b Had[17461]: VCS NOTICE V-16-1-10075 Building from remote system
Nov 28 13:39:27 foo500b Had[17461]: VCS ERROR V-16-1-50129 Operation 'MSG_NOTIFIER_NOTIFY' rejected as the node is in REMOTE_BUILD state
Nov 28 13:40:17 foo500b vmunix: GAB INFO V-15-1-20036 Port h gen   6f8124 membership ;1
Nov 28 13:40:17 foo500b vmunix: GAB INFO V-15-1-20038 Port h gen   6f8124 k_jeopardy 0
Nov 28 13:40:17 foo500b vmunix: GAB INFO V-15-1-20040 Port h gen   6f8124    visible 0
Nov 28 13:40:17 foo500b Had[17461]: VCS ERROR V-16-1-10468 Node providing snapshot has left the cluster.  Local node leaving cluster to be restarted
Nov 28 13:40:18 foo500b vmunix: GAB WARNING V-15-1-20161 Port h client process killed, GAB will initiate regmon action syslog after 200 sec
Nov 28 13:40:18 foo500b vmunix: GAB INFO V-15-1-20032 Port h closed
Nov 28 13:40:18 foo500b syslog[15648]: VCS ERROR V-16-1-11103 VCS exited. It will restart
Nov 28 13:40:18 foo500b syslog[15648]: VCS ERROR V-16-1-11104 VCS has faulted 6 times since Mon Nov 28 13:40:18 2011  hashadow will not restart VCS.  Correct the problem and restart
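To check whether a node is showing the same restart and give-up messages, the syslog file can be filtered on the two message IDs seen above. The following is a minimal sketch: the function name had_restart_events is hypothetical, and the default path /var/adm/syslog/syslog.log is an assumption about where syslog writes on the node, so pass the actual file as the first argument if it differs.

```shell
#!/bin/sh
# Sketch: print HAD restart and give-up messages from a syslog file.
# had_restart_events is a hypothetical helper name, and the default path
# below is an assumption; pass the real syslog file as $1 if it differs.
had_restart_events() {
    logfile="${1:-/var/adm/syslog/syslog.log}"
    # V-16-1-11103: "VCS exited. It will restart"
    # V-16-1-11104: "hashadow will not restart VCS"
    grep -E 'VCS ERROR V-16-1-(11103|11104)' "$logfile"
}
```

For example, `had_restart_events /var/adm/syslog/syslog.log` prints only the matching lines; no output means neither message was logged in that file.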

The stack trace of the HAD daemon (located in /var/VRTSvcs/diag/had/) looks like the following:
 
(0)  0x00000000049209b0  _Z12VCSDumpStackPKc + 0x300 at Platform.C:1874 [/opt/VRTSvcs/bin/had]
(1)  0x0000000004921ee0  VCSAbrtHandler + 0xf0 at Platform.C:2091 [/opt/VRTSvcs/bin/had]
(2)  0xe00000012f05f420  ---- Signal 6 (SIGABRT) delivered ----
(3)  0x60000000c044cc10  _select_sys + 0x30 [/usr/lib/hpux32/libc.so.1]
(4)  0x60000000c0463120  _select + 0xe0 at ../../../../../core/libs/libc/shared_em_32_perf/../core/syscalls/t_select.c:21 [/usr/lib/hpux32/libc.so.1]
(5)  0x00000000048141f0  _ZN9IpmHandle6eventsEP5DListPS1_S1_S2_i + 0xb30 at Ipm.C:537 [/opt/VRTSvcs/bin/had]
(6)  0x0000000004823d50  _ZN9IpmHandle4sendEP5VListi + 0x13d0 at Ipm.C:2461 [/opt/VRTSvcs/bin/had]
(7)  0x0000000004780c40  _ZN6System12process_dumpEPvP6MsgHdr + 0xa90 at System.C:5157 [/opt/VRTSvcs/bin/had]
(8)  0x000000000422a8f0  _Z15process_messagePvP5VListi + 0x1350 at had.C:500 [/opt/VRTSvcs/bin/had]
(9)  0x0000000004247190  _Z4MAINmPPc + 0xeff0 at had.C:3236 [/opt/VRTSvcs/bin/had]
(10) 0x000000000425cfe0  main + 0x40 at had.C:3779 [/opt/VRTSvcs/bin/had]
(11) 0x60000000c00427c0  main_opd_entry + 0x50 [/usr/lib/hpux32/dld.so]

Cause

Changes to the select(2) system call introduced by the PHKL_41700 patch cause problems with the timer functionality that the HAD daemon uses.

Solution

HP has fixed this issue in patch PHKL_41967. We recommend installing this patch, which requires a system reboot.

 

Until the patch can be installed, the following workaround can be used. It does not require a reboot.

1. Enable the high-resolution timer functionality.

# kctune hires_timeout_enable=1

2. Restart the HAD daemon.

# hastart
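The two workaround steps can be chained so that HAD is started only if the tunable change succeeds. The sketch below is illustrative: the function name apply_had_workaround and the DRY_RUN variable are hypothetical, and the final hasys -state call is an added check, not part of the documented workaround. It may need to be repeated while HAD is still starting.

```shell
#!/bin/sh
# Sketch: apply the workaround steps in order, stopping at the first failure.
# apply_had_workaround and DRY_RUN are hypothetical names for illustration.
# With DRY_RUN set, the commands are printed instead of executed.
apply_had_workaround() {
    run="${DRY_RUN:+echo}"
    # Step 1: enable the high-resolution timer (no reboot needed).
    $run kctune hires_timeout_enable=1 || return 1
    # Step 2: start HAD again.
    $run hastart || return 1
    # Extra check (an assumption, not part of the article): show node state.
    $run hasys -state
}
```

Running `DRY_RUN=1 apply_had_workaround` prints the commands without changing the system, which is a reasonable first pass before running it as root.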


Applies To

HP-UX 11.31 (IA64) with PHKL_41700 installed, running VCS 5.1SP1
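
To check whether a node matches this configuration, the installed patch listing can be searched for both patch names. The following is a rough sketch: the function name affected_by_phkl_41700 is hypothetical, and it does a plain text match on whatever listing is piped in (for example from swlist -l patch). It does not interpret patch state, so a superseded PHKL_41700 that is still listed alongside a later fix may need a manual look.

```shell
#!/bin/sh
# Sketch: read a patch listing on stdin and succeed (exit 0) when
# PHKL_41700 is listed but the fix PHKL_41967 is not.
# affected_by_phkl_41700 is a hypothetical helper name; this is a plain
# text match and does not check patch state (applied, superseded, ...).
affected_by_phkl_41700() {
    listing=$(cat)
    printf '%s\n' "$listing" | grep -q 'PHKL_41700' || return 1
    printf '%s\n' "$listing" | grep -q 'PHKL_41967' && return 1
    return 0
}

# Example use on the node (assumes swlist -l patch lists installed patches):
#   swlist -l patch | affected_by_phkl_41700 && echo "node may be affected"
```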
