A VCS agent, such as the Application agent, crashes or hangs while allocating or deallocating memory

Article: 100027872
Last Published: 2012-10-10
Product(s): InfoScale & Storage Foundation

Problem

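A VCS (Veritas Cluster Server) agent, such as the Application agent, crashes or hangs while allocating or deallocating memory. The stack trace from the agent core file, shown below, has the process parked in malloc(), called from the SIGABRT handler, which in turn interrupted a call to free(). The agent can be restarted successfully once any stale agent process has been terminated.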

Error Message

*** From pstack of agent core dump:

node85:sh# pstack 02334184.node85.core.7584
core '02334184.node84.core.7584' of 7584: /opt/VRTSvcs/bin/Application/ApplicationAgent -type Application
----------------- lwp# 1 / thread# 1 --------------------
fed9aa68 lwp_park (0, 0, 0)
fed27d00 malloc (400, 1, ea728, 0, fee123ec, fee1c5e0) + 44
ff2437d4 __1cJvcsmalloc6FIpcI_pv_ (400, ff2d2054, 6e, 0, 0, 0) + 54
ff24354c __1cLvcssnprintf6FpcIpkcE_I_ (ffbfc33c, 400, ff2d4fe2, 70bd0, ff2dc1e0, 1da0) + 4c
ff261c3c __1cEFFDCSFFDC_DumpLogToFile6MpnKffdc_buf_t_nK_ffdc_role_b_nM_ffdc_result__ (70ba8, 77a90, 1, 1, ffbfc77c, 1) + a4
ff262264 __1cEFFDCNFFDC_DumpLogs6MnK_ffdc_role_b_nM_ffdc_result__ (70ba8, 8, 1, 6, ffbffeff, fee17c40) + 114
ff22c974 __1cDLogJdump_ffdc6Fib_v_ (8, 1, 0, 0, 0, 0) + 7c
ff26ef34 VCSAbrtHandler (6, 0, 0, 0, 0, 0) + 44
ff1b0e88 vcsag_diag_handler (6, 0, ffbfcec8, 0, 0, 0) + 70
fed9aaf4 __sighndlr (6, 0, ffbfcec8, ff1b0e18, 0, 0) + c
fed8f1a4 call_user_handler (6, 0, 4, 0, fec72a00, ffbfcec8) + 3b8
fed8f38c sigacthandler (6, 0, ffbfcec8, 0, 0, 0) + 60
--- called from signal handler with signal 6 (SIGABRT) ---
fed9aa6c __lwp_park (fec72a00, 0, fee15a60, 0, 1c00, 1d3c) + 14
fed28b68 free (c3e50, 467265, e9898, fed8df78, fee123ec, 80808080) + 1c
ff263578 __1cGMemoryHrelease6Mpvi_v_ (ff333fdc, c3e58, 15, 0, ffbfd150, 0) + 560
ff262910 __1c2k6Fpv_v_ (c3e58, ffbfd2fc, ffbfd27c, ffbfd1fc, ffbfd390, ffbfd27c) + 50
ff276dcc __1cOVCSLockRelease6FpnNVCSLockStruct__v_ (627d0, ffbfd438, 592d0, 0, 64, ff2b9b0e) + 1c
ff1b3ccc __1cNVCSAgNotifierHget_cmd6MppnFVList_ppnJIpmHandle__v_ (627c0, ffbfd4fc, ffbfd4f0, 67e, 0, bccd0) + 3dc
ff1b1170 main_loop (5e478, ff2b9371, ff2b981d, 84f, 0, ff2b9829) + 58
ff1b2978 VCSAgMain (3, ffbffc24, 0, 0, 0, 0) + 1330
ff1b29b0 main (3, ffbffc24, ffbffc34, 35800, fe8b0040, 0) + 18
000136c8 _start (0, 0, 0, 0, 0, 0) + 108
[...]


*** From agent log (Application_A.log):


2012/09/23 10:24:23 VCS WARNING V-16-1-10023 Agent Application not sending alive messages since Sun Sep 23 10:22:11 2012
2012/09/23 10:24:23 VCS WARNING V-16-1-53025 Agent Application has faulted; ipm connection was lost; restarting the agent


*** From VCS engine log (engine_A.log):

2012/09/23 10:22:11 VCS ERROR V-16-2-13051 (node85) Agent(Application) is exiting because another agent with process-id(3002) is already running for this type
2012/09/23 10:24:23 VCS WARNING V-16-1-10023 Agent Application not sending alive messages since Sun Sep 23 10:22:11 2012


Cause

The issue is caused by a code defect, tracked via e2407755 / e2245069, in the releases listed under Applies To.

Any memory allocation performed in the child process between the fork and execve system calls can corrupt memory, after which the agent crashes.
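To illustrate the constraint, here is a minimal C sketch of a fork/execve sequence in which the child allocates nothing. It is not the VCS agent framework code. The helper name spawn_and_wait is hypothetical, and the sketch assumes the caller builds the path, argv and envp arrays before forking.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for illustration; not part of the VCS agent framework.
 * Returns the child's exit code, or -1 on error or abnormal termination. */
int spawn_and_wait(const char *path, char *const argv[], char *const envp[])
{
    /* path, argv and envp are fully prepared by the caller BEFORE fork(),
     * so the child does not need malloc(), snprintf() or similar calls. */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: call only async-signal-safe functions from here on. */
        execve(path, argv, envp);
        /* Reached only if execve() failed. _exit() avoids running atexit
         * handlers and flushing stdio buffers inherited from the parent. */
        _exit(127);
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status = 0;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w < 0 && errno == EINTR);

    if (w < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling spawn_and_wait("/bin/sh", argv, envp) with argv set to { "sh", "-c", "exit 7", NULL } and an empty envp would be expected to return 7 on a system that provides /bin/sh.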

Solution

Veritas has modified the agent framework library so that it performs no memory allocation between the fork and execve system calls (in the child context). This prevents the memory corruption and the resulting agent crash. Calls to functions that are not async-signal-safe have also been removed from the signal handler, so that the agent does not hang during signal handling when memory corruption has occurred.

This issue is fixed in VCS 5.1SP1RP2 and later. The latest available patch when this article was written is 5.1SP1RP3.

The patches can be obtained from the SORT website:

https://sort.Veritas.com/patch/finder

 


Applies To

Applies to all platforms - AIX, HP-UX, Linux and Solaris

VCS 5.1, 5.1RP1, 5.1SP1, 5.1SP1RP1

References

Etrack : 2407755 / 2245069
