When device discovery commands (vxdctl enable or vxdisk scandisks) are run during BCV operations, SAN migration and the like, vxconfigd queries the operating system device tree using the HP libIO library (io_init, io_search, io_end, etc.). Occasionally io_search() returns a NULL string. With VxVM 5.0.1 or higher, DDL/DMP concludes that all devices have disappeared and removes every array and DMP device (a.k.a. dmpnode) visible to the host. This leads to VxVM I/O errors and file systems being disabled. Where VxVM manages the root disk(s), a system hang results. In a VCS environment, this can trigger monitor timeouts and a possible service group fault. In an HP Serviceguard/SGeRAC environment integrated with CVM and/or CFS, the VxVM I/O failures typically lead to a Serviceguard INIT and/or a CRS TOC (if the voting disks sit on VxVM volumes).
For VxVM 5.0.1, /etc/vx/dmpevents.log will show the following messages, indicating the removal of dmpnodes and arrays:
Tue Aug 21 02:13:09.000: Reconfiguration is in progress
Tue Aug 21 02:13:09.000: Reconfiguration has finished
Tue Aug 21 02:13:07.677: Removed Dmpnode 3/16
Tue Aug 21 02:13:07.677: Removed Dmpnode 3/32
Tue Aug 21 02:13:07.677: Removed Dmpnode 3/80
Tue Aug 21 02:13:07.677: Disabled Disk array emc0
Tue Aug 21 02:13:07.677: Disabled Disk array p2000g3_fc1
Tue Aug 21 02:13:07.687: Disabled Disk array emc1
Tue Aug 21 02:13:07.687: Disabled Disk array p2000g3_fc0
Tue Aug 21 02:13:07.787: I/O error occured (errno=0x6) on Dmpnode 3/2656
Tue Aug 21 02:13:07.787: I/O error occured (errno=0x6) on Dmpnode 3/1776
For VxVM 5.1SP1, /var/adm/vx/ddl.log* shows the dmpnode removal with the following messages:
START TIME = Wed Aug 8 14:30:51 2012
ddl_reconfigure_all: Start of original tree
------------ Start of dmp tree -------------
Found 2620 paths in the dmp tree
------------ End of dmp tree -------------
ddl_reconfigure_all: End of original tree
ddl_reconfigure_all: Start of temporary tree
ddl_reconfigure_all: End of temporary tree
Printing tree after migration is done
---- Start of DMP instruction buffer ----
0x3000010 dmpnode is to be destroyed/freed
0x3000d30 dmpnode is to be destroyed/freed
0x3000d40 dmpnode is to be destroyed/freed
As part of device discovery, vxconfigd queries the OS device tree using the HP libIO library functions. The libIO library uses the /dev/config driver to access information in the kernel I/O data structures. From the libIO man page:
io_init() Opens the /dev/config device special file, which causes an open(2) of the dev_config driver. io_init() must be called before calling any other routine in the libIO library.
io_end() Causes a close(2) of the dev_config driver. io_end() must be called after the use of the libIO library routine(s).
Starting with the VxVM 5.0 releases (namely 5.0, 5.0.1 and 5.1SP1, where the multithreaded vxconfigd began using the libIO APIs), the use of the thread-unsafe version of the libIO APIs can result in a race condition on the opening and closing of the /dev/config driver. Specifically, /dev/config opened by one thread (say T1) in an earlier call to io_init() may be closed inadvertently by another thread (say T2) calling io_end() at the same time. When T1 then issues an io_search() after T2 has closed /dev/config, the io_search() call returns NULL (a failure) with io_errno set to IO_E_DCONF_OPEN. vxconfigd does not detect this as an io_search() failure; instead it interprets the NULL return as an empty I/O tree.
Although the possibility of this race condition also exists in 5.0, the dmpnodes are not removed when the issue is hit on that version. In 5.0.1 and above, DMP proceeds to delete all the dmpnodes and arrays.
The root cause of this bug is that the multithreaded vxconfigd uses the non-thread-safe libIO APIs, including io_init(), io_end() and io_search(), instead of HP's thread-safe libIOmt library. HP and Symantec worked in collaboration to resolve the issue as a priority.
The final resolution for VxVM 5.0.1 and VxVM 5.1SP1 has now been released by both Symantec and HP.
The fix has four main components; ALL of them must be installed for the complete solution.
1. VxVM patches:
For release 5.1SP1:
PHCO_43824 - 11.31 VRTS 5.1 SP1RP3P1 VRTSvxvm Command Patch and
PHKL_43779 - 11.31 VRTS 5.1 SP1RP3P1 VRTSvxvm Kernel Patch
For release 5.0.1:
PHCO_43579 - 11.31 VRTS 5.0.1 RP3P5 VRTSvxvm Command Patch
PHKL_43580 - 11.31 VRTS 5.0.1 RP3P5 VRTSvxvm Kernel Patch
2. VRTSaslapm 188.8.131.52 or above
Note: this applies only to 5.1 SP1
Please note that in 5.1SP1 environments with Clariion arrays it is extremely important to install the latest VRTSaslapm package, especially when the 5.1SP1RP3 and 5.1SP1RP3P1 VxVM patches are installed. This avoids unnecessary vxconfigd core dumps, DPCA situations and/or vxconfigd entering a non-running state as a result of the ATYPE changes implemented in the VxVM patches and the VRTSaslapm package. Without the latest VRTSaslapm package at these version/patch levels, an incompatibility could arise, leading to the situations above. As VxVM 5.0.1 ships with an embedded VRTSaslapm, this Clariion concern does not arise on that version.
HP Components :
3. Thread-safe libIO(3X) APIs
This library, namely libIOmt, was introduced with HP patch PHCO_38066.
4. Thread-safe SNIA libraries for I/O drivers on HP-UX 11.31
These drivers are required by vxesd(1M) and have been delivered with the latest I/O driver bundles in November 2013. The GVSD driver for HPVM guests will be shipped in the March 2014 HP-UX Release Update.
This component is required (but not enforced) for 5.0.1 as well.
The HP components are available for download at itrc.hp.com.
NOTE: The final fix for VxVM 6.x releases is available in 6.0.5. If you are not running VxVM 6.0.5, please continue to use the workaround of running vxconfigd in single-threaded mode to avoid the race condition.
*** End update 9/17/2014 ***
** Update 10/16/2013 **
Symantec has now released a remedial fix for this issue in version 5.1SP1RP3 via the following patches:
Customers running 5.0.1 or 6.0.x (i.e. not 6.0.5) must still continue to use the "nothreads" workaround to avoid the outage.
** End update 10/16/2013
Original content :
It has been confirmed that running vxconfigd with multithreading disabled effectively avoids the issue in a standalone VxVM configuration. In addition, recent testing has shown little or no performance impact with multithreading disabled. A hotfix is available for the 5.0.1 and 5.1SP1 releases. For version 6.0+, a workaround is available and the final fix ships in 6.0.5. Please contact Symantec Technical Services for the workaround or the hotfix patch.
VxVM 5.0.1 and higher versions.