Data corruption using VxVM on Linux when devices are added or removed

Article: 100031720
Last Published: 2016-01-04
Product(s): InfoScale & Storage Foundation

Problem

Data corruption can occur with VxVM on Linux when devices are added or removed. The issue is tracked as etrack e3851126 and affects only VxVM version 6.1 and earlier on Linux platforms.

Error Message

The data corruption is silent: no error message is printed, and the corruption is detected only later, when the written data is re-read or verified.

When a device is added or removed, the udev worker threads that deliver device add/remove events to all registered processes may be killed or time out. This can open a window in which data corruption is possible. Messages such as the following may be logged:
 
systemd-udevd: worker [ThreadId] /devices/pci0000:00/0000:00:06.0/0000:05:00.0/host7/rport-7:0-2/target7:0:1/7:0:1:6/block/sdaj timeout; kill it
systemd-udevd: seq 21010 '/devices/pci0000:00/0000:00:06.0/0000:05:00.0/host7/rport-7:0-2/target7:0:1/7:0:1:6/block/sdaj' killed
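
To check a system log for these messages, a scan along the following lines may help. This is a minimal sketch, not a Veritas tool. It assumes each message appears on a single log line, optionally with a `[pid]` after `systemd-udevd`; the function name `find_udev_worker_kills` is hypothetical.

```python
import re

# Assumed single-line forms of the two udev messages quoted in this article.
# The optional "[pid]" after "systemd-udevd" is an assumption about syslog output.
TIMEOUT_RE = re.compile(
    r"systemd-udevd(?:\[\d+\])?: worker \[(?P<worker>[^\]]+)\] "
    r"(?P<devpath>/devices/\S+) timeout; kill it"
)
KILLED_RE = re.compile(
    r"systemd-udevd(?:\[\d+\])?: seq (?P<seq>\d+) "
    r"'(?P<devpath>/devices/[^']+)' killed"
)


def find_udev_worker_kills(lines):
    """Return (kind, devpath) tuples for udev worker timeout/killed messages."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append(("timeout", match.group("devpath")))
            continue
        match = KILLED_RE.search(line)
        if match:
            hits.append(("killed", match.group("devpath")))
    return hits
```

The function accepts any iterable of strings, so an open log file can be passed to it directly. Which file holds the udev messages depends on the distribution and its syslog configuration.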

DMP may also log a message indicating that a serial number mismatch has occurred. This indicates that the OS devices have changed without the correct notification to VxVM.

VxVM vxdmp V-5-3-1970 dmp_verify_devid:Graceful DR steps are not followed by the user on the path <DeviceMajor/DeviceMinor>. The device with old serial number <SerialNo1> is replaced with a new device with serial number <SerialNo2> 
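
The fields of this message can be pulled out of a log line with a regular expression. The sketch below is illustrative only: it assumes the path and both serial numbers contain no whitespace, and the names `SerialMismatch` and `parse_serial_mismatch` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SerialMismatch(NamedTuple):
    path: str        # the <DeviceMajor/DeviceMinor> field
    old_serial: str  # the <SerialNo1> field
    new_serial: str  # the <SerialNo2> field


# Built from the V-5-3-1970 message text quoted in this article.
# Assumption: the three variable fields contain no whitespace.
MISMATCH_RE = re.compile(
    r"V-5-3-1970 dmp_verify_devid:\s*Graceful DR steps are not followed "
    r"by the user on the path (?P<path>\S+)\. "
    r"The device with old serial number (?P<old>\S+) is replaced with "
    r"a new device with serial number (?P<new>\S+)"
)


def parse_serial_mismatch(line: str) -> Optional[SerialMismatch]:
    """Return the mismatch details if the line holds a V-5-3-1970 message."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    return SerialMismatch(match.group("path"), match.group("old"), match.group("new"))
```

Any line for which this returns a result identifies a path whose underlying device changed without VxVM being notified correctly.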

Cause

On Linux, VxVM uses the udev framework to listen for disk add/remove events. ESD (Event Source Daemon) registers a rule with the udev framework in order to update the VxVM and DMP configuration based on these events.

Because udev rules are executed serially, processing of the udev rule for VxVM may be delayed. Device add or remove events are then delivered late to ESD.

For instance, if the udev remove event is delayed after a disk is removed, the path may not be disabled immediately in the DMP kernel configuration. In addition, udev threads may time out or be killed.

In either situation, I/O might be issued to an incorrect disk, which may result in data corruption.
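
The effect of a late or lost remove event can be illustrated with a toy model. The sketch below is not VxVM or DMP code. It assumes, for illustration, that a different disk reappears under the same OS device name while the multipath layer still holds its old view; `ToyDmp`, `simulate`, and the serial values are hypothetical.

```python
class ToyDmp:
    """Toy stand-in for a multipath layer's view of OS devices (not VxVM code)."""

    def __init__(self):
        # OS device name -> serial number this layer believes is behind it
        self.paths = {}

    def on_add(self, dev, serial):
        self.paths[dev] = serial

    def on_remove(self, dev):
        self.paths.pop(dev, None)

    def write(self, target_serial, os_devices):
        """Return the serial of the disk that actually receives the write,
        or None if no known path leads to target_serial."""
        for dev, believed_serial in self.paths.items():
            if believed_serial == target_serial and dev in os_devices:
                return os_devices[dev]  # whatever the OS has behind dev now
        return None


def simulate(remove_event_delivered):
    """Model a disk swap under one device name, with or without timely events."""
    os_devices = {"sdaj": "SERIAL_OLD"}
    dmp = ToyDmp()
    dmp.on_add("sdaj", "SERIAL_OLD")

    # The old disk is removed and a different disk appears under the same name.
    os_devices["sdaj"] = "SERIAL_NEW"
    if remove_event_delivered:
        dmp.on_remove("sdaj")
        dmp.on_add("sdaj", "SERIAL_NEW")

    # A write intended for the old disk is now issued.
    return dmp.write("SERIAL_OLD", os_devices)
```

In this model, `simulate(False)` returns the new disk's serial, meaning the write lands on the wrong disk, while `simulate(True)` returns `None` because the stale path is no longer used.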

 

Solution

VxVM version 6.2 and later does not rely on the udev rule framework to be notified of device changes at the OS layer. Instead, it uses OS functionality provided by the libudev.so library to listen directly for changes as udev makes them. This eliminates the possibility of a delay in being notified of any changes.
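
The general idea of subscribing to device events directly, rather than being invoked from a udev rule, can be sketched as follows. This is not libudev and not how VxVM implements it: the example reads raw kernel uevents from a Linux netlink socket using only the Python standard library. The function names are hypothetical, and receiving these events may require elevated privileges on some systems.

```python
import socket

NETLINK_KOBJECT_UEVENT = 15  # Linux netlink protocol number for kernel uevents
KERNEL_UEVENT_GROUP = 1      # multicast group that carries raw kernel uevents


def parse_uevent(raw):
    """Parse one kernel uevent datagram into a dict.

    The datagram is a NUL-separated sequence: an "action@devpath" header
    followed by KEY=VALUE fields.
    """
    event = {}
    for field in raw.split(b"\0")[1:]:
        if b"=" in field:
            key, _, value = field.partition(b"=")
            event[key.decode()] = value.decode(errors="replace")
    return event


def listen_block_events(handler):
    """Call handler(event) for each block-device add/remove uevent (Linux only).

    Runs until interrupted.
    """
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM, NETLINK_KOBJECT_UEVENT)
    sock.bind((0, KERNEL_UEVENT_GROUP))
    try:
        while True:
            event = parse_uevent(sock.recv(16384))
            if event.get("SUBSYSTEM") == "block" and event.get("ACTION") in ("add", "remove"):
                handler(event)
    finally:
        sock.close()
```

Calling `listen_block_events(print)` on a Linux host would print each block device add or remove event as it arrives. A listener of this kind receives events as they are emitted instead of waiting for its rule to run in a serial queue.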

This functionality change has been backported to VxVM 6.1 and will be available in fix 6.1.1.400, due for release at the end of March 2016. Once available, the patch can be downloaded from the SORT website:
https://sort.veritas.com/land

A HotFix for RHEL 6, 6.1.1.108, is also available. Please request the HotFix from Support until the public patch is available.

RHEL 5 does not support the libudev.so library, which this fix requires, so the fix is not available on RHEL 5. To obtain this functionality improvement, first upgrade to RHEL 6, then either apply the HotFix for VxVM 6.1 or upgrade to VxVM 6.2 or later.
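
The guidance in this section can be summarised as a small decision helper. This is a sketch under stated assumptions, not an official compatibility check: it assumes versions are plain dotted numbers, that an installed level of 6.1.1.108 or of 6.1.1.400 and later already carries the fix, and that releases earlier than 6.1 must move to a fixed release. The function names and return strings are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as "6.1.1.400" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def remediation(vxvm_version, rhel_major):
    """Suggest an action for a given VxVM version and RHEL major release."""
    version = parse_version(vxvm_version)

    # VxVM 6.2 and later listens through libudev and is not affected.
    if version[:2] >= (6, 2):
        return "not affected"

    # The fix needs libudev.so, which RHEL 5 does not support.
    if rhel_major < 6:
        return "upgrade to RHEL 6 first"

    # Assumption: these 6.1 levels already contain the backported fix.
    if version == (6, 1, 1, 108) or (6, 1, 1, 400) <= version < (6, 2):
        return "fix already installed"

    if version[:2] == (6, 1):
        return "apply 6.1.1.400 or HotFix 6.1.1.108, or upgrade to 6.2+"

    # Assumption: releases before 6.1 have no backport of their own.
    return "upgrade to a fixed 6.1 level or to 6.2+"
```

For example, `remediation("6.1.1", 6)` points to the patch or HotFix, while `remediation("6.1.1", 5)` reports that the operating system must be upgraded first.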
