VxVM: Volume Manager (VxVM) commands may hang or delay server reboots when LUNs (ShadowImage) are in a not-ready (NR) state

Article: 100047115
Last Published: 2020-03-11
Product(s): InfoScale & Storage Foundation

Problem


Where the array vendor is unable to expose the LUN status to Veritas DMP (multi-pathing) and the LUN is in a not-ready (NR) state, VxVM commands can either run very slowly or hang.

Veritas DMP may re-enable previously disabled paths to these LUNs, as the SCSI inquiry to the LUN still succeeds. The Veritas DMP restore daemon checks the LUN stability and re-enables the path after some time.

During the restart of the vxconfigd daemon or vxdisk scandisks operations, vxconfigd reads the label and geometry of the underlying device by issuing a read against it. When a device is in the NR state this read fails, which leads to vxconfigd-related hangs and delays when executing VxVM commands.
 

Error Message


The errors shown in this instance relate to hardware-cloned ShadowImage devices:


At 22:05:31 - The hardware-cloned device presented to the host is switched from SPLIT to PAIR state on the storage side, restarting the snap re-sync operation.
 

- When running the VxVM command "vxdisk list" against this specific DMP node, VxVM performs a read operation on the device, resulting in vxconfigd hanging.
 

22:05:58 - DMP registered the first error on one path:

Sat Oct 12 22:05:58.854: I/O error occurred on Path c13t50060E80072B1E12d4s2(242/504) belonging to Dmpnode hp_xp7-0_3100(22/56)


- The DMP restore daemon re-enabled the path, as the SCSI inquiry was successful:

Sat Oct 12 22:05:58.855: I/O analysis done as DMP_PATH_OKAY on Path c13t50060E80072B1E12d4s2(242/504) belonging to Dmpnode hp_xp7-0_3100(22/56)
 

- Veritas DMP reports multiple error messages associated with the paths to the ShadowImage device:
 

Sat Oct 12 22:08:10.871: I/O retry(135) on Path c13t50060E80072B1E12d4s2(242/504) belonging to Dmpnode hp_xp7-0_3100(22/56)

 

Cause


EMC is currently able to expose the NR and Write-Disabled (WD) LUN characteristics for EMC BCV and SRDF LUNs. The Veritas code has been enhanced to handle these LUNs in a special way, avoiding interoperability issues with these special hardware-clone and replicated device types.

Special cloned devices such as Hitachi ShadowImage and EMC Symclones are unable to expose the LUN state via a SCSI inquiry.

In the past, we have requested these devices be offlined at the VxVM layer using the command:

# vxdisk offline <disk-access-name>

Prior to importing the hardware-cloned images, the disk needs to be onlined again:

# vxdisk online <disk-access-name>
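Combined, the recommended sequence looks like the following sketch (the disk access name is taken from this article's example; adapt it to your environment, and note the array-side re-sync/split steps are placeholders):

```shell
# Offline the cloned LUN at the VxVM layer before the snap re-sync starts,
# so vxconfigd does not issue reads against a not-ready device:
vxdisk offline hp_xp7-0_3100

# ... array-side re-sync runs here (e.g. pairresync), then the pair is
# split again (e.g. pairsplit) ...

# Online the disk again before importing the hardware-cloned image:
vxdisk online hp_xp7-0_3100
```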

Operational Impact:

A VxVM command such as "vxdisk list <da-name>" may take considerable time to respond; in this instance, 37 minutes:

- ShadowImage (snap) device placed in re-sync state at 01:47:51

# date; pairdisplay -g bc_hp -IBC1 -CLI -fcx -l
Sun Feb 9 01:47:26 MSK 2020
Group PairVol L/R Port# TID LU-M Seq# LDEV# P/S Status % P-LDEV# M
bc_hp disk_00 L CL1-C-3 32 4 0 311038 3100 S-VOL SSUS 99 3000 -

# date; pairresync -l -g bc_hp -IBC1
Sun Feb 9 01:47:51 MSK 2020

# date; pairdisplay -g bc_hp -IBC1 -CLI -fcx -l
Sun Feb 9 01:47:55 MSK 2020
Group PairVol L/R Port# TID LU-M Seq# LDEV# P/S Status % P-LDEV# M
bc_hp disk_00 L CL1-C-3 32 4 0 311038 3100 S-VOL PAIR 100 3000 -


- The "vxdisk list <da-name>" command is now executed:


# time vxdisk list hp_xp7-0_3100
VxVM vxdisk ERROR V-5-1-539 Device hp_xp7-0_3100: get_contents failed:
Disk device is offline
Device: hp_xp7-0_3100
devicetag: hp_xp7-0_3100
type: auto
info: format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags: online ready private autoconfig udid_mismatch
pubpaths: block=/dev/vx/dmp/hp_xp7-0_3100s2 char=/dev/vx/rdmp/hp_xp7-0_3100s2
guid: {c0db4462-d5f6-11e9-bec5-90e2ba6ce200}
udid: HP%5F50%5F02B1E%5F50302B1E3100
site: -
Multipathing information:
numpaths: 2
c13t50060E80072B1E12d4s2 state=enabled
c7t50060E80072B1E02d4s2 state=enabled

real 37m49.586s
user 0m0.008s
sys 0m0.019s

Whilst vxconfigd may appear hung, the VxVM command itself was just running very slowly and eventually responded back after 37 minutes, returning the error "get_contents failed: Disk device is offline".

The I/O TIMEOUT error/delay applies to a single I/O, not to a whole command. The apparent hang arises because the command had to wait for all triggered I/Os to return.


 

Solution
 

The DMP code has been enhanced in multiple ways for those vendors exposing the LUN characteristics. The DMP intelligence is provided via two components: the Array Support Libraries (ASL) and Array Policy Modules (APM), delivered together as the ASLAPM package. The ASLAPM package takes care of identifying the LUN state if exposed, so the appropriate actions are taken by vxconfigd/DMP.
 

Unlike DMP_DISK_FAILURE and DMP_PATH_FAILURE, in the case of DMP_IOTIMEOUT DMP does not disable the path directly.

If the SCSI inquiry succeeds, the LUN itself is considered fine, so path restoration is attempted. Even when paths are disabled (for example after a path I/O timeout), DMP will re-enable them when checking path health during the next monitor cycle (by default every 300 seconds), as long as the SCSI inquiry succeeds.
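The monitor cycle mentioned above is governed by the DMP restore daemon's polling interval, which can be inspected with the standard DMP tunables (shown here as a hedged example):

```shell
# Show the restore daemon's polling interval (default 300 seconds):
vxdmpadm gettune dmp_restore_interval

# The restore policy (which paths the daemon checks each cycle) can
# also be inspected:
vxdmpadm gettune dmp_restore_policy
```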

The DMP handling can be different depending on the DMP recoveryoption adopted.


Sample output:

# vxdmpadm getattr enclosure hp_xp7-0
ENCLR_NAME ATTR_NAME                     DEFAULT         CURRENT
============================================================================
hp_xp7-0   iopolicy                      MinimumQ        Round-Robin
hp_xp7-0   partitionsize                 512             512
hp_xp7-0   use_all_paths                 -               -
hp_xp7-0   recoveryoption[throttle]      Nothrottle[0]   Nothrottle[0]
hp_xp7-0   recoveryoption[errorretry]    Timebound[300]  Timebound[380]


The above output shows the recovery policy defined as Timebound with an I/O timeout of 380 seconds. Any failed I/O is judged to be a temporary error, and DMP keeps retrying it within the 380-second window. The underlying layers must still report back upstream to DMP with the response, so DMP can react.

Other customers have experienced long system reboot times, as DMP keeps retrying the I/O.

During further investigation, Veritas identified a more dynamic workaround for servers hosting special hardware-cloned devices that are not identified by the ASL.

Workaround:

By changing the recovery policy from timebound to fixedretry (3), DMP will attempt to retry the I/O request only 3 times, reducing the time spent.
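The policy change itself can be made with vxdmpadm; a hedged example, using the enclosure name from the getattr output earlier in this article:

```shell
# Switch the enclosure's error-retry policy from timebound to fixedretry
# with a retry count of 3:
vxdmpadm setattr enclosure hp_xp7-0 recoveryoption=fixedretry retrycount=3

# Verify the change:
vxdmpadm getattr enclosure hp_xp7-0 recoveryoption

# To revert to the timebound policy afterwards (380 seconds, as
# customized in this article's example):
vxdmpadm setattr enclosure hp_xp7-0 recoveryoption=timebound iotimeout=380
```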

Example: DMP timebound vs fixedretry

1. Timebound policy testing

One snap (ShadowImage) device in SPLIT state.

# pairdisplay -g bc_hp -IBC1 -CLI -fcx -l
Group PairVol L/R Port# TID LU-M Seq# LDEV# P/S Status % P-LDEV# M
bc_hp disk_00 L CL1-C-3 32 4 0 311038 3100 S-VOL SSUS 99 3000 -

The device is then switched to re-sync:

# date; pairresync -l -g bc_hp -IBC1; pairdisplay -g bc_hp -IBC1 -CLI -fcx -l
Sun Jan 26 01:17:47 MSK 2020
Group PairVol L/R Port# TID LU-M Seq# LDEV# P/S Status % P-LDEV# M
bc_hp disk_00 L CL1-C-3 32 4 0 311038 3100 S-VOL COPY 99 3000 -

# pairdisplay -g bc_hp -IBC1 -CLI -fcx -l
Group PairVol L/R Port# TID LU-M Seq# LDEV# P/S Status % P-LDEV# M
bc_hp disk_00 L CL1-C-3 32 4 0 311038 3100 S-VOL PAIR 100 3000 -

# date
Sun Jan 26 01:17:54 MSK 2020
 

When running "vxdisk list" commands on the SNAP device, vxconfigd hangs

Until the snap device is switched back to SPLIT state, VxVM commands will either LUN very slowly or appear to be hung.

# date; pairsplit -l -g bc_hp -IBC1; pairdisplay -g bc_hp -IBC1 -CLI -fcx -l
Sun Jan 26 01:29:46 MSK 2020
Group PairVol L/R Port# TID LU-M Seq# LDEV# P/S Status % P-LDEV# M
bc_hp disk_00 L CL1-C-3 32 4 0 311038 3100 S-VOL SSUS 100 3000 -

 

2. Fixedretry testing
 

After changing the DMP policy to fixedretry with a count of 3, no issues were reported.

# date;pairdisplay -g bc_hp -IBC1 -CLI -fcx -l;pairresync -l -g bc_hp -IBC1
Sun Jan 26 01:36:33 MSK 2020
Group PairVol L/R Port# TID LU-M Seq# LDEV# P/S Status % P-LDEV# M
bc_hp disk_00 L CL1-C-3 32 4 0 311038 3100 S-VOL SSUS 99 3000 -


root@bull-m # date; pairdisplay -g bc_hp -IBC1 -CLI -fcx -l
Sun Jan 26 01:36:39 MSK 2020
Group PairVol L/R Port# TID LU-M Seq# LDEV# P/S Status % P-LDEV# M
bc_hp disk_00 L CL1-C-3 32 4 0 311038 3100 S-VOL PAIR 100 3000 -

 

The problem is not experienced when the DMP recoveryoption is set to fixedretry with a count of 3.


Summary
 

DMP will keep retrying the failed I/O until the DMP timebound timeout is reached. The time may be extended depending on how much time is spent in the lower layers during each SCSI retry attempt.

When the special device is in a sync state, the device will return an I/O failure back upstream. Where the SCSI inquiry still succeeds, DMP interprets the I/O error as a temporary error, and will continue to retry the failed I/O.

When the DMP recoveryoption is configured using the timebound policy, DMP will keep retrying the I/O for the specified time in seconds (in this instance, the customized 380 seconds). The lower layers may not complete the failure of the I/O until after the 380-second window, as in-flight retried I/Os may outlive the defined 380-second timeout.

When configured using the fixedretry policy, DMP retries the I/O only the specified number of times (three retries in this instance).

If the device returns the I/O error quickly, there is little value in merely reducing the timebound value to, say, 150 seconds: a fixedretry count of 3 would potentially fail the I/O sooner than the 150-second window.
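As a rough illustration of this comparison, assume (hypothetically; the 30-second figure is an illustrative assumption, not from this article) that each failed I/O attempt takes about 30 seconds to be reported back up to DMP by the lower layers:

```shell
#!/bin/sh
# Hypothetical per-attempt latency (seconds) for a failed I/O to be
# reported back up to DMP by the lower layers:
attempt=30

# timebound: DMP keeps retrying until the window expires, so even a
# reduced 150-second window holds the I/O for at least that long:
timebound_window=150

# fixedretry(3): 1 initial attempt + 3 retries, then the I/O is failed:
fixedretry_total=$((attempt * 4))

echo "timebound(150) worst case: >= ${timebound_window}s"
echo "fixedretry(3)  worst case: ${fixedretry_total}s"
```

Under these assumed numbers, fixedretry fails the I/O after roughly 120 seconds, sooner than the 150-second timebound window would allow.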

The I/O TIMEOUT error applies to a single I/O, not to a whole command. The apparent hang arises because the command had to wait for all triggered I/Os to return.

Script change:

Users could potentially change their backup scripts to add the fixedretry workaround while the devices are in a sync state. As soon as the backup is done, the script can change the recoveryoption back to timebound (180 seconds should be fine in most cases).
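Such a script change can be sketched as follows (a hedged outline: the enclosure name is taken from this article, while the backup command is a placeholder for your own job):

```shell
#!/bin/sh
# Sketch: toggle the DMP recovery policy around a backup window while the
# ShadowImage device is in a sync (PAIR) state.

ENCLR=hp_xp7-0   # enclosure name from this article; adjust as needed

# Fail I/O quickly while the clone is re-syncing:
vxdmpadm setattr enclosure "$ENCLR" recoveryoption=fixedretry retrycount=3

run_backup_job   # placeholder for the actual backup command

# Restore the normal timebound policy once the backup is done:
vxdmpadm setattr enclosure "$ENCLR" recoveryoption=timebound iotimeout=180
```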

The main difference between the recovery options is the time spent handling the I/O error.


End Goal:

Besides offlining these devices and setting the DMP recovery policy to fixedretry, Veritas requires the third-party vendors to enhance the visibility of the LUN characteristics via the SCSI inquiry content. Once exposed, Veritas can design specific handling conditions based on what is reported by the lower layers.

 
