Various hang scenarios can be caused by disabled or invalid Persistent FastResync snapshots with version 20 DCO

Article: 100000565
Last Published: 2010-01-29
Product(s): InfoScale & Storage Foundation

Problem

Various hang scenarios can be caused by disabled or invalid Persistent FastResync snapshots with version 20 DCO

Error Message

unix: WARNING: VxVM vxio V-5-3-0 commit: Timedout abort the transaction!
unix: WARNING: VxVM vxio V-5-3-0 commit: Timedout waiting for Volume PFIcerpt-rvl_0d to quiesce, count 1

Solution

Due to Etrack incident 1469365, as listed in the Supplemental Material section, a Persistent FastResync snapshot (with a version 20 DCO) in the DISABLED or INVALID state can cause the source volume to hang. (This issue does not affect Persistent FastResync snapshots with a version 0 DCO. DCO stands for Data Change Object.) The incident causes Veritas Volume Manager to incorrectly handle a failed write operation to the disabled snapshot volume, which leads to the subsequent I/O hang.

The following hang scenarios have been reported when a source volume has a DISABLED or INVALID snapshot.

Scenario 1. The file system on the source volume hangs.
Scenario 2. The mount process hangs when the file system on the source volume is being mounted.
Scenario 3. In a Veritas Volume Replicator (VVR) configuration, the affected primary data volume may cause vxrecover to hang.
Scenario 4. In a VVR configuration, if the affected source volume is a secondary data volume, VVR replication or synchronization can hang.
Scenario 5. In a VVR configuration, if the affected source volume is a secondary data volume, the secondary Rlink may not be able to disconnect. This is because the Staged I/O issued by VVR error handling may hang while previous I/Os are hung. As a result, the secondary Rlink remains in the "connected" state while the primary Rlink goes into the "disconnected" state.

Once I/Os start to hang, subsequent vx commands (e.g. vxsnap, vxrlink) may time out, and the following system error messages are logged.

unix: Warning: VxVM vxio V-5-3-0 commit: Timedout abort the transaction!
unix: Warning: VxVM vxio V-5-3-0 commit: Timedout waiting for Volume PFIcerpt-rvl_0d to quiesce, count 1
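To check a saved system log for the two messages quoted above, a small script can help. The following is a minimal sketch only: the function name find_commit_timeouts and the sample lines are illustrative assumptions, and the patterns are derived solely from the message text shown in this article, so they may need adjusting for other releases or log formats.

```python
import re

# Patterns built from the two messages quoted in this article (assumption:
# other releases may word these messages differently).
ABORT_RE = re.compile(
    r"VxVM vxio V-5-3-0 commit:\s+Timedout abort the\s+transaction!"
)
QUIESCE_RE = re.compile(
    r"VxVM vxio V-5-3-0 commit:\s+Timedout waiting for\s+Volume (\S+) to quiesce"
)


def find_commit_timeouts(lines):
    """Return (abort_count, volumes) for an iterable of log lines.

    abort_count is the number of "Timedout abort the transaction!" lines;
    volumes lists, in first-seen order, each volume named in a
    "Timedout waiting for Volume ... to quiesce" line.
    """
    abort_count = 0
    volumes = []
    for line in lines:
        if ABORT_RE.search(line):
            abort_count += 1
        match = QUIESCE_RE.search(line)
        if match and match.group(1) not in volumes:
            volumes.append(match.group(1))
    return abort_count, volumes


if __name__ == "__main__":
    # Hypothetical sample input; in practice, pass the lines of a log file.
    sample = [
        "unix: WARNING: VxVM vxio V-5-3-0 commit: Timedout abort the transaction!",
        "unix: WARNING: VxVM vxio V-5-3-0 commit: Timedout waiting for "
        "Volume PFIcerpt-rvl_0d to quiesce, count 1",
    ]
    print(find_commit_timeouts(sample))
```

The volume names returned are the volumes that failed to quiesce, which is a reasonable starting point for identifying the source volume whose snapshot should be examined.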

The incident is fixed in the following patches and in subsequent patches with higher versions.

Storage Foundation 5.0MP3RP2 on Solaris
Storage Foundation 5.0MP3RP2 on AIX
Storage Foundation 5.0MP3RP2 on Linux
Storage Foundation 5.0.1 on HP-UX 11.31

Veritas strongly recommends that customers who are using Persistent FastResync snapshots with a version 20 DCO apply the latest patch as soon as possible.

The latest patches can be obtained through Veritas Operations Services (VOS).

  https://sort.veritas.com

The incident is not yet fixed on the HP-UX 11.23 platform as of March 2010. The fix will be available in a future release of the Veritas Volume Manager 5.0MP2 Rolling Patch on the HP-UX 11.23 platform.

Workaround
=========
A workaround for the hang problem is to dissociate the snapshot volume. Once the offending snapshot volume is dissociated from the source volume, the incident can be avoided. Please note that if the system hangs during boot because vxrecover (which is started by the system startup rc script) hangs, you will need to boot the system into single-user mode to fix the issue.
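The dissociation step can be sketched as a small helper that assembles the command line without running it. This is a sketch under assumptions, not a procedure taken from this article: it assumes the snapshot is dissociated with the "vxsnap -g <diskgroup> dis <snapshot_volume>" form, which should be confirmed against the vxsnap(1M) manual page for your release, and the names build_dissociate_command, mydg and snapvol01 are placeholders.

```python
import shlex


def build_dissociate_command(disk_group, snapshot_volume):
    """Return the argv list for dissociating a snapshot volume.

    Assumption: the "vxsnap -g <diskgroup> dis <volume>" form applies to
    your release. The command is only built here, never executed.
    """
    if not disk_group or not snapshot_volume:
        raise ValueError("disk_group and snapshot_volume must be non-empty")
    return ["vxsnap", "-g", disk_group, "dis", snapshot_volume]


if __name__ == "__main__":
    # Placeholder names; substitute the real disk group and snapshot volume.
    argv = build_dissociate_command("mydg", "snapvol01")
    # Print a shell-quoted command line for review before running it by hand.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the command as a list and printing it for review, rather than executing it, keeps the decision to run it with the administrator, which matters on a system where vx commands may already be timing out.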

(Note: The FastResync feature was previously called Fast Mirror Resynchronization, or FMR.)

