Description
If a plex reports a state of DETACHED IOFAIL, subsequent vxplex attach (resync) operations may fail because of a hidden, disk-related kernel flag named "FAILIO".
The hidden flag is only visible in specific Veritas Volume Manager (VxVM) versions.
With specific enhanced VxVM product releases, customers can now see and clear the hidden "FAILIO" flag associated with the DETACHED IOFAIL state.
Until this hidden flag is cleared, subsequent plex-related operations may encounter read/write errors.
Scenario #1:
If the DMP recoveryoption for an enclosure has been defined with a timebound value of 300 seconds, and the SCSI layer takes longer than that 300-second window (the DMP timebound threshold value) to fail the I/O, then DMP will not retry the I/O.
As a result of the DMP I/O threshold timeout being exhausted, the corresponding plex is detached and marked with the "DETACHED IOFAIL" plex state.
Impact:
The product cannot properly service I/Os through plexes that have the "FAILIO" flag set locally on the impacted server. Incoming I/Os will experience read/write errors.
Traditionally, there were three methods to clear the "FAILIO" flag behind the DETACHED IOFAIL state:
1.] Deport the impacted disk group
2.] Recycle vxconfigd with the options -k and -r reset (vxconfigd -k -r reset)
3.] Reboot the impacted server
Normal business operations will resume once the "FAILIO" flag has been cleared. The disk "FAILIO" flag is set whenever a DMP I/O timeout is experienced on a disk.
Scenario #2:
The volume consists of two plexes: plex A is attached and working, and plex B is currently detached due to a DMP I/O timeout event.
In the event that the surviving plex is also impacted as a result of a DMP I/O threshold timeout event, the last remaining (surviving) attached plex is not detached.
The hidden "FAILIO" flag will be set against both plexes in this instance.
NOTE: The volume will only be detached when a klog write error on the volume is encountered. Otherwise, I/O error messages will continue to be reported in the syslog file on the impacted server only.
The last surviving plex is never detached from a volume; the hidden flag is set and the plex remains attached as "Enabled Active".
Veritas Volume Manager (VxVM) enhancement overview:
Sample Messages:
The server that encounters the DMP threshold I/O timeout event records the setting of the "FAILIO" flag locally in the syslog file with the string "VOLD_FLAG_FAIL_IO flag set".
Jun 5 14:29:22 server101 kernel: VxVM vxdmp V-5-3-0 Reached DMP Threshold IO TimeOut (1 secs) I/O with start 4fb1d6fb25cae and end 4fb1d703abf6e time
Jun 5 14:29:22 server101 kernel:
Jun 5 14:29:22 server101 kernel: VxVM vxdmp V-5-0-0 [Error] i/o error occurred (errno=0x206) on dmpnode 201/0x30
Jun 5 14:29:22 server101 kernel:
Jun 5 14:29:22 server101 kernel: VxVM vxio V-5-3-0 voldiskiodone: VOLD_FLAG_FAIL_IO flag set on disk ibm_shark0_2
Jun 5 14:29:22 server101 kernel: VxVM vxio V-5-0-1266 Subdisk ibm_shark0_2-01 block 131048: Uncorrectable write error
Jun 5 14:29:22 server101 kernel: VxVM vxdmp V-5-3-0 I/O failed on path 8/0x50 after 1 retries for disk 201/0x30
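To find which disks have logged this event, the syslog can be filtered for the message string. The following is a minimal sketch rather than a supported tool: the function name failio_disks_from_syslog is hypothetical, and it assumes the message layout shown in the sample above ("VOLD_FLAG_FAIL_IO flag set on disk <name>").

```shell
# Hypothetical helper: list the disks on which the kernel logged
# "VOLD_FLAG_FAIL_IO flag set". Reads syslog text on standard input and
# prints each disk name once, sorted.
failio_disks_from_syslog() {
    sed -n 's/.*VOLD_FLAG_FAIL_IO flag set on disk \([^ ]*\).*/\1/p' | sort -u
}

# Example invocation (the syslog path is an assumption and varies by platform):
#   failio_disks_from_syslog < /var/log/messages
```

The output covers only events recorded on the local server, because the flag is logged locally by the server that experienced the timeout.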
Sample vxkprint:
The hidden "FAILIO" flag can be displayed using the revised "vxkprint" utility.
# /etc/vx/diag.d/vxkprint > kprint_out
<snippet>
Disk ibm_shark0_2: dm=ibm_shark0_2 dgiid=1024.14 darid=1024.4 dmrid=0.1026
kflag=(failing)
sflag=()
vflag=(autoconfig|online)
failio flag=1
device_bdev=201/48 pubdev=201/51 privdev=201/51 publen=2097024 privlen=65792 maxiosize=1024
puboffset=0
type=auto info=format=cdsdisk,privoffset=256,pubslice=3,privslice=3
site=
tpau_size=0 tpmax_size=0 tpshift_off=0
log-copy 0: (size=4096)
guid = {5a2714cc-8809-11e2-8173-46833c108978}
iocount = 0
<snippet>
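Because vxkprint prints one record per disk, the saved output can be scanned for records in which the flag is set. The sketch below is illustrative only: the function name failio_disks_from_kprint is hypothetical, and it assumes that each record begins with a "Disk <name>:" line and contains a "failio flag=1" line when the flag is set, as in the sample above.

```shell
# Hypothetical helper: print the name of every disk whose vxkprint record
# contains "failio flag=1". Reads saved vxkprint output on standard input.
failio_disks_from_kprint() {
    awk '
        # Remember the disk name from each record header ("Disk <name>: ...").
        /^Disk / { disk = $2; sub(/:$/, "", disk) }
        # Report the current disk when its failio flag is set.
        /failio flag=1[ \t]*$/ { if (disk != "") print disk }
    '
}

# Example invocation against the file saved above:
#   failio_disks_from_kprint < kprint_out
```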
With the enhanced VxVM functionality, the user is now able to reset the hidden "FAILIO" flag behind the DETACHED IOFAIL state.
# vxdisk set ibm_shark0_2 failio=off
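When several disks are affected, the same command has to be repeated for each one. As a cautious sketch, the hypothetical function below (print_failio_reset_commands) only prints the vxdisk command for each disk name supplied on standard input, so the list can be reviewed before anything is run. It assumes an enhanced VxVM release in which "vxdisk set <disk> failio=off" is available.

```shell
# Hypothetical helper: read disk names (one per line) on standard input and
# print the matching reset command for each. Nothing is executed.
print_failio_reset_commands() {
    while IFS= read -r disk; do
        if [ -n "$disk" ]; then
            printf 'vxdisk set %s failio=off\n' "$disk"
        fi
    done
}

# Example invocation:
#   printf '%s\n' ibm_shark0_2 | print_failio_reset_commands
```

Printing rather than executing is a deliberate choice here: clearing the flag should follow confirmation that the underlying path or storage problem has been resolved.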
Revised vxkprint output:
# /etc/vx/diag.d/vxkprint > kprint_out_after
<snippet>
Disk ibm_shark0_2: dm=ibm_shark0_2 dgiid=1024.14 darid=1024.4 dmrid=0.1026
kflag=(failing)
sflag=()
vflag=(autoconfig|online)
failio flag=0
device_bdev=201/48 pubdev=201/51 privdev=201/51 publen=2097024 privlen=65792 maxiosize=1024
puboffset=0
type=auto info=format=cdsdisk,privoffset=256,pubslice=3,privslice=3
site=
tpau_size=0 tpmax_size=0 tpshift_off=0
log-copy 0: (size=4096)
guid = {5a2714cc-8809-11e2-8173-46833c108978}
iocount = 0
<snippet>
The enhanced VxVM functionality will be available with the release of the 6.1 MR1 patch. At this time, private hot-fixes are available for the following Linux releases:
6.0.300.204 for Linux
6.0.500.002 for Linux
Note: The vxkprint output format changes in later versions (for example, InfoScale 7.2.x), where "failio flag=1" is replaced with "dflag=(failio)".
New format:
# /etc/vx/diag.d/vxkprint
<snippet>
Disk disk_2: dm=B00A21B71027DF65100F42DB93 dgiid=1024.9 darid=1024.4 dmrid=0.1027
kflag=(efi)
sflag=(sdopen)
vflag=(autoconfig|online)
dflag=()
.
.
Disk disk_3: dm=B00A21B71027DF65130F652F3C dgiid=1024.9 darid=1024.6 dmrid=0.1026
kflag=(failing|efi|unknown=0x40000)
sflag=(sdopen)
vflag=(autoconfig|online)
dflag=(failio)
<snippet>
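In the newer format the flag appears as an entry in the dflag list rather than on its own line, so a scan of the output has to inspect that list. The following sketch is an illustration under stated assumptions: the function name failio_disks_from_kprint_dflag is hypothetical, and the assumption that multiple dflag entries are separated by "|" is inferred from the kflag and vflag lines in the sample, not confirmed for dflag.

```shell
# Hypothetical helper: print the name of every disk whose dflag list in the
# newer vxkprint format contains "failio". Reads vxkprint output on stdin.
failio_disks_from_kprint_dflag() {
    awk '
        # Remember the disk name from each record header ("Disk <name>: ...").
        /^Disk / { disk = $2; sub(/:$/, "", disk) }
        /^[ \t]*dflag=\(/ {
            flags = $0
            sub(/^[ \t]*dflag=\(/, "", flags)   # drop the leading "dflag=("
            sub(/\).*$/, "", flags)             # drop the closing ")" and any tail
            # Wrap in separators so that only a whole entry named failio matches.
            if (disk != "" && index("|" flags "|", "|failio|") > 0) print disk
        }
    '
}

# Example invocation:
#   /etc/vx/diag.d/vxkprint | failio_disks_from_kprint_dflag
```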
Please contact Veritas support if you require hotfixes or patches for other platform releases.