AIX: EMC SRDF-R2 devices in read-write mode can cause diskgroup import failures with tunable dmp_cache_open=ON
Problem
This article outlines a workaround for a known issue with importing EMC SRDF-R2 devices with EMC PowerPath and Veritas Dynamic Multi-Pathing (DMP) on AIX.
When EMC SRDF-R2 devices transition from write-disabled (WD) to read/write (RW) mode, diskgroup imports can fail if dmp_cache_open=ON.
The issue is specific to AIX platforms with EMC storage. It does not occur on Linux or Solaris platforms.
Error Message
After installing VxVM 51_SP1_RP1 or later, the symptoms and errors are as follows:
From errpt_a :
Detail Data
DESCRIPTION
WARNING VxVM vxio V-5-3-0 voldio: Disk hdisk10 is write-protected, disallow write <<<<< NOTE
---------------------------------------------------------------------------
more vxdisk_list
DEVICE TYPE DISK GROUP STATUS
hdisk0 auto:LVM - - LVM
hdisk2 auto:aixdisk oraappdg0101 oraappdg01 online
hdisk3 auto:aixdisk oraappdg0102 oraappdg01 online
hdisk6 auto:aixdisk oraappdg0103 oraappdg01 online
hdisk10 auto:aixdisk - - online <<<<<<<<<<<NOTE
OUTPUT from vxdisk_list_devicename:
devicetag: hdisk10
type: auto
hostid:
disk: name= id=1294938151.91.oraqpdb01
group: name=HA_dataraw01 id=1294938164.95.oraqpdb01
info: format=aixdisk,privoffset=256
flags: online ready private autoconfig
pubpaths: block=/dev/vx/dmp/hdisk10 char=/dev/vx/rdmp/hdisk10
guid: -
udid: EMC%5FSYMMETRIX%5F000192601936%5F3600535000
site: -
version: 2.1
iosize: min=512 (bytes) max=512 (blocks)
public: slice=0 offset=66048 len=105866562 disk_offset=0
private: slice=0 offset=256 len=65536 disk_offset=0
update: time=1300727495 seqno=0.253
ssb: actual_seqno=0.0
headers: 0 248
configs: count=1 len=48346
logs: count=1 len=7325
Defined regions:
config priv 000017-000247[000231]: copy=01 offset=000000 enabled
config priv 000249-048363[048115]: copy=01 offset=000231 enabled
log priv 048364-055688[007325]: copy=01 offset=000000 enabled
Annotations:
tag udid_asl=EMC%5FSYMMETRIX%5F000192601685%5F85013FE008
Multipathing information:
numpaths: 4
Snippet from engine_log: ( diskgroup still cannot be imported successfully)
2011/03/21 11:20:49 VCS WARNING V-16-10011-715 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:Diskgroups will be imported without reservations
2011/03/21 11:20:53 VCS WARNING V-16-10011-702 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:vxdg import (clear flag) failed. Trying force import
2011/03/21 11:20:53 VCS ERROR V-16-10011-703 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:** ERROR: vxdg import (force) failed on Disk Group HA_tools01
2011/03/21 11:20:57 VCS ERROR V-16-10011-705 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:** ERROR: vxdg import failed on Disk Group HA_tools01 after vxdctl enable
2012/06/26 17:39:58 VCS INFO V-16-2-13716 (rdgpow5aix02) Resource(DG_TEST_RES): Output of the completed operation (online)
==============================================
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
No valid disk found containing disk group
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
No valid disk found containing disk group
Or, if fencing is in use, the following error is seen when VCS attempts to write the keys to the LUN:
2012/04/10 23:18:50 VCS INFO V-16-2-13716 ( rdgpow5aix02 ) Resource( DG_TEST_RES ): Output of the completed operation (actions)
==============================================
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
SCSI-3 PR operation failed
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
SCSI-3 PR operation failed
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
SCSI-3 PR operation failed
You can differentiate this from other SCSI-3 PR key failures by putting vxconfigd into debug mode. The debug log will then contain an error similar to the following:
prdev_open(/dev/vx/rdmp/emc0001_123d): open failure: 47
The "open failure: 47" message indicates write-disabled media.
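The check above can be scripted once a vxconfigd debug log is available. The following is a minimal sketch, not a supported tool: the function name list_wd_open_failures is hypothetical, and the log file path is whatever was chosen when debug mode was enabled. It assumes the log lines have the same form as the sample above.

```sh
#!/bin/sh
# list_wd_open_failures LOGFILE   (hypothetical helper name)
# Prints, once each, every device path whose prdev_open() call failed
# with errno 47 (write-disabled media) in the given debug log.
list_wd_open_failures() {
    # Expected line form: prdev_open(<device>): open failure: 47
    # The trailing anchor keeps other codes such as 470 from matching.
    sed -n 's/.*prdev_open(\([^)]*\)): open failure: 47[[:space:]]*$/\1/p' "$1" | sort -u
}
```

If the function prints device paths that belong to SRDF-R2 LUNs, the import failure is likely the one described in this article rather than a genuine SCSI-3 PR key problem.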
Cause
The issue is specific to AIX. Root cause is unknown at this time.
[About Volume Manager tunable 'dmp_cache_open']
If this parameter is set to on, the first open of a device that is performed by the Veritas Array Support Library (ASL) is cached.
This caching enhances the performance of device discovery by minimizing the overhead that is caused by subsequent open calls by ASLs.
If this parameter is set to off, caching is not performed.
The default value is on.
With dmp_cache_open set to on (the default), when an application (or vxconfigd) issues an open call on a sub-path, the open request is issued only once and cached. Subsequent opens use the cached entry and just increment the reference count.
The EMC SRDF-R2 devices are initially in write-disabled (WD) mode.
With dmp_cache_open=ON, the read-only (write-disabled) mode is cached.
Thus, later device opens in read-write mode are disallowed, even after the device has transitioned to RW. During a diskgroup import, configuration copy updates require the device to be opened in write mode. Because that open fails, the diskgroup import fails.
After installing 51_SP1_RP1, the devices are in "ONLINE" state.
However, due to the behaviour explained above, diskgroup import failures occur with 51_SP1_RP1 onwards, on AIX releases only.
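The caching behaviour described above can be pictured with a toy model. This is only an illustrative sketch and not the DMP implementation: the function cached_open and the variables cache_mode and cache_refs are invented for the example, and the real root cause on AIX remains unknown.

```sh
#!/bin/sh
# Toy model only; this is NOT the DMP implementation.
cache_mode=""   # access mode recorded by the first open: "ro" or "rw"
cache_refs=0    # number of opens served so far

# cached_open REQUESTED_MODE DEVICE_STATE
#   REQUESTED_MODE: "ro" or "rw"
#   DEVICE_STATE:   state on the array at call time, "wd" or "rw"
cached_open() {
    if [ -z "$cache_mode" ]; then
        # First open: the device is really opened and the result is cached.
        if [ "$2" = "wd" ]; then cache_mode="ro"; else cache_mode="rw"; fi
    fi
    # Later opens never look at DEVICE_STATE again; they reuse the cache.
    if [ "$1" = "rw" ] && [ "$cache_mode" = "ro" ]; then
        echo "open($1) refused: cached mode is $cache_mode"
        return 1
    fi
    cache_refs=$((cache_refs + 1))
    echo "open($1) ok, refs=$cache_refs"
}
```

In this model, calling cached_open ro wd (discovery while the R2 device is write-disabled) followed by cached_open rw rw (import after the device has become read/write) refuses the second open, because the mode cached by the first call is still read-only.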
Solution
In a Veritas Cluster Server (VCS) environment, a workaround is now available via the latest SRDF agent (Q2 2012 aka 5.0.14.0).
This version of the agent will automatically run 'vxdisk rm <daname>' on SRDF LUNs when their device state changes. This clears the dmp open cache specifically on the SRDF LUNs. Non-SRDF LUNs are left alone, and still benefit from dmp open caching.
There is a known issue in the current version of the agent where the 'vxdisk rm <daname>' command is only run when the SwapRoles attribute is enabled.
If SwapRoles is turned off, the workaround is never applied. This is scheduled to be fixed in an upcoming version of the agent.
Workarounds:
To turn OFF dmp_cache_open:
# vxdmpadm gettune all | grep cache
dmp_cache_open on on
# vxdmpadm settune dmp_cache_open=off
Tunable value will be changed immediately
Check that the change is in effect:
# vxdmpadm gettune all | grep cache
dmp_cache_open off on
An entry in /etc/vx/dmppolicy.info makes the tunable persistent:
# cat /etc/vx/dmppolicy.info
arraytype
#
arrayname
#
enclosure
#
Tunables
dmp_cache_open=off
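Both checks above (the running value and the persistent entry) can be combined in a small script. This is a sketch under assumptions: the helper names current_tunable and persisted_in_policy are hypothetical, the gettune output is assumed to have the tunable name in the first column and the current value in the second (as in the output shown above), and the policy file is assumed to hold the entry on a line of its own.

```sh
#!/bin/sh
# current_tunable NAME   (hypothetical helper)
# Reads "vxdmpadm gettune all" output on stdin and prints the current
# value (second column) of the named tunable.
current_tunable() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# persisted_in_policy NAME VALUE [FILE]   (hypothetical helper)
# Succeeds if FILE (default /etc/vx/dmppolicy.info) contains NAME=VALUE
# on a line of its own, ignoring surrounding whitespace.
persisted_in_policy() {
    grep -q "^[[:space:]]*$1=$2[[:space:]]*\$" "${3:-/etc/vx/dmppolicy.info}"
}

# Example use on a host with VxVM installed:
#   vxdmpadm gettune all | current_tunable dmp_cache_open
#   persisted_in_policy dmp_cache_open off && echo "setting is persistent"
```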
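Another workaround, if the disk scan takes a long time, is to offline and online only the SRDF-R2 devices:
======================================
#!/usr/bin/sh
# Offline and then online each SRDF-R2 device reported by VxVM.
set -x
# The device name is the first column of "vxdisk -e list".
disk_list=$(/usr/sbin/vxdisk -e list | grep -i srdf-r2 | awk '{ print $1 }')
for disk in $disk_list
do
    /usr/sbin/vxdisk offline "$disk"
    /usr/sbin/vxdisk online "$disk"
done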
NOTE: A solution for DMP users on AIX is documented in article 100054524:
"VRTScavf 7.4.2.2201 agent enhanced on AIX to handle EMC SRDF 'VxVM vxdg ERROR V-5-1-19179 Disk group AIXSRDF: import failed: SCSI-3 PR operation failed' failures"
Applies To
AIX configurations using EMC SRDF devices with Volume Manager are susceptible to this problem, regardless of the multi-pathing solution used.
The issue is related to EMC storage on AIX platforms only.
InfoScale 7.4.x onwards is also impacted.
Relates to Legacy Etrack incident: e2321367