AIX: EMC SRDF-R2 devices in read-write mode can cause diskgroup import failures with tunable dmp_cache_open=ON

Article: 100005704

Last Published: 2022-12-06

Product(s): InfoScale & Storage Foundation

Problem

This article outlines a workaround for a known issue with importing EMC SRDF-R1 devices with EMC PowerPath and Veritas Dynamic Multi-Pathing (DMP) on AIX.

When EMC SRDF-R2 devices transition from write-disabled (WD) to read/write (RW) mode, diskgroup imports can fail if dmp_cache_open=ON.

The issue is specific to AIX platforms with EMC storage. It does not occur on Linux or Solaris platforms.

Error Message

With VxVM 51_SP1_RP1 or later installed, the symptoms and errors are as follows:

From errpt_a :

Detail Data
DESCRIPTION
WARNING VxVM vxio V-5-3-0 voldio: Disk hdisk10 is write-protected, disallow write <<<<< NOTE
---------------------------------------------------------------------------

more vxdisk_list
DEVICE       TYPE            DISK         GROUP        STATUS
hdisk0       auto:LVM        -            -            LVM
hdisk2       auto:aixdisk    oraappdg0101 oraappdg01   online
hdisk3       auto:aixdisk    oraappdg0102 oraappdg01   online
hdisk6       auto:aixdisk    oraappdg0103 oraappdg01   online
hdisk10      auto:aixdisk    -            -            online                           <<<<<<<<<<<NOTE

OUTPUT from vxdisk_list_devicename:

devicetag: hdisk10
type:      auto
hostid:
disk:      name= id=1294938151.91.oraqpdb01
group:     name=HA_dataraw01 id=1294938164.95.oraqpdb01
info:      format=aixdisk,privoffset=256
flags:     online ready private autoconfig
pubpaths: block=/dev/vx/dmp/hdisk10 char=/dev/vx/rdmp/hdisk10
guid:      -
udid:      EMC%5FSYMMETRIX%5F000192601936%5F3600535000
site:      -
version:   2.1
iosize:    min=512 (bytes) max=512 (blocks)
public:    slice=0 offset=66048 len=105866562 disk_offset=0
private:   slice=0 offset=256 len=65536 disk_offset=0
update:    time=1300727495 seqno=0.253
ssb:       actual_seqno=0.0
headers:   0 248
configs:   count=1 len=48346
logs:      count=1 len=7325
Defined regions:
config   priv 000017-000247[000231]: copy=01 offset=000000 enabled
config   priv 000249-048363[048115]: copy=01 offset=000231 enabled
log      priv 048364-055688[007325]: copy=01 offset=000000 enabled
Annotations:
tag      udid_asl=EMC%5FSYMMETRIX%5F000192601685%5F85013FE008
Multipathing information:
numpaths:   4

Snippet from engine_log (the diskgroup still cannot be imported):

2011/03/21 11:20:49 VCS WARNING V-16-10011-715 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:Diskgroups will be imported without reservations
2011/03/21 11:20:53 VCS WARNING V-16-10011-702 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:vxdg import (clear flag) failed. Trying force import
2011/03/21 11:20:53 VCS ERROR V-16-10011-703 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:** ERROR: vxdg import (force) failed on Disk Group HA_tools01
2011/03/21 11:20:57 VCS ERROR V-16-10011-705 (oraqpdb21) DiskGroup:db-ora-HA_tools01-dg:online:** ERROR: vxdg import failed on Disk Group HA_tools01 after vxdctl enable

2012/06/26 17:39:58 VCS INFO V-16-2-13716 (rdgpow5aix02) Resource(DG_TEST_RES): Output of the completed operation (online)
==============================================
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
No valid disk found containing disk group
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
No valid disk found containing disk group

Alternatively, if fencing is in use, the following error appears when VCS attempts to write the keys to the LUN:

2012/04/10 23:18:50 VCS INFO V-16-2-13716 (rdgpow5aix02) Resource(DG_TEST_RES): Output of the completed operation (actions)
==============================================
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
SCSI-3 PR operation failed
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
SCSI-3 PR operation failed
VxVM vxdg ERROR V-5-1-10978 Disk group maxdg: import failed:
SCSI-3 PR operation failed

You can differentiate this from other SCSI-3 PR key failures by putting vxconfigd into debug mode. The debug log will then contain an error similar to the following:

prdev_open(/dev/vx/rdmp/emc0001_123d): open failure: 47

The "open failure: 47" message indicates write-disabled media.
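As a rough aid for the check described above, the following sketch filters vxconfigd debug output for prdev_open failures with error 47 and prints the affected device paths once each. It assumes the log lines have the same shape as the sample line shown above. The function name find_write_protected_opens is hypothetical, and the log file location depends on how debug logging was enabled on your system.

```shell
# Sketch only: reads vxconfigd debug output on stdin and prints each device
# path whose prdev_open failed with error 47 (write-disabled media).
# Assumes lines shaped like the sample in this article:
#   prdev_open(/dev/vx/rdmp/emc0001_123d): open failure: 47
find_write_protected_opens() {
    # Keep only error 47 (not 470, 471, ...), then extract the path in parentheses.
    grep -E 'prdev_open\([^)]*\): open failure: 47([^0-9]|$)' |
        sed 's/.*prdev_open(\([^)]*\)).*/\1/' |
        sort -u
}
```

To use it, redirect the debug log into the function, for example find_write_protected_opens < your_vxconfigd_debug_log, where the file name is a placeholder for your own log.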

Cause

The issue is specific to AIX. Root cause is unknown at this time.

[About Volume Manager tunable 'dmp_cache_open']

If this parameter is set to on, the first open of a device that is performed by the Veritas Array Support Library (ASL) is cached.

This caching enhances the performance of device discovery by minimizing the overhead that is caused by subsequent open calls by ASLs.

If this parameter is set to OFF, caching is not performed.

The default value is on.

With dmp_cache_open set to ON (the default), when an application (or vxconfigd) issues an open call on a sub-path, the open request is issued only once and cached. Subsequent opens use the cached entry and just increment the reference count.

Originally, the EMC SRDF-R2 devices are in write-disabled (WD) mode.

With dmp_cache_open=ON, the read-only (or write-disabled) mode is cached.

Thus, device opens in read-write mode are disallowed even after the device transitions to RW. During a diskgroup import, configuration copy updates require the device open in write mode to succeed. Because the open in write mode fails, the diskgroup import fails.
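The caching behaviour described above can be illustrated with a small toy model. This is only a sketch of the idea (first open cached, later opens reuse the cached mode and bump a reference count). It is not DMP code, and the names open_path, cache_enabled, cached_mode, and refcount are hypothetical.

```shell
# Toy model of a cached device open. Illustration only: this is NOT DMP code.
cache_enabled=on    # stands in for the dmp_cache_open tunable
cached_mode=""      # mode recorded by the first open ("ro" or "rw")
refcount=0          # number of opens served from the current cached entry

# open_path REQUESTED_MODE DEVICE_MODE
#   REQUESTED_MODE: mode the caller wants now ("ro" or "rw")
#   DEVICE_MODE:    mode the array currently permits ("ro" = write-disabled)
open_path() {
    if [ "$cache_enabled" = on ] && [ -n "$cached_mode" ]; then
        # A cached entry exists: reuse it and only bump the reference count.
        refcount=$((refcount + 1))
    else
        # First open, or caching disabled: consult the device and record its mode.
        cached_mode=$2
        refcount=1
    fi
    if [ "$1" = rw ] && [ "$cached_mode" != rw ]; then
        echo "open for write refused: cached mode is $cached_mode"
        return 1
    fi
    echo "open ok (mode $cached_mode, refcount $refcount)"
}
```

In this model, calling open_path ro ro (discovery while the R2 device is write-disabled) followed by open_path rw rw (import after the device became RW) refuses the second open because the stale ro entry is reused. Setting cache_enabled=off before the second call lets it succeed, which mirrors the effect of the workarounds in the Solution section.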

After installing 51_SP1_RP1, the devices are in the "ONLINE" state.

However, due to the behaviour explained above, diskgroup imports fail with 51_SP1_RP1 or later, on AIX releases only.

Solution

In a Veritas Cluster Server (VCS) environment, a workaround is now available via the latest SRDF agent (Q2 2012 aka 5.0.14.0).

This version of the agent automatically runs 'vxdisk rm <daname>' on SRDF LUNs when their device state changes. This clears the DMP open cache on the SRDF LUNs only. Non-SRDF LUNs are left alone and still benefit from DMP open caching.

There is a known issue in the current version of the agent: the 'vxdisk rm <daname>' command runs only when the SwapRoles attribute is enabled.

If SwapRoles is turned off, the workaround is never applied. A fix is scheduled for an upcoming version of the agent.

Workarounds:

In SRDF environments, the DMP tunable "dmp_cache_open" can be turned off briefly after the SRDF R2 to R1 transition and prior to diskgroup import, then turned back on after the import is completed.

1) Before failover, disable dmp_cache_open and re-enable it after failback

Prior to importing the diskgroup consisting of SRDF devices the following script/sequence of commands can be executed:

# for d in `vxdisk -e list | grep srdf-r2 | awk '{ print $1 }'` ; do vxdisk rm $d ; done ; vxdisk scandisks

Note: 'vxdisk rm <DA>' closes the paths completely and does not keep any cached open, even when dmp_cache_open is enabled.

To turn OFF dmp_cache_open:

# vxdmpadm gettune all | grep cache
dmp_cache_open on on

# vxdmpadm settune dmp_cache_open=off
Tunable value will be changed immediately

Check that the change is in effect:

# vxdmpadm gettune all | grep cache
dmp_cache_open off on
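For use in scripts, the check above can be narrowed to the exact tunable instead of any line containing "cache". The sketch below assumes the 'vxdmpadm gettune all' output has the tunable name in the first column and the current value in the second, as in the sample output shown in this article. The function name dmp_cache_open_current is hypothetical.

```shell
# Sketch only: print the current value of dmp_cache_open ("on" or "off").
# Assumes the column layout shown in this article: name, current value, default.
# Returns non-zero if the tunable is not found in the output.
dmp_cache_open_current() {
    vxdmpadm gettune all |
        awk '$1 == "dmp_cache_open" { print $2; found = 1 }
             END { exit (found ? 0 : 1) }'
}
```

A calling script could then test the result, for example: [ "$(dmp_cache_open_current)" = off ] before starting a diskgroup import.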

An entry in /etc/vx/dmppolicy.info makes the tunable setting persistent:

# cat /etc/vx/dmppolicy.info
arraytype
#
arrayname
#
enclosure
#
Tunables
dmp_cache_open=off
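The "turn off, import, turn back on" sequence described in this workaround can be sketched as a small wrapper. It uses only the vxdmpadm settune command shown above and a plain 'vxdg import'. The function name import_with_cache_off is hypothetical, and any import options your environment needs (for example, options used by the VCS DiskGroup agent) are not covered here.

```shell
# Sketch only: import a diskgroup with dmp_cache_open temporarily disabled,
# then restore the tunable to "on" whether or not the import succeeded.
import_with_cache_off() {
    dg=$1
    if [ -z "$dg" ]; then
        echo "usage: import_with_cache_off <diskgroup>" >&2
        return 2
    fi

    # Disable open caching so the import re-opens the paths in write mode.
    vxdmpadm settune dmp_cache_open=off || return 1

    vxdg import "$dg"
    rc=$?

    # Restore the default so non-SRDF LUNs keep the discovery benefit.
    vxdmpadm settune dmp_cache_open=on ||
        echo "warning: could not re-enable dmp_cache_open" >&2

    return $rc
}
```

Run it after the SRDF R2 to R1 transition, for example import_with_cache_off maxdg, using the diskgroup name from the error messages above. If dmp_cache_open=off has been made persistent in /etc/vx/dmppolicy.info, the restore step should be left out.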

Another workaround, if 'vxdisk scandisks' takes a long time, is:

======================================

#!/usr/bin/sh
set -x

disk_list=`/usr/sbin/vxdisk -e list | grep -i srdf-r2 | awk '{ print $1 }'`
for disk in $disk_list
do
/usr/sbin/vxdisk offline $disk
/usr/sbin/vxdisk online $disk
done

=====================================
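A variant of the script above that reports per-disk failures might look like the following sketch. It relies on the same 'vxdisk -e list', 'vxdisk offline', and 'vxdisk online' commands. The function names select_srdf_r2 and cycle_srdf_r2_disks are hypothetical, and the selection still depends on "srdf-r2" appearing on the device's line, as in the original script.

```shell
# Sketch only: offline/online every SRDF-R2 device and report failures.

# Reads `vxdisk -e list` output on stdin and prints the first column (device
# name) of every line that mentions srdf-r2, case-insensitively.
select_srdf_r2() {
    grep -i 'srdf-r2' | awk '{ print $1 }'
}

# Cycles each selected device; returns non-zero if any device failed.
cycle_srdf_r2_disks() {
    failed=0
    for disk in $(vxdisk -e list | select_srdf_r2); do
        if vxdisk offline "$disk" && vxdisk online "$disk"; then
            echo "cycled $disk"
        else
            echo "failed to cycle $disk" >&2
            failed=1
        fi
    done
    return $failed
}
```

Calling cycle_srdf_r2_disks before the diskgroup import gives a non-zero exit status when any device could not be cycled, so a calling script can stop before attempting the import.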

NOTE: A solution for DMP users on AIX has been documented in article: 100054524

VRTScavf 7.4.2.2201 agent enhanced on AIX to handle EMC SRDF VxVM vxdg ERROR V-5-1-19179 Disk group AIXSRDF: import failed: SCSI-3 PR operation failed failures

Applies To

AIX configurations using EMC SRDF devices with Volume Manager are susceptible to this problem, regardless of the multi-pathing solution used.

The issue affects EMC storage on AIX platforms only.

InfoScale 7.4.x and later is also impacted.

Relates to Legacy Etrack incident: e2321367

References

Etrack: 2334711
