Inactive snapshot LUNs in a CLARiiON array configured in ALUA mode experience trespasses with Volume Manager and Dynamic Multi-Pathing (DMP)
Problem
An environment consisting of a CLARiiON array in ALUA mode (failover mode 4), inactive snapshot LUNs (SnapLUNs), and Veritas Volume Manager (VxVM) DMP is susceptible to trespasses. A trespass of an inactive snapshot LUN also causes the source (primary) LUN to trespass.
Error Message
Trespasses can be detected with EMC's Navisphere GUI, the Navisphere command-line interface (naviseccli), and/or the array (SP) logs.
When a LUN trespass occurs within the CLARiiON array, a message similar to the following is logged in the CX event log:
REFERENCE:
Cause
Inactive snapshot LUNs, like "NOT_READY" (NR) devices, belong to a class of devices on which SCSI inquiries over a path succeed but subsequent I/Os fail. When VxVM commands such as "vxdisk scandisks" or "vxdctl enable" are executed, device discovery attempts to online the inactive SnapLUN and issues an I/O. That I/O fails, DMP marks the path as failed, and DMP then initiates a path failover, which results in the trespass.
Solution
Veritas Engineering, in collaboration with EMC Engineering, implemented detection of snapshot LUNs based on a vendor-specific (EMC) error code. With this change, DMP does not initiate a failover, and so no trespass occurs, when the SCSI inquiry succeeds (indicating an active path) and a read returns the specific error code for an inactive snapshot LUN (5/25/01).
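The decision described above can be modeled as a small bash function. This is an illustrative sketch only, not DMP or APM source code: the function name dmp_path_action, its "ok"/"fail" and "none" inputs, and its output strings are hypothetical. The only value taken from this article is the sense data 5/25/01 that identifies an inactive snapshot LUN.

```bash
#!/usr/bin/env bash
# Illustrative model (NOT actual DMP/APM code) of the post-fix decision.
# Hypothetical inputs:
#   $1 = result of the SCSI inquiry on the path: "ok" or "fail"
#   $2 = sense data returned by the read as key/ASC/ASCQ, or "none" on success

# Sense data this article associates with an inactive snapshot LUN
INACTIVE_SNAPLUN_SENSE="5/25/01"

dmp_path_action() {
    local inquiry=$1 read_sense=$2
    if [ "$read_sense" = "none" ]; then
        echo "io_ok"          # read succeeded; path is usable
    elif [ "$inquiry" = "ok" ] && [ "$read_sense" = "$INACTIVE_SNAPLUN_SENSE" ]; then
        echo "no_failover"    # path is active; only the SnapLUN is inactive
    else
        echo "failover"       # any other failure: mark path failed, fail over
    fi
}
```

For example, `dmp_path_action ok 5/25/01` prints `no_failover`, while the same sense data on a path whose inquiry failed still prints `failover`, so genuine path failures continue to trigger a failover.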
FIX INCORPORATED IN:
- Updated version of the Array Policy Module (APM), distributed in an updated VRTSaslapm package for 51SP1_RP2
Package version: 5.1.100.301
Note: Apply this ASL/APM package on top of 51SP1_RP2 to address the LUN trespass issue, because 51SP1_RP2 contains a related fix.
NOTE1: Follow the "Post-Installation Checks" documented in the README to ensure that the correct version of the ASL/APM is loaded and active.
Post-Installation Checks:
----------------------------
A. After you install the VRTSaslapm package, verify the output of
the "vxddladm listsupport" command:
# vxddladm listsupport [all]
# vxddladm listsupport libname=
B. Check the output of "vxdmpadm listapm" command.
The APM should be in the Active state if the corresponding array
is connected:
# vxdmpadm listapm [all | apmname]
C. To see package information, run:
# pkginfo -l VRTSaslapm
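Check C can be partly automated. The bash sketch below reads "pkginfo -l VRTSaslapm" output on standard input and compares its VERSION field against 5.1.100.301, the package version listed above. It assumes the VERSION field starts with a dotted numeric version (any trailing non-numeric suffix in a field is ignored by awk's numeric conversion). The function names are hypothetical, and the sketch does not replace checks A and B.

```bash
#!/usr/bin/env bash
# Sketch: confirm the installed VRTSaslapm is at least the fixed version.
# Function names are hypothetical; input is "pkginfo -l" output on stdin.

REQUIRED_VERSION="5.1.100.301"

# version_ge A B: succeed if dotted version A >= B, compared field by field
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0                              # versions are equal
    }'
}

check_aslapm_version() {
    local ver
    # Assumes a line such as "   VERSION:  5.1.100.301"
    ver=$(awk '$1 == "VERSION:" { print $2; exit }')
    if [ -z "$ver" ]; then
        echo "VRTSaslapm: VERSION field not found" >&2
        return 2
    fi
    if version_ge "$ver" "$REQUIRED_VERSION"; then
        echo "VRTSaslapm $ver: OK"
    else
        echo "VRTSaslapm $ver: older than $REQUIRED_VERSION" >&2
        return 1
    fi
}
```

Usage: `pkginfo -l VRTSaslapm | check_aslapm_version`. A non-zero return status means the package is missing its VERSION field or is older than the fixed version.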
NOTE2: The fix does not prevent all messages (which may be construed as "spurious") from being logged.
For example, the following sequence of messages is emitted when commands such as "vxdisk scandisks" or "vxdctl enable" are run. Even though no actual failover takes place, messages indicating a failover are still logged:
Jan 17 12:29:04 hostname.com vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 300/0x10
Jan 17 12:29:04 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 failover initiated for 300/0x10
Jan 17 12:29:04 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 curpri set to NULL for 300/0x10
Jan 17 12:29:04 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 failover initiated for 300/0x10
Jan 17 12:29:04 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 curpri set to NULL for 300/0x10
Jan 17 12:29:04 hostname.com vxdmp: [ID 238993 kern.notice] NOTICE: VxVM vxdmp 0 dmp_tur_temp_pgr: open failed: error = 6 dev=0x12c/0x12
Jan 17 12:29:06 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 failover initiated for 300/0x10
Jan 17 12:29:06 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 curpri set to NULL for 300/0x10
Jan 17 12:29:06 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 failover initiated for 300/0x10
Jan 17 12:29:06 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 curpri set to NULL for 300/0x10
After "vxdisk scandisks" or "vxdctl enable" has been executed, the vxattachd daemon attempts at regular intervals to online the inactive SnapLUNs, so the messages above are logged periodically. The workaround to prevent this periodic logging is to kill vxattachd. Refer to the vxattachd man page for additional details.
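To gauge how often these notices recur, the "failover initiated" notices can be tallied per dmpnode from syslog. The bash sketch below does this with awk. The function name is hypothetical, and /var/adm/messages is assumed as the default Solaris syslog file; pass another path (or "-" for standard input) if syslog is configured differently. A count alone does not show whether a trespass actually occurred; confirm that on the array with Navisphere or naviseccli.

```bash
#!/usr/bin/env bash
# Sketch: count vxdmp "failover initiated" notices per dmpnode.
# Hypothetical helper; default log path assumes standard Solaris syslog.

count_dmp_failover_notices() {
    local log=${1:-/var/adm/messages}
    # The dmpnode (e.g. 300/0x10) is the last field of each notice line
    awk '/vxdmp/ && /failover initiated for/ { n[$NF]++ }
         END { for (d in n) printf "%s %d\n", d, n[d] }' "$log"
}
```

Run it before and after stopping vxattachd; if the count for a dmpnode stops growing, the notices were coming from the periodic online attempts described above.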
NOTE3: An enhancement request has been filed to suppress unnecessary messages:
e2661911:RFE: Reduce the amount of messages generated when inactive luns are present
NOTE4: Refer to 000015793 for documentation of a similar issue in Linux environments.
Applies To
Solaris + Volume Manager 5.0.x and 5.1 and above + CLARiiON array configured in ALUA mode + inactive snapshot LUNs