Inactive snapshot LUNs on a CLARiiON array configured in ALUA mode experience trespasses with Volume Manager and Dynamic Multi-Pathing (DMP)

Problem

An environment comprising a CLARiiON array in ALUA mode (failover mode 4), inactive snapshot LUNs (snapLUNs), and Veritas Volume Manager (VxVM) DMP is susceptible to LUN trespasses. A trespass of an inactive snapshot LUN also causes the source (primary) LUN to trespass.

Error Message

Trespasses can be detected with EMC's Navisphere GUI, the Navisphere command line interface (naviseccli), and/or the array's storage processor logs (SP logs).

When a LUN trespass occurs, the CLARiiON array logs a message in the CX event log similar to the following:

 Bus 0 Enclosure 0 Disk 9(606) Unit Shutdown for Trespass              [0x00]   290434   10080009
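If the SP event log has been saved to a text file, the trespass entries can be pulled out with a short filter. The sketch below is hypothetical: the function names are illustrative, and it assumes only that trespass entries contain the text "Unit Shutdown for Trespass", as in the sample line above. Exported log formats may differ, so adjust the match if needed (on Solaris, run it with nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
#!/bin/sh
# Hypothetical helpers (not an EMC or Symantec tool). They assume that CX
# event log trespass entries contain the text "Unit Shutdown for Trespass",
# as in the sample message in this note.

# Print the component text that precedes the trespass message, one per event.
list_trespass_events() {
    awk '
        index($0, "Unit Shutdown for Trespass") > 0 {
            prefix = substr($0, 1, index($0, "Unit Shutdown for Trespass") - 1)
            sub(/^[ \t]+/, "", prefix)   # trim leading whitespace
            sub(/[ \t]+$/, "", prefix)   # trim trailing whitespace
            print prefix
        }
    '
}

# Print the number of trespass events found on stdin.
count_trespass_events() {
    list_trespass_events | wc -l | tr -d ' '
}
```

Example: `count_trespass_events < sp_event_log.txt`, where sp_event_log.txt is a hypothetical file holding the saved SP log text.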

REFERENCE:

EMC Powerlink article (emc88102)

Cause

Inactive snapshot LUNs, like "NOT_READY" (NR) devices, belong to a class of devices on which a SCSI inquiry over a path succeeds but subsequent I/Os fail. When a VxVM command such as "vxdisk scandisks" or "vxdctl enable" is executed, device discovery attempts to online the inactive snapLUN and issues an I/O to it. That I/O fails, so DMP marks the path as failed and initiates a path failover, which results in the trespass.
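The sequence described above can be traced with a small model. This is a simplified, hypothetical illustration, not DMP code: the function name and the "ok"/"fail" inputs are assumptions made for the sketch, and the case where the inquiry itself fails is outside the scope of this note.

```shell
#!/bin/sh
# Hypothetical model of the pre-fix behaviour described in this note.
# It only traces the decision sequence; it is not DMP source code.
#   $1: result of the SCSI inquiry on the path ("ok" or "fail")
#   $2: result of the subsequent read I/O ("ok" or "fail")
model_prefix_discovery() {
    inquiry=$1
    read_io=$2
    if [ "$inquiry" != "ok" ]; then
        # Not covered by this note; reported as-is for completeness.
        echo "inquiry-failed"
        return
    fi
    if [ "$read_io" = "ok" ]; then
        # Active LUN: the device is brought online normally.
        echo "online"
        return
    fi
    # Inquiry succeeded but the I/O failed (inactive snapLUN / NR device):
    # the path is marked failed and a failover is initiated, which on an
    # ALUA CLARiiON shows up as a LUN trespass.
    echo "path-failed failover trespass"
}
```

For example, `model_prefix_discovery ok fail` prints `path-failed failover trespass`, the inactive snapLUN case that leads to the trespass.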

Solution

Symantec Engineering, in collaboration with EMC Engineering, implemented detection of a vendor-specific (EMC) error code that identifies inactive snapshot LUNs. With this change, DMP does not issue a failover, and therefore no trespass occurs, when a SCSI inquiry succeeds (indicating an active path) and a read returns the specific error code (5/25/01) for an inactive snapshot LUN.
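The decision made by the updated APM can be sketched as follows. This is a hypothetical illustration, not the APM implementation: it assumes that "5/25/01" denotes SCSI sense key 5h (ILLEGAL REQUEST), additional sense code 25h and qualifier 01h, and that any other read failure is still handled as a path failure, as before the fix.

```shell
#!/bin/sh
# Hypothetical sketch of the post-fix failover decision (not APM code).
# Assumption: "5/25/01" is SENSE KEY/ASC/ASCQ in hex, the EMC
# vendor-specific code returned by a read to an inactive snapshot LUN.
INACTIVE_SNAPLUN_SENSE="5/25/01"

#   $1: inquiry result on the path ("ok" or "fail")
#   $2: sense data from the failed read, formatted as KEY/ASC/ASCQ
should_failover() {
    inquiry=$1
    sense=$2
    if [ "$inquiry" = "ok" ] && [ "$sense" = "$INACTIVE_SNAPLUN_SENSE" ]; then
        # The path answers inquiries, so it is active; the read failed only
        # because the snapshot LUN is inactive. No failover, no trespass.
        echo "no-failover"
    else
        # Assumed: any other read failure is treated as a path failure.
        echo "failover"
    fi
}
```

For example, `should_failover ok 5/25/01` prints `no-failover`, while a failed inquiry or a different sense code still results in `failover`.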

FIX INCORPORATED IN:

- An updated Array Policy Module (APM), distributed in an updated VRTSaslapm package for 51SP1_RP2

Package Version: 5.1.100.301

NOTE: Apply the ASL/APM package on top of 51SP1_RP2 to address the LUN trespass issue, because 51SP1_RP2 contains a related fix.

NOTE1: Follow the "Post-Installation Checks" documented in the README to ensure that the proper version of the ASL/APM is loaded and Active.

Post-Installation Checks:
-------------------------

A. After you install the VRTSaslapm package, verify the output of
   the "vxddladm listsupport" command:

# vxddladm listsupport [all]
# vxddladm listsupport libname=

B. Check the output of the "vxdmpadm listapm" command.
   The APM should be in the Active state if the corresponding array
   is connected:

# vxdmpadm listapm [all | apmname]

C. To see package information, run:

# pkginfo -l VRTSaslapm
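Check C can be partly automated by reading the VERSION field from the "pkginfo -l" output and comparing it with the package version named in this note. The sketch below is hypothetical: the function names are illustrative, it assumes the VERSION field starts with a plain dotted number such as 5.1.100.301 (adjust the parsing if your output carries a suffix), and it treats any later version as acceptable, which assumes later APM builds retain the fix.

```shell
#!/bin/sh
# Hypothetical helpers (not part of VRTSaslapm). Compare the installed
# package version with the version named in this note.
REQUIRED_ASLAPM_VERSION="5.1.100.301"

# Print the VERSION field from "pkginfo -l" output read on stdin.
pkg_version() {
    awk '$1 == "VERSION:" { print $2; exit }'
}

# Succeed if dotted version $1 >= dotted version $2, compared field by field;
# missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Succeed if the version on stdin is at least REQUIRED_ASLAPM_VERSION.
aslapm_version_ok() {
    v=$(pkg_version)
    [ -n "$v" ] && version_ge "$v" "$REQUIRED_ASLAPM_VERSION"
}
```

Example: `pkginfo -l VRTSaslapm | aslapm_version_ok && echo "VRTSaslapm version OK"`. This complements checks A and B; it does not confirm that the APM is Active.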
 

NOTE2: The fix does not prevent all messages (which may be construed as "spurious") from being logged.

For example, the following sequence of messages is emitted when commands such as "vxdisk scandisks" or "vxdctl enable" are run. Even though no actual failover takes place, messages indicating a failover are still logged:

Jan 17 12:29:04 hostname.com vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 300/0x10
Jan 17 12:29:04 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 failover initiated for 300/0x10
Jan 17 12:29:04 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 curpri set to NULL for 300/0x10
Jan 17 12:29:04 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 failover initiated for 300/0x10
Jan 17 12:29:04 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 curpri set to NULL for 300/0x10
Jan 17 12:29:04 hostname.com vxdmp: [ID 238993 kern.notice] NOTICE: VxVM vxdmp 0 dmp_tur_temp_pgr: open failed: error = 6 dev=0x12c/0x12
Jan 17 12:29:06 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 failover initiated for 300/0x10
Jan 17 12:29:06 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 curpri set to NULL for 300/0x10
Jan 17 12:29:06 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 failover initiated for 300/0x10
Jan 17 12:29:06 hostname.com vxdmp: [ID 447055 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 curpri set to NULL for 300/0x10
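To see how often these notices recur and for which dmpnodes, the system log can be summarized. This is a hypothetical helper: its name is illustrative, and it assumes only that the notices look like the sample lines above, with the dmpnode (for example 300/0x10) as the last field. The messages by themselves do not indicate a trespass; confirm against the array logs.

```shell
#!/bin/sh
# Hypothetical helper: count DMP "failover initiated" notices per dmpnode.
# Assumes the dmpnode is the last field, as in "failover initiated for 300/0x10".
# Output: one line per dmpnode, "<count> <dmpnode>".
summarise_dmp_failover_notices() {
    awk '
        /vxdmp/ && /failover initiated for/ {
            count[$NF]++        # $NF is the dmpnode, e.g. 300/0x10
        }
        END {
            for (n in count) print count[n], n
        }
    ' | sort
}
```

Example on Solaris: `summarise_dmp_failover_notices < /var/adm/messages`.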

Once "vxdisk scandisks" or "vxdctl enable" has been executed, the vxattachd daemon attempts at regular intervals to online the inactive snapLUNs. As a result, the messages above are logged periodically. The workaround to prevent this periodic logging is to kill vxattachd. Refer to the vxattachd man page for additional details.
 

NOTE3: An enhancement request has been filed to suppress unnecessary messages:

e2661911:RFE: Reduce the amount of messages generated when inactive luns are present

NOTE4: Refer to TECH178359 for documentation of a similar issue in Linux environments.


Applies To

Solaris + Volume Manager 5.0.x and 5.1 and above + CLARiiON array configured in ALUA mode + inactive snapshot LUNs
