Why does it take Veritas Dynamic Multi-pathing (DMP) time to react to disabled and enabled controller related events

Article: 100051997
Last Published: 2022-01-11
Product(s): InfoScale & Storage Foundation

Problem


This article explains why Veritas Dynamic Multi-pathing (DMP) can take several minutes to react to controller disabled and enabled events.

 

Error Message


In the example below, DMP takes over 3 minutes to react to the QLogic "OFFLINE" event: the link goes offline at 17:26:41, but controller c11 is not disabled until 17:30:03.
 

DISABLED EVENT EXAMPLE:

/var/adm/messages:

Nov 10 17:26:41 fred qlc: [ID 628150 kern.notice] NOTICE: Qlogic qlc(3,0,1): Link OFFLINE
Nov 10 17:26:51 fred fctl: [ID 517869 kern.warning] WARNING: fp(22)::OFFLINE timeout

/etc/vx/dmpevents.log:

Wed Nov 10 17:26:41.000: Port event received from the adapter port 97df720d80340021 fabricportid = 0x   0 event type is port offline 
Wed Nov 10 17:26:42.000: Port event received from the adapter port 97df720d80340021 fabricportid = 0x   0 event type is port offline 
Wed Nov 10 17:26:42.000: Port event received from the adapter port 97df720d80340021 fabricportid = 0x   0 event type is port offline 
Wed Nov 10 17:27:00.625: SCSI error occurred on Path c11t290018CF24DEECFFd40s2(246/490): opcode=0xa3 reported transport failure (status=0x0, key=0x0, asc=0x0, ascq=0x0) 
Wed Nov 10 17:27:00.625: Disabled Path c11t290018CF24DEECFFd40s2(246/488) belonging to Dmpnode HW_DIRAYA_INDICES07_D01(21/208) due to path failure
Wed Nov 10 17:27:00.625: SCSI error occurred on Path c11t282018CF24DEECFFd40s2(246/1194): opcode=0xa3 reported transport failure (status=0x0, key=0x0, asc=0x0, ascq=0x0) 
Wed Nov 10 17:27:00.625: Disabled Path c11t282018CF24DEECFFd40s2(246/1192) belonging to Dmpnode HW_DIRAYA_INDICES07_D01(21/208) due to path failure

 

Cause


By design, DMP disables a controller only when all subpaths belonging to that controller are disabled.

# vxdmpadm listenclosure all
<snippet>
huawei-xsg10      HUAWEI-XSG1    210018cf24deecff     CONNECTED    ALUA       47         6000
<snippet>

 

The HUAWEI-XSG1 array has 47 dmpnodes. Each dmpnode has 4 subpaths, 2 of which belong to controller c11.
 

Device:    HW_D01
numpaths:   4
c11t282018CF24DEECFFd40s2                  state=enabled          type=active/optimized(p)
c11t290018CF24DEECFFd40s2                  state=enabled          type=active/optimized(p)
c8t29A118CF24DEECFFd40s2                   state=enabled          type=active/optimized(p)
c8t28C118CF24DEECFFd40s2                   state=enabled          type=active/optimized(p)


Only the subpaths belonging to controller c11 (47 * 2 = 94 subpaths) will be disabled.


Once all the controller c11 DMP paths are disabled, only then will DMP disable the controller "c11".

/etc/vx/dmpevents.log:

Wed Nov 10 17:27:00.625: Disabled Path c11t290018CF24DEECFFd40s2(246/488) belonging to Dmpnode HW_D01(21/208) due to path failure
Wed Nov 10 17:27:00.625: Disabled Path c11t282018CF24DEECFFd40s2(246/1192) belonging to Dmpnode HW_D01(21/208) due to path failure
... ...
Wed Nov 10 17:30:03.603: Disabled Path c11t290018CF24DEECFFd31s2(246/560) belonging to Dmpnode HW_D01(21/384) due to path failure
Wed Nov 10 17:30:03.603: Disabled Path c11t282018CF24DEECFFd31s2(246/1336) belonging to Dmpnode HW_D01(21/384) due to path failure


Timeline Summary:

- The last subpath was disabled at "Wed Nov 10 17:30:03.603". 

- Only once the last subpath has been disabled does DMP disable the c11 controller:
 

Wed Nov 10 17:30:03.603: Disabled Controller c11 belonging to Disk array huawei-xsg10
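
As a rough way to see this window in a log, the sketch below counts the "Disabled Path" events for one controller and prints the first and last timestamps. The function name dmp_disable_window is hypothetical, and the field positions assume the dmpevents.log line layout shown in this article; other releases may format the lines differently.

```shell
#!/bin/sh
# Summarise "Disabled Path" events for one controller in a DMP event log.
# Usage: dmp_disable_window <controller> <logfile>
# Assumed line layout (taken from the excerpts in this article):
#   Wed Nov 10 17:27:00.625: Disabled Path c11t...s2(246/488) belonging to ...
# Note: this needs an awk that supports -v and sub(); on Solaris that is
# typically nawk or /usr/xpg4/bin/awk rather than the legacy /usr/bin/awk.
dmp_disable_window() {
    ctlr=$1
    logfile=$2
    awk -v ctlr="$ctlr" '
        # Field 5 is "Disabled", field 6 is "Path", field 7 is the path name.
        # Matching on "<controller>t" as a prefix keeps c1 from matching c11.
        $5 == "Disabled" && $6 == "Path" && index($7, ctlr "t") == 1 {
            ts = $4
            sub(/:$/, "", ts)      # drop the colon that follows the time
            if (count == 0) first = ts
            last = ts
            count++
        }
        END {
            printf "paths_disabled=%d first=%s last=%s\n", count, first, last
        }
    ' "$logfile"
}
```

For the array in this example, running `dmp_disable_window c11 /etc/vx/dmpevents.log` after the event would be expected to report 94 disabled paths, provided the log contains only this one event for c11.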

 

 

ENABLED EVENT EXAMPLE:
 

In the example below, DMP takes over 2 minutes to react to the QLogic "ONLINE" event: the link comes online at 17:31:16, but the controller is not enabled until 17:33:26.


/var/adm/messages:
Nov 10 17:31:16 fred qlc: [ID 628150 kern.notice] NOTICE: Qlogic qlc(3,0,1): Link ONLINE
Nov 10 17:31:16 fred fp: [ID 517869 kern.warning] WARNING: fp(22): N_x Port with D_ID=ac900, PWWN=290018cf24deecff reappeared in fabric
Nov 10 17:31:16 fred fp: [ID 517869 kern.warning] WARNING: fp(22): N_x Port with D_ID=ac300, PWWN=282018cf24deecff reappeared in fabric
Nov 10 17:31:16 fred fp: [ID 517869 kern.warning] WARNING: fp(22): N_x Port with D_ID=ac600, PWWN=5000097408402da9 reappeared in fabric
Nov 10 17:31:16 fred fp: [ID 517869 kern.warning] WARNING: fp(22): N_x Port with D_ID=ae400, PWWN=5000097408402da1 reappeared in fabric
Nov 10 17:31:16 fred scsi: [ID 583741 kern.notice]  Target 0xae400 LUN 0x0: Nonzero peripheral qualifier: Device type=0x1f Peripheral qual=0x1
Nov 10 17:31:16 fred scsi: [ID 583741 kern.notice]  Target 0xac600 LUN 0x0: Nonzero peripheral qualifier: Device type=0x1f Peripheral qual=0x1

.
.
/etc/vx/dmpevents.log:
Nov 10 17:33:26 ma8a1c000 vxdmp: [ID 144536 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 [Info] enabled controller /pci@319/pci@1/SUNW,qlc@0,1/fp@0,0 connected to disk array 210018cf24deecff
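
To put numbers on the two delays, the sketch below computes the whole seconds between two same-day timestamps taken from the logs. The helper name elapsed_seconds is hypothetical; it assumes HH:MM:SS input, ignores any fractional seconds, and does not handle events that span midnight.

```shell
#!/bin/sh
# elapsed_seconds <start HH:MM:SS> <end HH:MM:SS[.mmm]>
# Prints the whole seconds between two timestamps on the same day.
elapsed_seconds() {
    echo "$1 $2" | awk '{
        split($1, a, ":")
        split($2, b, ":")
        # int() drops any fractional part such as ".603"
        start = a[1] * 3600 + a[2] * 60 + int(a[3])
        end   = b[1] * 3600 + b[2] * 60 + int(b[3])
        print end - start
    }'
}
```

With the timestamps from this article, `elapsed_seconds 17:26:41 17:30:03.603` prints 202 (3 minutes 22 seconds for the disable event) and `elapsed_seconds 17:31:16 17:33:26` prints 130 (2 minutes 10 seconds for the enable event).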

 

Solution


Dynamic Multi-Pathing (DMP) maintains a kernel task that re-examines the condition of paths at a specified interval. The type of analysis that is performed on the paths depends on the checking policy that is configured.


By default, dmp_restore_policy is check_disabled, which re-checks only the paths that are already disabled, and dmp_restore_interval is 300 seconds.


Refer to article "100051995" for more details surrounding the below DMP tunables:


1.] With the check_all policy, the path restoration thread analyzes all paths in the system, revives the paths that are back online, and disables the paths that are inaccessible. The command to configure this policy is:
 

# vxdmpadm settune dmp_restore_policy=check_all


2.] The dmp_restore_interval tunable parameter specifies how often, in seconds, the path restoration thread examines the paths. The command below sets it to 150 seconds:
 

# vxdmpadm settune dmp_restore_interval=150
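
After changing either tunable, it can help to confirm the values that DMP is actually using. The sketch below wraps `vxdmpadm gettune` for the two tunables discussed here. The function name show_restore_tunables is hypothetical, and the output format of gettune may vary between releases.

```shell
#!/bin/sh
# Print the current restore-thread tunables so a change can be confirmed.
# Requires the vxdmpadm utility; run as root on a host with DMP installed.
show_restore_tunables() {
    for tunable in dmp_restore_policy dmp_restore_interval; do
        vxdmpadm gettune "$tunable"
    done
}
```

Calling `show_restore_tunables` before and after the settune commands shows whether the new policy and interval have taken effect.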

 

References

JIRA : STESC-6531
