Why Veritas Dynamic Multi-Pathing (DMP) takes time to react to controller disable and enable events
Problem
This article explains why Veritas Dynamic Multi-Pathing (DMP) can take several minutes to react to controller disable and enable events.
Error Message
In the example below, DMP takes over 3 minutes to react to the QLogic "Link OFFLINE" event.
DISABLED EVENT EXAMPLE:
/var/adm/messages:Nov 10 17:26:41 fred qlc: [ID 628150 kern.notice] NOTICE: Qlogic qlc(3,0,1): Link OFFLINE
Nov 10 17:26:51 fred fctl: [ID 517869 kern.warning] WARNING: fp(22)::OFFLINE timeout
/etc/vx/dmpevents.log:Wed Nov 10 17:26:41.000: Port event received from the adapter port 97df720d80340021 fabricportid = 0x 0 event type is port offline
Wed Nov 10 17:26:42.000: Port event received from the adapter port 97df720d80340021 fabricportid = 0x 0 event type is port offline
Wed Nov 10 17:26:42.000: Port event received from the adapter port 97df720d80340021 fabricportid = 0x 0 event type is port offline
Wed Nov 10 17:27:00.625: SCSI error occurred on Path c11t290018CF24DEECFFd40s2(246/490): opcode=0xa3 reported transport failure (status=0x0, key=0x0, asc=0x0, ascq=0x0)
Wed Nov 10 17:27:00.625: Disabled Path c11t290018CF24DEECFFd40s2(246/488) belonging to Dmpnode HW_DIRAYA_INDICES07_D01(21/208) due to path failure
Wed Nov 10 17:27:00.625: SCSI error occurred on Path c11t282018CF24DEECFFd40s2(246/1194): opcode=0xa3 reported transport failure (status=0x0, key=0x0, asc=0x0, ascq=0x0)
Wed Nov 10 17:27:00.625: Disabled Path c11t282018CF24DEECFFd40s2(246/1192) belonging to Dmpnode HW_DIRAYA_INDICES07_D01(21/208) due to path failure
Cause
By design, DMP disables a controller only once all subpaths belonging to that controller are disabled.
# vxdmpadm listenclosure all
<snippet>
huawei-xsg10 HUAWEI-XSG1 210018cf24deecff CONNECTED ALUA 47 6000
<snippet>
The HUAWEI-XSG1 array has 47 dmpnodes. Each dmpnode has 4 subpaths, 2 of which belong to controller c11:
Device: HW_D01
numpaths: 4
c11t282018CF24DEECFFd40s2 state=enabled type=active/optimized(p)
c11t290018CF24DEECFFd40s2 state=enabled type=active/optimized(p)
c8t29A118CF24DEECFFd40s2 state=enabled type=active/optimized(p)
c8t28C118CF24DEECFFd40s2 state=enabled type=active/optimized(p)
Only the subpaths belonging to controller c11 (47 * 2 = 94 subpaths) will be disabled.
DMP disables controller c11 itself only after all of these c11 subpaths have been disabled.
/etc/vx/dmpevents.log:Wed Nov 10 17:27:00.625: Disabled Path c11t290018CF24DEECFFd40s2(246/488) belonging to Dmpnode HW_D01(21/208) due to path failure
Wed Nov 10 17:27:00.625: Disabled Path c11t282018CF24DEECFFd40s2(246/1192) belonging to Dmpnode HW_D01(21/208) due to path failure
... ...
Wed Nov 10 17:30:03.603: Disabled Path c11t290018CF24DEECFFd31s2(246/560) belonging to Dmpnode HW_D01(21/384) due to path failure
Wed Nov 10 17:30:03.603: Disabled Path c11t282018CF24DEECFFd31s2(246/1336) belonging to Dmpnode HW_D01(21/384) due to path failure
Timeline Summary:
- The last subpath was disabled at "Wed Nov 10 17:30:03.603".
- Only after the last subpath has been disabled does DMP disable the c11 controller:
Wed Nov 10 17:30:03.603: Disabled Controller c11 belonging to Disk array huawei-xsg10
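The timeline above can be reproduced from the log with a short script. The following is a minimal Python sketch, not a Veritas tool: the function names (parse, summarize) are hypothetical, it assumes the dmpevents.log line layout shown in the excerpts above (without the grep-style file name prefix), that the controller name is the leading "cN" of the path name, and that all events fall on the same day.

```python
import re
from collections import Counter

# Leading stamp such as "Wed Nov 10 17:26:41.000: " followed by the message.
STAMP = re.compile(r"^\w{3} \w{3} +\d+ (\d{2}):(\d{2}):(\d{2})\.(\d{3}): (.*)$")
# Assumption: the controller name is the leading "cN" of the path name.
DISABLED_PATH = re.compile(r"^Disabled Path (c\d+)t\S+")
DISABLED_CTLR = re.compile(r"^Disabled Controller (\S+)")


def parse(line):
    """Return (milliseconds since midnight, message), or None if no stamp."""
    m = STAMP.match(line)
    if m is None:
        return None
    hours, minutes, seconds, millis, message = m.groups()
    ms = ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)
    return ms, message


def summarize(lines):
    """Count disabled paths per controller and time each controller disable."""
    first_offline_ms = None
    paths_disabled = Counter()
    delay_ms = {}
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ms, message = parsed
        if first_offline_ms is None and "event type is port offline" in message:
            first_offline_ms = ms
        path = DISABLED_PATH.match(message)
        if path:
            paths_disabled[path.group(1)] += 1
        ctlr = DISABLED_CTLR.match(message)
        if ctlr and first_offline_ms is not None:
            delay_ms[ctlr.group(1)] = ms - first_offline_ms
    return paths_disabled, delay_ms


# Lines copied from the excerpts above (the "... ..." gap is not filled in).
sample = [
    "Wed Nov 10 17:26:41.000: Port event received from the adapter port 97df720d80340021 fabricportid = 0x 0 event type is port offline",
    "Wed Nov 10 17:27:00.625: Disabled Path c11t290018CF24DEECFFd40s2(246/488) belonging to Dmpnode HW_D01(21/208) due to path failure",
    "Wed Nov 10 17:27:00.625: Disabled Path c11t282018CF24DEECFFd40s2(246/1192) belonging to Dmpnode HW_D01(21/208) due to path failure",
    "Wed Nov 10 17:30:03.603: Disabled Path c11t290018CF24DEECFFd31s2(246/560) belonging to Dmpnode HW_D01(21/384) due to path failure",
    "Wed Nov 10 17:30:03.603: Disabled Path c11t282018CF24DEECFFd31s2(246/1336) belonging to Dmpnode HW_D01(21/384) due to path failure",
    "Wed Nov 10 17:30:03.603: Disabled Controller c11 belonging to Disk array huawei-xsg10",
]

paths_disabled, delay_ms = summarize(sample)
print(paths_disabled["c11"], "c11 paths disabled in this excerpt")
print(delay_ms["c11"] / 1000, "seconds from first port offline to controller disable")
```

Run against the full log rather than this excerpt, the path count for c11 would be expected to reach the 94 subpaths calculated above before the "Disabled Controller" line appears.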
ENABLED EVENT EXAMPLE:
In the example below, DMP takes over 2 minutes to react to the QLogic "Link ONLINE" event.
/var/adm/messages:Nov 10 17:31:16 fred qlc: [ID 628150 kern.notice] NOTICE: Qlogic qlc(3,0,1): Link ONLINE
Nov 10 17:31:16 fred fp: [ID 517869 kern.warning] WARNING: fp(22): N_x Port with D_ID=ac900, PWWN=290018cf24deecff reappeared in fabric
Nov 10 17:31:16 fred fp: [ID 517869 kern.warning] WARNING: fp(22): N_x Port with D_ID=ac300, PWWN=282018cf24deecff reappeared in fabric
Nov 10 17:31:16 fred fp: [ID 517869 kern.warning] WARNING: fp(22): N_x Port with D_ID=ac600, PWWN=5000097408402da9 reappeared in fabric
Nov 10 17:31:16 fred fp: [ID 517869 kern.warning] WARNING: fp(22): N_x Port with D_ID=ae400, PWWN=5000097408402da1 reappeared in fabric
Nov 10 17:31:16 fred scsi: [ID 583741 kern.notice] Target 0xae400 LUN 0x0: Nonzero peripheral qualifier: Device type=0x1f Peripheral qual=0x1
Nov 10 17:31:16 fred scsi: [ID 583741 kern.notice] Target 0xac600 LUN 0x0: Nonzero peripheral qualifier: Device type=0x1f Peripheral qual=0x1
.
.
/etc/vx/dmpevents.log:Nov 10 17:33:26 ma8a1c000 vxdmp: [ID 144536 kern.notice] NOTICE: VxVM vxdmp V-5-0-0 [Info] enabled controller /pci@319/pci@1/SUNW,qlc@0,1/fp@0,0 connected to disk array 210018cf24deecff
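The enable delay can be measured the same way from the syslog-style stamps. This is a small Python sketch under the assumption that both events occur on the same day; seconds_since_midnight is a hypothetical helper name, and the two lines are copied from the excerpt above.

```python
import re

# Syslog-style stamp such as "Nov 10 17:31:16". It is searched for rather than
# anchored, so a grep-style "filename:" prefix in front of it is tolerated.
SYSLOG_STAMP = re.compile(r"\b[A-Z][a-z]{2} +\d{1,2} (\d{2}):(\d{2}):(\d{2})\b")


def seconds_since_midnight(line):
    """Extract the time of day from the first syslog stamp found in the line."""
    m = SYSLOG_STAMP.search(line)
    if m is None:
        raise ValueError("no syslog timestamp found")
    hours, minutes, seconds = (int(part) for part in m.groups())
    return (hours * 60 + minutes) * 60 + seconds


link_online = (
    "/var/adm/messages:Nov 10 17:31:16 fred qlc: [ID 628150 kern.notice] "
    "NOTICE: Qlogic qlc(3,0,1): Link ONLINE"
)
ctlr_enabled = (
    "/etc/vx/dmpevents.log:Nov 10 17:33:26 ma8a1c000 vxdmp: [ID 144536 kern.notice] "
    "NOTICE: VxVM vxdmp V-5-0-0 [Info] enabled controller "
    "/pci@319/pci@1/SUNW,qlc@0,1/fp@0,0 connected to disk array 210018cf24deecff"
)

enable_delay_s = seconds_since_midnight(ctlr_enabled) - seconds_since_midnight(link_online)
print(enable_delay_s, "seconds from Link ONLINE to controller enabled")
```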
Solution
Dynamic Multi-Pathing (DMP) maintains a kernel task that re-examines the condition of paths at a specified interval. The type of analysis that is performed on the paths depends on the checking policy that is configured.
The default dmp_restore_policy is check_disabled and the default dmp_restore_interval is 300 seconds, so by default only previously disabled paths are re-checked, once every 300 seconds.
Refer to article 100051995 for more details on the following DMP tunables:
1.] With the check_all policy, the path restoration thread analyzes all paths in the system, reviving the paths that are back online and disabling the paths that are inaccessible. The command to configure this policy is:
# vxdmpadm settune dmp_restore_policy=check_all
2.] The dmp_restore_interval tunable parameter specifies how often, in seconds, the path restoration thread examines the paths. For example, to examine the paths every 150 seconds:
# vxdmpadm settune dmp_restore_interval=150
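To illustrate why a shorter interval shortens the reaction time, the following Python sketch models the restoration thread as a purely periodic check. This is a simplified assumption for illustration only: the function name is hypothetical, checks are assumed to start at multiples of the interval, and the time needed to probe each path (which can be significant with many subpaths) is ignored.

```python
def wait_until_next_check(event_offset_s, interval_s):
    """Seconds between an event and the next periodic check.

    event_offset_s: seconds elapsed since some check started (hypothetical input).
    interval_s: the check period, i.e. the value given to dmp_restore_interval.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if event_offset_s < 0:
        raise ValueError("offset must not be negative")
    # Checks are assumed to start at 0, interval_s, 2 * interval_s, ...
    return (-event_offset_s) % interval_s


# A path recovers 1 second after a check has started.
print(wait_until_next_check(1, 300))  # wait with the default 300 second interval
print(wait_until_next_check(1, 150))  # wait with a 150 second interval
```

In this model the worst-case wait is just under one full interval, so halving dmp_restore_interval roughly halves the longest time a recovered path waits before it is re-examined, at the cost of more frequent path checks.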