How to reduce the time taken by Veritas Dynamic Multi-pathing (DMP) to react to disabled and enabled controller related events

Article: 100051995
Last Published: 2022-01-11
Product(s): InfoScale & Storage Foundation

Description

 

The following article outlines how to reduce the time taken by Veritas Dynamic Multi-pathing (DMP) to react to disabled and enabled controller related events.


Dynamic Multi-Pathing (DMP) maintains a kernel task that re-examines the condition of paths at a specified interval. The type of analysis that is performed on the paths depends on the checking policy that is configured.


The default dmp_restore_policy is check_disabled, and the default dmp_restore_interval is 300 seconds, so by default only disabled paths are re-examined, once every 300 seconds.


Solaris (Oracle) and DMP can be tuned to reduce the time window needed to safely disable paths associated with a disabled switch port event, while ensuring that in-flight I/Os are processed correctly.


1.] With the check_all policy, the path restoration thread analyzes all paths in the system, revives the paths that are back online, and disables the paths that are inaccessible. The command to configure this policy is:
 

# vxdmpadm settune dmp_restore_policy=check_all


2.] The dmp_restore_interval tunable parameter specifies how often, in seconds, the path restoration thread examines the paths. To set it to 150 seconds:
 

# vxdmpadm settune dmp_restore_interval=150
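After changing the two tunables above, it can be useful to confirm the values that DMP is actually using. The following is a minimal sketch that assumes vxdmpadm is on the PATH of a host with DMP installed; the function name show_restore_tunables is illustrative and not part of the product.

```shell
#!/bin/sh
# show_restore_tunables is a hypothetical wrapper, not a Veritas command.
# It prints the current DMP restore settings if vxdmpadm is installed.
show_restore_tunables() {
    if ! command -v vxdmpadm >/dev/null 2>&1; then
        echo "vxdmpadm not found; run this on a host with DMP installed" >&2
        return 1
    fi
    vxdmpadm gettune dmp_restore_policy
    vxdmpadm gettune dmp_restore_interval
}

show_restore_tunables || echo "skipped"
```

On a DMP host this is expected to list each tunable with its current value; on any other host it only prints "skipped".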

 

Where alternate controller paths are available and enabled, failed in-flight I/Os will be retried down a different controller path.


Solaris (Oracle) Setting:
 

To reduce the time spent beneath DMP by the sd and ssd drivers, sd_io_time and ssd_io_time can be set to 0x1E (30 seconds).
 

Update the /etc/system file with the new values and reboot the server for the values to take effect:

* I/O timeout of 30 seconds for the sd and ssd drivers
set sd:sd_io_time=0x1E
set ssd:ssd_io_time=0x1E
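The values in /etc/system are given in hexadecimal seconds. As a quick sanity check, the shell's printf can convert them to decimal. The helper name hex_to_dec below is illustrative only and is not part of Solaris or InfoScale.

```shell
#!/bin/sh
# hex_to_dec is a hypothetical helper: prints the decimal value of a hex constant.
hex_to_dec() {
    printf '%d\n' "$1"
}

hex_to_dec 0x1E   # 30 (seconds, the value suggested above)
hex_to_dec 0x3C   # 60 (seconds, the value used in Test 1 later in this article)
```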

 

 

With the above settings in place, users can expect DMP to be quicker in reacting to disabled and enabled events as follows:

 

Disable path:
17:47:06 - Switch Port Disabled 
17:49:44 - DMP disabled all paths for enabled controller
 

Enable path:
18:03:03 - Switch Port Enabled
18:05:33 - DMP enabled all paths for disabled controller
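To put numbers on the sample timings above, the following sketch computes the elapsed seconds between two same-day HH:MM:SS timestamps. The function names to_seconds and elapsed are illustrative and are not part of any Veritas tool.

```shell
#!/bin/sh
# to_seconds HH:MM:SS -> seconds since midnight (awk treats "08" as decimal 8)
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# elapsed START END -> seconds between two timestamps on the same day
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

echo "disable: $(elapsed 17:47:06 17:49:44)s"   # disable: 158s
echo "enable:  $(elapsed 18:03:03 18:05:33)s"   # enable:  150s
```

Both figures are of the same order as the 150-second dmp_restore_interval set earlier.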

 

 

For additional technical content, refer to article 100051997: "Why does it take Veritas Dynamic Multi-pathing (DMP) time to react to disabled and enabled controller related events?"

 

ADDITIONAL TESTS

----------------------------------------------------------------------------------------------------------------------------------------------------------

A series of tests was also conducted to illustrate the difference in timings when different DMP tunables and Solaris settings were deployed:


## Test 1 - Attempt 1:
 

Settings:

- vxdmpadm setattr enclosure huawei-xsg10 recoveryoption=fixedretry retrycount=1
- set ssd:ssd_io_time=0x3C
- vxdmpadm settune dmp_restore_interval=300


Timings:

15:41:10 Switch port is disabled
15:41:16 Link offline message received from the OS
15:41:39 DMP path disable message received 


15:42:23 Switch port is enabled
15:42:29 Link online message received from the OS
15:44:10 DMP path enable message for all paths
 

Outcome: Acceptable time
 

## Test 1 - Attempt 2:

15:44:47 Switch port disabled
15:44:53 Link offline message from OS
15:45:12 DMP path disable message received for SOME of the paths
15:47:25 DMP path disable message received for the remaining paths

15:47:51 Switch port enabled
15:47:57 Link online message from OS
15:50:00 DMP path enable message for all paths
 

Outcome: Acceptable time; however, there was a delay before all paths were disabled

----


## Test 2 - Attempt 1:

Settings:

- vxdmpadm setattr enclosure huawei-xsg10 recoveryoption=fixedretry retrycount=1
- vxdmpadm settune dmp_restore_interval=300
- set ssd:ssd_io_time=0x1E

NOTE: Since the value of "ssd_io_time" was changed, the server was restarted.
 

15:54:51 Switch port disabled
15:54:58 Link offline message from OS


After waiting 15 minutes, a rescan was forced, at which point DMP disabled the paths.

16:11:47 Switch port enabled
16:11:53 Link online message from OS
16:12:34 DMP path enabled message for all paths
 

Outcome: The paths were not disabled until a rescan was forced

 

## Test 2 - Attempt 2:


16:13:30 The switch port is disabled
16:13:36 Link offline message from OS
16:19:23 DMP disable message for all paths


16:20:09 The switch port is enabled
16:20:15 Link online message from OS
16:21:54 DMP enable message for all paths
 

Outcome: Approximately 6 min to detect disabled paths

----

## Test 3 - Attempt 1:

Settings:

- vxdmpadm setattr enclosure huawei-xsg10 recoveryoption=fixedretry retrycount=1
- set ssd:ssd_io_time=0x1E
- vxdmpadm settune dmp_restore_interval=150


16:23:12 The switch port is disabled
16:23:18 Link offline message from OS
16:24:25 DMP disable message for the paths


16:25:05 The switch port is enabled
16:25:11 Link online message from OS
16:26:56 DMP enable message for all paths


Outcome: Quick response times

 

## Test 3 - Attempt 2:


16:27:20 The switch port is disabled
16:27:26 Link offline message from OS
16:30:11 DMP disable message for some paths
16:32:41 DMP disable message for the remaining paths


16:33:07 The switch port is enabled
16:33:13 Link online message from OS
16:35:12 DMP enable message for all paths

Outcome: More than 5 min to detect all disabled paths

----

## Test 4 - Attempt 1:

Settings:

- vxdmpadm setattr enclosure huawei-xsg10 recoveryoption=timebound iotimeout=300
- set ssd:ssd_io_time=0x1E
- vxdmpadm settune dmp_restore_interval=150


18:42:43 The switch port is disabled
18:42:49 Link offline message from OS
18:43:52 DMP disable message for some paths

After waiting for some time with no change, "vxdisk scandisks" was run to detect the remaining disabled paths.
 

Outcome: More than 5 min to detect all disabled paths
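The disable-detection delays from the tests above can be tabulated with a short awk sketch. It takes the time the switch port was disabled and the time of the last DMP disable message in each attempt. Test 2 Attempt 1 and Test 4 are left out because their paths were only fully disabled after a manual rescan. The function name disable_latency and the row labels are illustrative; on Solaris, nawk may be needed in place of awk for user-defined functions.

```shell
#!/bin/sh
# disable_latency is a hypothetical helper. It reads lines of the form
#   LABEL  PORT_DISABLED  LAST_DMP_DISABLE   (times as HH:MM:SS, same day)
# and prints the seconds DMP took to disable the last path.
disable_latency() {
    awk '
        function secs(t,    p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        { printf "%-16s %4d s\n", $1, secs($3) - secs($2) }
    '
}

disable_latency <<'EOF'
Test1-Attempt1 15:41:10 15:41:39
Test1-Attempt2 15:44:47 15:47:25
Test2-Attempt2 16:13:30 16:19:23
Test3-Attempt1 16:23:12 16:24:25
Test3-Attempt2 16:27:20 16:32:41
EOF
```

With the timestamps above this gives 29, 158, 353, 73 and 321 seconds respectively, which makes the spread between attempts with identical settings easy to see.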
 

 

Recommendations:


In light of the above inconsistent test results, Veritas recommends the following tunable values so that all path states are examined and the interval between examinations is reduced.


To ensure the path restoration thread analyzes all paths in the system, revives the paths that are back online, and disables the paths that are inaccessible, set the dmp_restore_policy to "check_all".
 

Use:
 

# vxdmpadm settune dmp_restore_policy=check_all

# vxdmpadm settune dmp_restore_interval=150


The dmp_restore_interval tunable specifies how often the path restoration thread examines the paths presented to the host. If the system has more than 500 LUNs with multiple paths, the "dmp_restore_interval" setting may need to be increased to a higher value (either above 150 or back to the default value of 300).

 
