Veritas Dynamic Multi-Pathing (DMP) fails to fail over I/O during array activity when using EMC NDM and NDU

Article: 100045757
Last Published: 2024-06-12
Product(s): InfoScale & Storage Foundation

Problem


During array maintenance (updates), in some rare cases Veritas DMP (Dynamic Multi-Pathing) may fail to fail over I/O to alternate active paths, resulting in unintended Veritas file system faults (outages).

This could be a result of many external factors.

EMC has provided recommendations to aid with such interoperability issues.
 

Error Message

 

Cause

 

NDM (Non-Disruptive Migration) overview


NDM is designed to help automate the process of migrating host applications to a PowerMax, VMAX All-Flash, or VMAX3 enterprise storage array with no downtime.


NDM leverages VMAX SRDF replication technologies to move the application data to the new storage array.  


During this activity window, DMP may unintentionally fault a DMPNODE resulting in an unwanted file system fault (outage).
 

EMC recommends changing the following DMP tunable parameters when performing EMC NDM migrations:

1. dmp_path_age 0

2. dmp_health_time 0 

3. dmp_restore_interval 10

 

During the NDM activity, EMC does not want Veritas DMP to send any test inquiries to the volumes (LUNs) whose paths are marked as "dead" (disabled).


Modifying the above DMP tunables allows DMP to avoid such checks, potentially avoiding any unwanted disabling of DMPNODEs.

This solution has been tested and qualified by both Elab and Symmetrix engineering and is outlined in the following best practice document:

https://www.delltechnologies.com/asset/en-us/products/storage/technical-support/h17133-non-disruptive-migration-best-practices-and-operational-guide.pdf

NOTE: At the time of writing this article, the details are outlined on Page 168.

 

DMP parameters:

 

Parameter

Description

dmp_health_time

DMP detects intermittently failing paths, and prevents I/O requests from being sent on them. The value of dmp_health_time represents the time in seconds for which a path must stay healthy.

If a path's state changes back from enabled to disabled within this time period, DMP marks the path as intermittently failing, and does not re-enable the path for I/O until dmp_path_age seconds elapse.

The default value is 60 seconds.

A value of 0 prevents DMP from detecting intermittently failing paths.

dmp_path_age

The time for which an intermittently failing path needs to be monitored as healthy before DMP again tries to schedule I/O requests on it.

The default value is 300 seconds.

A value of 0 prevents DMP from detecting intermittently failing paths.

dmp_restore_interval

The interval attribute specifies how often the path restoration thread examines the paths. Specify the time in seconds.

The default value is 300 seconds.

The value of this tunable can also be set using the vxdmpadm start restore command.

See Configuring DMP path restoration policies.

 

Solution

 

The DMP parameter values can be modified using the following syntax:

# vxdmpadm settune dmp_tunable=value 

# vxdmpadm gettune [dmp_tunable]
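
Before changing any tunable, it can help to record the current values so they can be put back afterwards. The following is a minimal sketch rather than a supported procedure. It assumes that `vxdmpadm gettune <name>` prints a header followed by a row of the form `name  current_value  default_value`, which should be confirmed on the installed release. The function name `save_tunables` and the `name=value` file format are hypothetical and not part of any Veritas tool.

```shell
#!/bin/sh
# Record the current value of the three tunables as "name=value" lines.
# Assumption: the gettune data row is "<name> <current value> <default value>".
save_tunables() {
    # $1 = output file (created or truncated)
    : > "$1"
    for name in dmp_health_time dmp_path_age dmp_restore_interval; do
        # Take the second column of the row whose first column is the tunable name
        value=$(vxdmpadm gettune "$name" | awk -v n="$name" '$1 == n { print $2; exit }')
        [ -n "$value" ] || { echo "could not read $name" >&2; return 1; }
        printf '%s=%s\n' "$name" "$value" >> "$1"
    done
}

# Example usage on a host with DMP installed (not executed here):
#   save_tunables /var/tmp/dmp_tunables.before
```

The function only reads values; it does not change any DMP setting.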

These tunable settings should limit the inquiries executed on the path while migration or NDU activities are in progress.

 

# vxdmpadm settune dmp_health_time=0

# vxdmpadm settune dmp_path_age=0

# vxdmpadm settune dmp_restore_interval=10
 

The one drawback is that if a path fails with an I/O error, Veritas will not re-enable that path automatically. The administrator must enable it manually once the activity is complete.
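
To find paths that need to be enabled by hand after the activity, the path listing can be filtered for disabled entries. This is a sketch under stated assumptions: it assumes the listing produced by `vxdmpadm getsubpaths` has the path name in the first column and a state beginning with `DISABLED` in the second, which should be checked against the output on the host. The function name `disabled_paths` is hypothetical.

```shell
#!/bin/sh
# List path names whose STATE column begins with DISABLED.
# Assumption: each data row is "<path name> <state> ..."; header and
# separator rows do not match the pattern and are skipped.
disabled_paths() {
    # Reads a getsubpaths-style listing from stdin
    awk '$2 ~ /^DISABLED/ { print $1 }'
}

# Example usage on a host with DMP installed (not executed here):
#   vxdmpadm getsubpaths | disabled_paths
#   vxdmpadm enable path=<path name>
```

Review the list before enabling anything, since a path may be disabled for reasons unrelated to the array activity.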

 

The impact of dmp_restore_interval is limited, since it only controls how much time passes before disabled paths are probed. Setting it to 10 is intended to cause fewer inquiries to be triggered on the disabled path.

 

To limit the inquiries triggered on the path, use the following command:

 

# vxdmpadm stop restore


NOTE: Once the array activity window is complete, the DMP parameters should be set back to their original values.
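
If the original values were written to a file before the change, they can be re-applied in a loop. The sketch below assumes a file with one `name=value` pair per line; that format and the function name `restore_tunables` are hypothetical. It uses only the `vxdmpadm settune` syntax shown in this article.

```shell
#!/bin/sh
# Re-apply tunables recorded earlier, one "name=value" pair per line.
# The file format is an assumption made for this sketch.
restore_tunables() {
    # $1 = file holding the saved values
    while IFS='=' read -r name value; do
        # Skip blank lines and comment lines
        case "$name" in ''|'#'*) continue ;; esac
        # Redirect stdin so the command cannot consume the rest of the file
        vxdmpadm settune "${name}=${value}" </dev/null || return 1
    done < "$1"
}

# Example usage on a host with DMP installed (not executed here):
#   restore_tunables /var/tmp/dmp_tunables.before
#   vxdmpadm start restore
```

The restore thread still has to be restarted separately with `vxdmpadm start restore` if it was stopped for the activity.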



NDU (Non-disruptive upgrade) Activity
 

The same approach could be implemented for NDU (Non-disruptive upgrade) array activities.

A non-disruptive upgrade (NDU) is an update to software or hardware that does not interrupt access to data or system services.  An NDU does not require the system to be rebooted when the upgrade process is completed.


Summary of commands prior to starting array activity:
 

# vxdmpadm settune dmp_health_time=0

# vxdmpadm settune dmp_path_age=0

# vxdmpadm settune dmp_restore_interval=10

# vxdmpadm stop restore

 

Summary of commands post array activity (the values shown are the defaults; use the original values if they differed):

 

# vxdmpadm settune dmp_health_time=60

# vxdmpadm settune dmp_path_age=300

# vxdmpadm settune dmp_restore_interval=300

# vxdmpadm start restore

 
