Setting the DMP timebound iotimeout too low in 5.1SP1 and later can cause I/Os to fail without retry attempts on other available paths

Article: 100007278
Last Published: 2023-09-20
Product(s): InfoScale & Storage Foundation

Problem

When using the "timebound" recovery option with Dynamic Multi-Pathing (DMP), an iotimeout value limits the time an I/O is retried, preventing application hangs caused by unresponsive disk devices.

In 5.1SP1 (and later versions), the iotimeout value limits the time that DMP waits for an I/O to be returned as failed while an I/O error is being retried by the SCSI driver. Consequently, setting this value too low can result in I/Os being failed without being retried on alternative paths. The default value of the timebound iotimeout in 5.1SP1 is 300 seconds. Under some conditions, described in the Applies To section below, you may need to set a value larger than 300. The alternative is to use the fixed-retry recovery option.

Background:

When an I/O is sent to the DMP component, DMP selects a path to service the I/O and sends it to the disk driver to complete the request. If the I/O fails, the disk driver returns the failed I/O to DMP for error analysis. As part of this analysis, DMP tests the problem path by sending a SCSI inquiry probe. If the probe succeeds, DMP retries the I/O on the same path.

Should a scenario arise in which I/Os fail on a path but the test probes succeed, DMP will retry the I/O on the same path until dmp_retry_count (default 5) has been exceeded before retrying on an alternative path. On each retry attempt, DMP checks whether the original I/O (being retried) has exceeded the iotimeout. If it has, no further retries are attempted and the I/O is returned to the upper layer as failed.

The above does not apply if the fixedretry recovery option is used. Previously, in 5.0MP3, the recommendation was to tune the value low so that alternative paths could be used sooner, as the retry mechanism was different.
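The retry flow described in the Background above can be sketched as follows. This is an illustrative model only: IOTIMEOUT and dmp_retry_count mirror the DMP tunables described here, while probe(), send_io(), and now() are hypothetical stand-ins for driver internals, not actual DMP code.

```python
# Illustrative model of DMP timebound error-retry handling for one failed I/O.
# IOTIMEOUT and DMP_RETRY_COUNT mirror the tunables described above; probe(),
# send_io(), and now() are hypothetical stand-ins for driver internals.

IOTIMEOUT = 300        # timebound iotimeout in seconds (5.1SP1 default)
DMP_RETRY_COUNT = 5    # same-path retries before switching paths (default)

def handle_failed_io(io_start, probe, send_io, now):
    """Retry a failed I/O until it succeeds or the iotimeout is exceeded."""
    same_path_retries = 0
    while True:
        if now() - io_start >= IOTIMEOUT:
            return "failed"                # iotimeout exceeded: fail upward
        if probe() and same_path_retries < DMP_RETRY_COUNT:
            same_path_retries += 1         # SCSI inquiry probe succeeded
            result = send_io("same path")
        else:
            same_path_retries = 0          # switch to an alternative path
            result = send_io("alternate path")
        if result == "ok":
            return "ok"
```

Note how a low IOTIMEOUT cuts the loop short: the I/O is failed upward even though alternative paths were still available.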

 

How to display the current iotimeout value:

     vxdmpadm getattr enclosure <enclosurename> recoveryoption        

In the example below, the Disk enclosure is using timebound error-retry logic with an iotimeout of 30 seconds. Note that this is less than the total timeout period (including retries) of most SCSI drivers.

#vxdmpadm getattr enclosure disk recoveryoption
ENCLR-NAME      RECOVERY-OPTION      DEFAULT[VAL]  CURRENT[VAL]
===============================================================
disk           Throttle             Nothrottle[0]  Nothrottle[0]
disk           Error-Retry          Timebound[300] Timebound[30]  <-- iotimeout set for 30 seconds. 
       

Another example below is for all attributes of the enclosure:

#vxdmpadm getattr enclosure emc0 
ENCLR_NAME      ATTR_NAME                     DEFAULT        CURRENT
============================================================================
emc0           iopolicy                      MinimumQ       MinimumQ
emc0           partitionsize                 512            512
emc0           use_all_paths                 -              -
emc0           failover_policy               Global         Global
emc0           recoveryoption[throttle]      Nothrottle[0]  Nothrottle[0]
emc0           recoveryoption[errorretry]    Timebound[300] Timebound[300]   <-- iotimeout at 300 seconds
emc0           redundancy                    0              0
emc0           dmp_lun_retry_timeout         0              0
emc0           failovermode                  -              -
 

For this example, we increase the value to 315 seconds:

#vxdmpadm setattr enclosure emc0 recoveryoption=timebound iotimeout=315

Error Message

NOTICE: VxVM vxdmp V-5-3-0 Reached DMP Threshold IO TimeOut (300) for disk 49/0x32

 

Solution

Ensure the iotimeout value is high enough for the disk driver to retry the I/O (and return a fatal error) before the DMP node times out. The iotimeout value should also be somewhat less than the timeout value of applications (database instances, etc.) to prevent application hangs or failures.
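This sizing guidance can be expressed as a simple bounds check. The helper below is purely illustrative (safe_iotimeout is not a Veritas utility); it assumes a 15-second margin on top of the driver's total retry time, matching the 315-second value used in the earlier setattr example.

```python
# Illustrative sizing helper, not part of any Veritas tooling: pick an
# iotimeout above the disk driver's total retry time but below the
# application's own timeout.
def safe_iotimeout(driver_path_timeout, app_timeout, margin=15):
    """Return a workable iotimeout in seconds, or None if none exists."""
    candidate = driver_path_timeout + margin
    return candidate if candidate < app_timeout else None
```

For example, a 300-second driver path timeout and a hypothetical 600-second application timeout yield an iotimeout of 315; if the application timeout were only 310 seconds, no value would satisfy both constraints.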

 

Applies To

5.1SP1 and later versions for Solaris (also in 6.x)

5.1SP1 and later versions for Red Hat Enterprise Linux 5.x (due to a udev rule that sets the SCSI command timeout to 60 seconds in RHEL 5.x; this rule does not exist in RHEL 4.x or RHEL 6.x)

The typical scenarios that can cause SCSI to run to its maximum timeout limits are:

A SAN fabric failure where the host does not lose its local port connection. Loss of the port connection usually results in an immediate fatal SCSI error.

"No device" type failures, where the target simply stops responding to any command. One possible cause is incorrect fabric zoning.

 

SPECIAL Note:

On Solaris, when using the sd driver with sd_io_time set to 60 seconds or longer: note that 60 seconds is the default value, but it is commonly set in /etc/system as required by array vendors. The sd driver retries 5 times, yielding a path timeout of 300 seconds [sd_io_time = 60 sec x sd_retry_count = 5 -> 300-second timer]. In this case, the DMP iotimeout value should be set to at least 315 to allow proper path failover when a device becomes unresponsive (SAN fabric failure, etc.).

In the case of Solaris 10 using the embedded ssd driver, the timing of the fatal error differs. The ssd driver retries only 3 times and adds a 20-second FCP timeout, yielding a path timeout of 200 seconds with the same ssd_io_time value.
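The timer arithmetic in the two Solaris notes above works out as follows. The variable names simply mirror the driver tunables mentioned in the text; this is plain multiplication from the stated defaults, not driver code.

```python
# Solaris sd driver: per-command timeout times retry count.
sd_io_time = 60          # seconds; default, often set in /etc/system
sd_retry_count = 5
sd_path_timeout = sd_io_time * sd_retry_count               # 300 seconds

# Recommended DMP iotimeout: at least the path timeout plus some margin.
recommended_iotimeout = sd_path_timeout + 15                # 315 seconds

# Solaris 10 ssd driver: only 3 retries, plus a 20-second FCP timeout.
ssd_io_time = 60
ssd_retry_count = 3
fcp_timeout = 20
ssd_path_timeout = ssd_io_time * ssd_retry_count + fcp_timeout  # 200 seconds
```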

 RedHat Enterprise Linux 5.x (RHEL5.x)

 From the kernel source, we can see that at the SCSI layer the timeout is set to 30 seconds by default. We can also see that the retry count is hard-coded into the kernel:
 


drivers/scsi/sd.c
-----------------------------
/*
* Time out in seconds for disks and Magneto-opticals (which are slower).
*/
#define SD_TIMEOUT              (30 * HZ)
#define SD_MOD_TIMEOUT          (75 * HZ)
#define SD_FLUSH_TIMEOUT        (60 * HZ)

/*
* Number of allowed retries
*/
#define SD_MAX_RETRIES          5
#define SD_PASSTHROUGH_RETRIES  1
-----------------------------
 
     


This code is the same in RHEL6. In RHEL5, the timeout is adjusted by the following udev rule, which is installed by default.

       
/etc/udev/rules.d/50-udev.rules
-----------------------------
# sd:           0 TYPE_DISK, 7 TYPE_MOD, 14 TYPE_RBC
# sr:           4 TYPE_WORM, 5 TYPE_ROM
# st/osst:      1 TYPE_TAPE
# sg:           8 changer, [36] scanner
ACTION=="add", SUBSYSTEM=="scsi" , SYSFS{type}=="0|7|14", \
        RUN+="/bin/sh -c 'echo 60 > /sys$$DEVPATH/timeout'"

ACTION=="add", SUBSYSTEM=="scsi" , SYSFS{type}=="1", \
        RUN+="/bin/sh -c 'echo 900 > /sys$$DEVPATH/timeout'"
-----------------------------

 
 

This rule is only present in RHEL5.x. Checking RHEL4 and RHEL6, we can see that the timeout remains at the 30 seconds hard-coded into the kernel.
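Combining the udev rule with the hard-coded retry count gives a rough worst-case figure for RHEL 5.x. This assumes the 60-second command timer applies to each of the SD_MAX_RETRIES attempts, mirroring the per-attempt accounting used for Solaris above; the exact accounting in the kernel may differ.

```python
# Rough worst-case retry time on RHEL 5.x, assuming the per-command timer
# applies to each retry attempt (mirrors the Solaris arithmetic above).
udev_scsi_timeout = 60   # seconds, set by the 50-udev.rules entry on RHEL 5.x
sd_max_retries = 5       # SD_MAX_RETRIES, hard-coded in drivers/scsi/sd.c
worst_case = udev_scsi_timeout * sd_max_retries   # 300 seconds

# A DMP iotimeout of 300 (the default) can therefore expire before the SCSI
# layer returns a fatal error; a value above 300 avoids that race.
```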

The following document confirms these findings.

    17. Controlling the SCSI Command Timer and Device Status
          https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/online_storage_reconfiguration_guide/task_controlling-scsi-command-timer-onlining-devices
