Unresponsive system (hang) or possible data loss due to an adverse interoperability issue between Qlogic 2G and 4G HBAs and Veritas Volume Manager versions 4.1 MP1 with 122059-02, 4.1 MP2, 5.0, or 5.0 MP1

Article: 100029632
Last Published: 2021-12-14
Product(s): InfoScale & Storage Foundation

Problem

A system hang or possible data loss may occur because of an adverse interoperability issue between Qlogic 2G and 4G HBAs and Veritas Volume Manager versions 4.1 MP1 with 122059-02, 4.1 MP2, 5.0, or 5.0 MP1.

 

Error Message

qlc: [ID 262021 kern.warning] WARNING: qlc(0): isr, Internal Parity/Pause Error - hccr=0h, stat=428113h, count=710644882

-OR-

WARNING: /pci@3,700000/SUNW,qlc@0,1/fp@0,0/ssd@w50060e800327572c,5a (ssd138): undecodable sense information: 0x0 0x0 0x0 0x1 0x0 0x0 0x0 0x2 0xff 0xff 0xff 0xff 0x13 0x7c 0
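
To check whether a host has already logged either warning, the system log can be searched for the two signatures quoted above. The following is a minimal sketch, not a supported tool: the function name scan_dmp_symptoms is hypothetical, and the default log path /var/adm/messages is an assumption about where these kernel warnings are written on the affected Solaris host.

```shell
#!/bin/sh
# Sketch: count log lines that match either warning signature quoted above.
# scan_dmp_symptoms is a hypothetical helper name, not a Veritas utility.
# Usage: scan_dmp_symptoms [logfile]
#   logfile defaults to /var/adm/messages (an assumption; pass another path if needed).
scan_dmp_symptoms() {
    logfile="${1:-/var/adm/messages}"
    awk '/Internal Parity\/Pause Error/ || /undecodable sense information/ { n++ }
         END { print n + 0 }' "$logfile"
}
```

A non-zero count only indicates that the symptoms were logged; it does not by itself confirm that this incident is the cause.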

 

Solution

This issue applies only to environments with Qlogic 2G or 4G host bus adapters (HBAs) on Solaris 8, 9, or 10 running one of the following releases of Veritas Volume Manager (VxVM):
  • Veritas Volume Manager 4.1 MP1 with patch 122059-02
  • Veritas Volume Manager 4.1 MP2 or later
  • Veritas Volume Manager 5.x
 
 
Detailed Description:
Because of defects in the DMP Fast Recovery procedures, interaction between Qlogic 2G and 4G HBAs and VxVM may cause Solaris systems to become unresponsive (hang) under heavy load while dynamic multipathing (DMP) Fast Recovery performs IO failure analysis.
 
DMP Fast Recovery was introduced in the 5.0 release and back-ported to the 4.1 release through Patch 122059-02 as well as the Maintenance Patch (MP2) patchset (117080-07).
 
DMP Fast Recovery functionality greatly enhances IO failure analysis by communicating directly with the HBA driver, bypassing the SCSI disk (SD) driver which handles normal IO traffic.  By communicating directly with the HBA, failure analysis can be conducted much more efficiently without suffering through backlogged SD driver queues that typically accompany IO path failures during heavy load.
Incident e1123248 documents two defects:
  • Incorrect Command Descriptor Block (CDB) tagging.
  • Failure to reset b_resid (number of bytes not transferred) back to zero upon subsequent attempts to resubmit a given IO.
The resulting behavior is HBA driver specific.  Only Qlogic 2G or 4G HBAs have been found to exhibit this adverse behavior.
 
Resolution for 4.1 MP (x):
A binary hot fix is available for 4.1 MP2 to fix this issue.  The 4.1 MP2 patch (117080-07) is a prerequisite for the binary hot fix.
 
Patch (128045-01) is available for VxVM 4.1 MP2 RP2.  The 4.1 MP2 (117080-07) plus RP2 (124358-04) patches are prerequisites for this patch solution.
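
The prerequisite patches named above can be checked before applying 128045-01. The sketch below is illustrative only: has_patch is a hypothetical helper, it assumes a POSIX shell and an awk that supports -v (for example ksh and nawk on Solaris), it assumes the patch listing has the "Patch: <id> ..." line format printed by showrev -p, and it assumes that a later revision of the same patch ID satisfies the prerequisite.

```shell
#!/bin/sh
# Sketch: succeed if a patch listing on stdin contains PATCH_ID at the given
# revision or later. has_patch is a hypothetical helper name.
# Assumption: input lines look like "Patch: 117080-07 Obsoletes: ...".
# Assumption: a higher revision of the same patch ID also satisfies the check.
has_patch() {
    base=${1%-*}      # patch ID without revision, e.g. 117080
    minrev=${1#*-}    # minimum revision, e.g. 07
    awk -v base="$base" -v minrev="$minrev" '
        $1 == "Patch:" {
            split($2, p, "-")
            if (p[1] == base && p[2] + 0 >= minrev + 0) found = 1
        }
        END { exit(found ? 0 : 1) }'
}

# Example use on the affected host (not executed here):
#   showrev -p | has_patch 117080-07 && showrev -p | has_patch 124358-04 \
#       && echo "prerequisites for 128045-01 appear to be installed"
```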
 
 
Availability of the dmp_fast_recovery tunable on 4.1 MP2:
 
The 4.1 MP2 Release Notes describe the tunable as follows:
 
"The dmp_fast_recovery tunable controls whether DMP should attempt to obtain SCSI error information directly from the HBA interface. Setting the value to on can potentially provide faster error recovery, provided that the HBA interface supports the error enquiry feature. If set to off, the HBA interface is not used. The default setting is off. Before enabling this tunable, make sure the HBA firmware level is supported in the HCL. Enabling this tunable with unsupported HBA firmware levels may result in a system panic."
 
There are three discrepancies in that quote from the MP2 Release Notes:
  • While the DMP Fast Recovery feature was included in 4.1 MP2, the dmp_fast_recovery tunable was not exposed.
  • DMP Fast Recovery is "on" by default.
  • The last two sentences referring to HBA firmware and risk of a system panic actually apply to the 'monitor_fabric' tunable. This tunable is 'off' by default in 4.1 MP2,  specifically to protect users against those risks.
In addition to repairing the two defects outlined above, the fix for Incident e1123248 also exposes the dmp_fast_recovery tunable, as documented in the 4.1 MP2 Release Notes. The 5.0 release already includes this tunable. The default value for this tunable remains "on" for both releases.


Workaround for 5.x:

1. Install VxVM 5.0 MP1.

2. Set dmp_fast_recovery=off:

  root# vxdmpadm gettune all |grep fast_recovery
  dmp_fast_recovery              on               on

  root# vxdmpadm settune dmp_fast_recovery=off
  Tunable value will be changed immediately

  root# vxdmpadm gettune all |grep fast_recovery
  dmp_fast_recovery             off               on
 
  root# cat /etc/vx/dmppolicy.info
  arraytype
  #
  arrayname
  #
  enclosure
  #
  Tunables
  dmp_fast_recovery=off
  #
  root#
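
The manual steps above can also be wrapped in a small script. This is a rough sketch rather than a supported procedure: the function names are hypothetical, and the parsing assumes the gettune output layout shown in the transcript (tunable name, current value, default value). Only the vxdmpadm gettune and settune invocations and the /etc/vx/dmppolicy.info path are taken from the session above.

```shell
#!/bin/sh
# Sketch of the workaround above. All function names are hypothetical.

# Print the current value of dmp_fast_recovery from gettune output on stdin.
# Assumption: columns are name, current value, default value, as in the transcript.
fast_recovery_state() {
    awk '$1 == "dmp_fast_recovery" { print $2 }'
}

# Succeed if the setting is recorded in the policy file
# (defaults to /etc/vx/dmppolicy.info, the file shown in the transcript).
fast_recovery_persisted() {
    grep '^[ ]*dmp_fast_recovery=off' "${1:-/etc/vx/dmppolicy.info}" >/dev/null
}

# Turn the tunable off only if it is currently on (run as root on the host).
disable_fast_recovery() {
    state=$(vxdmpadm gettune all | fast_recovery_state)
    if [ "$state" = "on" ]; then
        vxdmpadm settune dmp_fast_recovery=off
    fi
}
```

After calling disable_fast_recovery, running fast_recovery_persisted with no argument gives a quick check that the change was written to the policy file.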
 

 
