Unresponsive system (hang) or possible data loss due to an adverse interoperability issue between Qlogic 2GB and 4GB HBAs and Veritas Volume Manager versions: 4.1 MP1 with 122059-02, 4.1 MP2, 5.0, or 5.0 MP1
Article: 100029632
Last Published: 2021-12-14
Product(s): InfoScale & Storage Foundation
Problem
A system hang or possible data loss has been reported due to an adverse interoperability issue between Qlogic 2GB and 4GB HBAs and Veritas Volume Manager versions: 4.1 MP1 with 122059-02, 4.1 MP2, 5.0, or 5.0 MP1.
Error Message
qlc: [ID 262021 kern.warning] WARNING: qlc(0): isr, Internal Parity/Pause Error - hccr=0h, stat=428113h, count=710644882
-OR-
WARNING: /pci@3,700000/SUNW,qlc@0,1/fp@0,0/ssd@w50060e800327572c,5a (ssd138): undecodable sense information: 0x0 0x0 0x0 0x1 0x0 0x0 0x0 0x2 0xff 0xff 0xff 0xff 0x13 0x7c 0
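To check whether a host has already logged either message, the system log can be scanned for the two signatures. The following is a minimal sketch, assuming the messages are written to /var/adm/messages (the usual Solaris syslog file); the function name find_signatures is hypothetical and the patterns are taken from the two sample messages above.

```shell
#!/bin/sh
# Print only the lines from stdin that carry one of the two
# error signatures quoted in this article.
find_signatures() {
    awk '/Internal Parity\/Pause Error/ || /undecodable sense information/'
}

# Assumed log location; adjust if syslog is configured differently.
LOG=/var/adm/messages
if [ -r "$LOG" ]; then
    find_signatures < "$LOG"
fi
```

No output means neither signature is present in the current log file; rotated logs (messages.0, messages.1, and so on) would need to be checked separately.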
Solution
This issue applies only if the environment consists of:
Qlogic 2G or 4G host bus adapters (HBAs) on Solaris 8, 9, or 10 with one of the following releases of Veritas Volume Manager (VxVM):
- Veritas Volume Manager 4.1 MP1 with patch 122059-02
- Veritas Volume Manager 4.1 MP2 or later
- Veritas Volume Manager 5.x
Detailed Description:
Due to an issue in DMP Fast Recovery procedures, interaction between Qlogic 2G and 4G HBAs and VxVM may cause Solaris systems to become unresponsive (hang) under heavy load conditions during dynamic multipathing (DMP) Fast Recovery IO failure analysis.
DMP Fast Recovery was introduced in the 5.0 release and back-ported to the 4.1 release through Patch 122059-02 as well as the Maintenance Patch (MP2) patchset (117080-07).
DMP Fast Recovery functionality greatly enhances IO failure analysis by communicating directly with the HBA driver, bypassing the SCSI disk (SD) driver, which handles normal IO traffic. By communicating directly with the HBA, failure analysis can be conducted much more efficiently, without waiting on the backlogged SD driver queues that typically accompany IO path failures during heavy load.
Incident e1123248 documents two defects:
- Incorrect Command Descriptor Block (CDB) tagging.
- Failure to reset b_resid (number of bytes not transferred) back to zero upon subsequent attempts to resubmit a given IO.
The resulting behavior is HBA driver specific. Only Qlogic 2G or 4G HBAs have been found to exhibit this adverse behavior.
Resolution for 4.1 MP (x):
A binary hot fix is available for 4.1 MP2 to fix this issue. The 4.1 MP2 patch (117080-07) is a prerequisite for the binary hot fix.
Patch (128045-01) is available for VxVM 4.1 MP2 RP2. The 4.1 MP2 (117080-07) plus RP2 (124358-04) patches are prerequisites for this patch solution.
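Before applying either fix, it can help to confirm which of the patches named above are already installed. The sketch below assumes the standard Solaris `showrev -p` output, in which each line begins with `Patch: <id>-<rev>`; the function name installed_revs is hypothetical. On older Solaris releases `nawk` may be needed in place of `awk` for the `-v` option.

```shell
#!/bin/sh
# Read `showrev -p` output on stdin and print every installed
# revision of the given base patch ID (for example 117080).
installed_revs() {
    awk -v base="$1" '$1 == "Patch:" {
        split($2, p, "-")
        if (p[1] == base) print p[2]
    }'
}

# Runs only on hosts where showrev is available.
if command -v showrev >/dev/null 2>&1; then
    for id in 117080 124358 128045; do
        revs=$(showrev -p | installed_revs "$id")
        echo "$id: ${revs:-not installed}"
    done
fi
```

The script only reports what is installed; whether a given revision satisfies the prerequisite should be judged against the patch README.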
Availability of the DMP_Fast_Recovery tunable on 4.1 MP2:
The 4.1 MP2 Release Notes describe the tunable as follows:
"The dmp_fast_recovery tunable controls whether DMP should attempt to obtain SCSI error information directly from the HBA interface. Setting the value to on can potentially provide faster error recovery, provided that the HBA interface supports the error enquiry feature. If set to off, the HBA interface is not used. The default setting is off. Before enabling this tunable, make sure the HBA firmware level is supported in the HCL. Enabling this tunable with unsupported HBA firmware levels may result in a system panic."
There are three discrepancies in that quote from the MP2 Release Notes:
- While the DMP Fast Recovery feature was included in 4.1 MP2, the dmp_fast_recovery tunable was not exposed.
- DMP Fast Recovery is "on" by default.
- The last two sentences referring to HBA firmware and risk of a system panic actually apply to the 'monitor_fabric' tunable. This tunable is 'off' by default in 4.1 MP2, specifically to protect users against those risks.
In addition to repairing the two defects outlined above, the fix for Incident e1123248 also exposes the dmp_fast_recovery tunable as documented in the 4.1 MP2 Release Notes. The 5.0 release already includes this tunable. The default value for this tunable remains "on" for both releases.
Workaround for 5.x:
1. Install VxVM 5.0 MP1.
2. Set dmp_fast_recovery=off:
root# vxdmpadm gettune all |grep fast_recovery
dmp_fast_recovery on on
root# vxdmpadm settune dmp_fast_recovery=off
Tunable value will be changed immediately
root# vxdmpadm gettune all |grep fast_recovery
dmp_fast_recovery off on
root# cat /etc/vx/dmppolicy.info
arraytype
#
arrayname
#
enclosure
#
Tunables
dmp_fast_recovery=off
#
root#
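The same check-and-set sequence can be scripted for use across several hosts. This is a minimal sketch that uses only the two vxdmpadm invocations shown in the transcript above; it assumes the `gettune` output lists the tunable name, current value, and default value in that order, and the function name current_fast_recovery is hypothetical.

```shell
#!/bin/sh
# Read `vxdmpadm gettune all` output on stdin and print the current
# value (second column) of the dmp_fast_recovery tunable.
current_fast_recovery() {
    awk '$1 == "dmp_fast_recovery" { print $2 }'
}

# Runs only on hosts where vxdmpadm is on the PATH.
if command -v vxdmpadm >/dev/null 2>&1; then
    state=$(vxdmpadm gettune all | current_fast_recovery)
    if [ "$state" = "on" ]; then
        # Same command as in step 2 of the workaround.
        vxdmpadm settune dmp_fast_recovery=off
    fi
    echo "dmp_fast_recovery was: ${state:-unknown}"
fi
```

If the tunable is not listed at all, as on an unpatched 4.1 MP2 system where it is not exposed, the script reports "unknown" and changes nothing.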