SCSI-3 PGR registration fails on DMP nodes when using Brocade or Emulex FlexFabric HBAs

Article: 100005275
Modified: 2012-12-21
products: InfoScale & Storage Foundation

Problem

When Emulex FlexFabric or Brocade HBAs are used with disk arrays that support SCSI-3 PGR, key registration fails on DMP devices. Registration on native disk paths succeeds, but registration on a DMP node fails.

Error Message

Sample of failing vxfentsthdw:

Read from disk /dev/vx/rdmp/xp10k-12k0_0010s3 on node server02 ......... Passed
Write to disk /dev/vx/rdmp/xp10k-12k0_0010s3 from node server02 ........ Passed
Reserve disk /dev/vx/rdmp/xp10k-12k0_0010s3 from node server01 ......... Passed
Verify reservation for disk /dev/vx/rdmp/xp10k-12k0_0010s3 on node server01  Passed
Read from disk /dev/vx/rdmp/xp10k-12k0_0010s3 on node server01 ......... Passed
Read from disk /dev/vx/rdmp/xp10k-12k0_0010s3 on node server02 ......... Passed
Write to disk /dev/vx/rdmp/xp10k-12k0_0010s3 from node server01 ........ Failed

 

Sample of vxdg init failure:

# vxdg -s init testdg xp10k-12k1_0010                                                                                                
VxVM vxdg ERROR V-5-1-585 Disk group testdg: cannot create: Disk write failure  

Sample from /etc/vx/dmpevents.log:

Wed Aug 17 16:23:02.540: SCSI error occured on Path sde: opcode=0x5f reported bus abort (status=0x0, key=0x0, asc=0x0, ascq=0x0)
Wed Aug 17 16:23:03.540: SCSI error occured on Path sde: opcode=0x5f reported bus abort (status=0x0, key=0x0, asc=0x0, ascq=0x0)
Wed Aug 17 16:23:03.542: SCSI error occured on Path sda: opcode=0x5f reported bus abort (status=0x0, key=0x0, asc=0x0, ascq=0x0)
Wed Aug 17 16:23:04.542: SCSI error occured on Path sda: opcode=0x5f reported bus abort (status=0x0, key=0x0, asc=0x0, ascq=0x0)
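Opcode 0x5f is the SCSI PERSISTENT RESERVE OUT command, so each of these entries is a failed key registration or reservation attempt. To see which paths are affected, the entries can be tallied per path. The following is a minimal sketch, assuming the log line layout shown above; the function name count_pr_bus_aborts is hypothetical and not part of any Veritas tool.

```shell
# Count "bus abort" entries for opcode 0x5f (PERSISTENT RESERVE OUT) per path.
# $1: DMP events log to read (defaults to /etc/vx/dmpevents.log).
# ASSUMPTION: each entry contains "on Path <name>:" as in the sample above.
count_pr_bus_aborts() {
    awk '/opcode=0x5f reported bus abort/ {
             for (i = 1; i < NF; i++) {
                 if ($i == "Path") {
                     p = $(i + 1)
                     sub(/:$/, "", p)   # strip the trailing colon from "sde:"
                     n[p]++
                 }
             }
         }
         END { for (p in n) print p, n[p] }' "${1:-/etc/vx/dmpevents.log}" | sort
}
```

Calling count_pr_bus_aborts with no argument reads /etc/vx/dmpevents.log and prints one line per path with the number of matching entries.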
 

Cause

If DMP debugging is enabled, the following errors are logged in /var/log/messages during operations such as "vxdg -s init" and "vxfentsthdw".

To enable DMP debugging:

# vxdmpadm settune dmp_loglevel=9

To return to normal debugging level:

# vxdmpadm settune dmp_loglevel=0
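
To avoid leaving debugging enabled by accident, the two commands above can be wrapped around the operation being investigated. This is only a sketch: the helper name with_dmp_debug is hypothetical, and it assumes that 0 is the level to return to, as stated above.

```shell
# Run a command with DMP debug logging enabled, then restore the log level.
# Usage: with_dmp_debug <command> [args...]
# ASSUMPTION: the level to restore is 0, as given in this article.
with_dmp_debug() {
    vxdmpadm settune dmp_loglevel=9 || return 1
    "$@"
    _rc=$?                               # remember the wrapped command's status
    vxdmpadm settune dmp_loglevel=0      # always restore the log level
    return "$_rc"
}
```

For example, "with_dmp_debug vxdg -s init testdg xp10k-12k1_0010" runs the failing operation with debugging on and returns the exit status of vxdg.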

Sample of DMP Debug output:

Feb 23 11:53:34 server01 kernel: VxVM vxdmp V-5-3-0 dmp_check_scsipkt: SCSI request failure host_byte = 0x7 msg_byte = 0x0 rq_status = 0x3
Feb 23 11:53:34 server01 kernel:
Feb 23 11:53:34 server01 kernel: VxVM vxdmp V-5-0-0 SCSI error opcode=0x5f returned rq_status=0x3 cdb_status=0x0 key=0x0 asc=0x0 ascq=0x0 on path 8/0xf0
Feb 23 11:53:34 server01 kernel:
Feb 23 11:53:34 server01 kernel: VxVM vxdmp V-5-3-0 dmp_pr_send_cmd failed with transport error: uscsi_rqstatus = 3ret = -1 status = 0 on dev 8/0xf0
Feb 23 11:53:34 server01 kernel:
Feb 23 11:53:34 server01 kernel: VxVM vxdmp V-5-3-0 dmp_pr_send_cmd retrying: retry count = 10 uscsi_rqstatus = 3 ret = -1 status = 0 on dev 8/0xf0

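The host_byte value of 0x7 in the output above corresponds to DID_ERROR in the Linux SCSI host byte codes:

#define DID_ERROR       0x07    /* Internal error                          */

The HBA returns DID_ERROR to DMP: registration succeeds on the first path, but the PGR fencing key then fails to be applied on the subsequent DMP paths.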
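
When reading the debug output, it can help to translate the host_byte value into its symbolic name. The sketch below maps the values defined in the Linux kernel SCSI headers; the function names decode_host_byte and host_byte_from_line are hypothetical helpers, and the line format is assumed to match the dmp_check_scsipkt message shown above.

```shell
# Translate a Linux SCSI host_byte value (decimal or 0x-prefixed hex)
# into its DID_* name.
decode_host_byte() {
    case $(( $1 )) in
        0)  echo DID_OK ;;
        1)  echo DID_NO_CONNECT ;;
        2)  echo DID_BUS_BUSY ;;
        3)  echo DID_TIME_OUT ;;
        4)  echo DID_BAD_TARGET ;;
        5)  echo DID_ABORT ;;
        6)  echo DID_PARITY ;;
        7)  echo DID_ERROR ;;
        8)  echo DID_RESET ;;
        9)  echo DID_BAD_INTR ;;
        10) echo DID_PASSTHROUGH ;;
        11) echo DID_SOFT_ERROR ;;
        12) echo DID_IMM_RETRY ;;
        13) echo DID_REQUEUE ;;
        *)  echo UNKNOWN; return 1 ;;
    esac
}

# Extract the host_byte value from a dmp_check_scsipkt log line.
# ASSUMPTION: the line contains "host_byte = 0x<hex>" as in the sample above.
host_byte_from_line() {
    printf '%s\n' "$1" | sed -n 's/.*host_byte = \(0x[0-9a-fA-F]*\).*/\1/p'
}
```

Passing the first line of the sample debug output to host_byte_from_line and the result to decode_host_byte yields DID_ERROR.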

 

Solution

When dmp_fast_recovery is on, DMP sends SCSI requests directly to the HBA interface, bypassing the standard SCSI layer, and relies on the status that the HBA interface returns. Only with Emulex FlexFabric and Brocade HBAs does the HBA return "Bus Abort".

For example:

Wed Aug 17 16:23:02.540: SCSI error occured on Path sde: opcode=0x5f reported bus abort (status=0x0, key=0x0, asc=0x0, ascq=0x0)

Bus abort is a retryable error, but in this case the retries do not succeed before dmp_pgr_retries (default value 10) is exhausted.

The current workaround is to disable dmp_fast_recovery so that DMP uses the standard SCSI interface instead of talking directly to the HBA interface.

# vxdmpadm settune dmp_fast_recovery=off
Tunable value will be changed immediately 
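
After changing the tunable, the current value can be confirmed with "vxdmpadm gettune dmp_fast_recovery". The sketch below extracts the value from that output for use in scripts. It assumes that the tunable name is the first column of the output and the current value the second; the function name fast_recovery_state is hypothetical.

```shell
# Print the current value of dmp_fast_recovery.
# Reads the output of "vxdmpadm gettune dmp_fast_recovery" on stdin.
# ASSUMPTION: column 1 is the tunable name and column 2 the current value.
# Returns non-zero if no line for the tunable is found.
fast_recovery_state() {
    awk '$1 == "dmp_fast_recovery" { print $2; found = 1 }
         END { if (!found) exit 1 }'
}
```

A typical call is "vxdmpadm gettune dmp_fast_recovery | fast_recovery_state". Once it reports off, rerun vxfentsthdw or "vxdg -s init" to check that the registration now succeeds.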
 

Note: An official fix has been verified on HP lab systems. The fix modifies the Emulex LPFC driver to use a non-scatter-gather list buffer. Customers who need the official fix should contact their HBA vendor. HP customers using Emulex LPFC FlexFabric cards can reference the following case numbers:

Emulex case # CS041460 / HP Case ID: 4633892599


Applies To

Storage Foundation 5.1SP1RP1 and later

Confirm whether a Brocade adapter or an Emulex FlexFabric HBA is present on the systems:

# grep Brocade /var/log/messages
Feb 21 14:59:49 server01 kernel: scsi2 : Brocade FC Adapter, model: Brocade-425 hwpath: 0000:17:00.0 driver: 1.1.0.10

# lspci -vv | grep Brocade
0d:00.0 Fibre Channel: Brocade Communications Systems, Inc. 425 4Gb/825 8Gb PCIe Dual port FC HBA (rev 01)
 

# grep Emulex /var/log/messages

Aug 24 15:10:04 veritas1 kernel: Emulex LightPulse Fibre Channel SCSI driver 8.2.0.96

 

The problem also affects the following HBA model and driver:

08:00.2 Fibre Channel: Emulex Corporation OneConnect 10Gb FCoE Initiator (be3) (rev 01)

May 10 13:45:44 euhvcsp10 kernel: Emulex LightPulse Fibre Channel SCSI driver 8.2.0.106.1p

References

Etrack: 2517829
Etrack: 2519062
