Troubleshooting hardware with Backup Exec for Windows Servers using the SCSI Trace Utility (tracer.exe).

Article: 100007801
Last Published: 2021-02-09
Product(s): Backup Exec

 

 
Problem

Troubleshooting hardware with Backup Exec for Windows Servers using the SCSI Trace Utility (tracer.exe).

Solution

Introduction

Backup Exec for Windows Servers includes a SCSI Trace utility (tracer.exe) that can be used to troubleshoot suspected tape hardware issues in a Backup Exec environment.
Tracer performs a low-level trace of the SCSI bus on a server and records the SCSI commands sent to and from every device on that bus. The SCSI commands captured by tracer.exe are industry standard for all SCSI devices. The information gathered by the SCSI Trace utility can be used to narrow down the cause of a particular problem and to determine whether there is a hardware fault.
 
Tracer.exe is located in the Backup Exec installation directory (by default C:\Program Files\Symantec\Backup Exec\ or C:\Program Files\Veritas\Backup Exec). Make sure that the Tracer option "Prefer user mode capture to kernel mode" is selected. If it is not, select the check box and restart Tracer.

 
To begin capturing SCSI data, launch tracer.exe and click on the green capture button (Figure 1):
  
Figure 1:
 
 
 
With tracer.exe running, perform the steps necessary to reproduce your error.
 
NOTE: Because of the inherent nature of SCSI and the high number of commands sent to and from SCSI-based devices, tracer logs can grow large very quickly. For that reason, try to reproduce the problem in as short an operation as possible to minimize the size of the SCSI trace.
 
Understanding how to read the events captured by tracer.exe
Tracer logs SCSI commands sequentially. Each logged event contains the SCSI target, the SCSI command type, and the SCSI driver result.
 
Example of a successful SCSI command (Figure 2):
 
Figure 2:
 
 

The above command is an Inquiry command, which responded with a good SCSI status and a successful driver result.

Example of an unsuccessful SCSI command (Figure 3):
Figure 3:
 

 
 

The above command is a Test Unit Ready command, which queries the device to determine whether it is ready for read and write operations. In this case, a SCSI reservation conflict is preventing such an operation. The SCSI status also indicates that there is an IO device error. The cause of this conflict would be at the hardware level.
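The SCSI status reported for each event is a single byte whose values are defined by the SCSI Architecture Model standard; for example, a reservation conflict like the one described above is status 18h, and a check condition is status 02h. The following Python sketch is not part of Backup Exec or tracer.exe; it simply maps the standard status codes to their names, which may help when reading a trace that has been exported as text. The helper name decode_status is hypothetical.

```python
# Standard SCSI status byte values (SCSI Architecture Model).
# Illustrative helper only; tracer.exe already decodes these for you.
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x04: "CONDITION MET",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
    0x30: "ACA ACTIVE",
    0x40: "TASK ABORTED",
}


def decode_status(status: int) -> str:
    """Return the standard name of a SCSI status byte, or its hex value if unknown."""
    return SCSI_STATUS.get(status, f"UNKNOWN STATUS {status:02X}h")


print(decode_status(0x18))  # RESERVATION CONFLICT
```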

SCSI commands

The following is a list of the most common SCSI commands used by Backup Exec when communicating with tape drives and libraries (Figure 4):

Figure 4:
 

Opcode  Command                       Description
00h     TEST UNIT READY               Queries the device to see if it is ready for data transfers.
01h     REWIND                        Rewinds the medium.
03h     REQUEST SENSE                 Requests that the device transfer sense data to the host.
05h     READ BLOCK LIMITS             Reports the maximum and minimum block length limits.
07h     INITIALIZE ELEMENT STATUS     Forces the library to perform an inventory operation.
08h     READ                          Reads data from the medium.
0Ah     WRITE                         Writes data to the medium.
0Ch     ROTATE MAILSLOT COMMAND       Opens or closes the mailslot.
10h     WRITE FILEMARKS               Writes filemarks (marks that separate recorded data sets) onto the medium.
11h     SPACE                         Provides a variety of positioning functions.
12h     INQUIRY                       Returns basic device information and inquiry data.
15h     MODE SELECT(6)                Sets device parameters in a mode page.
16h     RESERVE UNIT                  Reserves the unit.
17h     RELEASE UNIT                  Releases the unit.
19h     ERASE                         Erases the medium.
1Ah     MODE SENSE(6)                 Returns current device parameters from mode pages.
1Bh     LOAD UNLOAD                   Tells the target to load or unload the tape cartridge.
1Eh     PREVENT ALLOW MEDIUM REMOVAL  Enables or disables unloading of the tape cartridge.
2Bh     LOCATE (Seek to a position)   Uses the identifier from a READ POSITION command to return to the same logical position.
34h     READ POSITION                 Reads a position identifier, or SCSI Logical Block Address.
3Bh     WRITE BUFFER                  Diagnostic function for testing the device data buffer, DMA engine, SCSI bus interface hardware, and SCSI bus integrity.
3Ch     READ BUFFER                   Diagnostic function for testing the device data buffer, DMA engine, SCSI bus interface hardware, and SCSI bus integrity.
4Ch     LOG SELECT                    Allows the host to manage statistical information maintained by the device about its own hardware or the installed media.
4Dh     LOG SENSE                     Retrieves statistical and operational information maintained by the device about its own hardware or the installed media.
55h     MODE SELECT(10)               Sets device parameters in a mode page.
5Ah     MODE SENSE(10)                Returns current device parameters from mode pages.
A5h     MOVE MEDIUM                   Moves a cartridge from one library element to another (for example, from a slot to a tape drive).
A6h     EXCHANGE MEDIUM               Moves a cartridge to a destination element while moving the cartridge already there to a second destination.
B8h     READ ELEMENT STATUS           Returns the status of the library's elements (such as slots, drives, and mailslots) to the initiator.
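The first byte of every SCSI command descriptor block (CDB) is the operation code shown in the table above, and under the SCSI Primary Commands standard its top three bits (the group code) determine the CDB length; this is where the (6) and (10) in MODE SENSE(6) and MODE SENSE(10) come from. As an illustration only (tracer.exe decodes commands itself), this Python sketch names a raw CDB using the opcodes from the table. The helper name describe_cdb is hypothetical.

```python
# Illustrative helper (hypothetical name): decode the operation code of a raw
# CDB using the opcodes from the table above.
COMMON_OPCODES = {
    0x00: "TEST UNIT READY", 0x01: "REWIND", 0x03: "REQUEST SENSE",
    0x05: "READ BLOCK LIMITS", 0x07: "INITIALIZE ELEMENT STATUS",
    0x08: "READ", 0x0A: "WRITE", 0x0C: "ROTATE MAILSLOT COMMAND",
    0x10: "WRITE FILEMARKS", 0x11: "SPACE", 0x12: "INQUIRY",
    0x15: "MODE SELECT(6)", 0x16: "RESERVE UNIT", 0x17: "RELEASE UNIT",
    0x19: "ERASE", 0x1A: "MODE SENSE(6)", 0x1B: "LOAD UNLOAD",
    0x1E: "PREVENT ALLOW MEDIUM REMOVAL", 0x2B: "LOCATE",
    0x34: "READ POSITION", 0x3B: "WRITE BUFFER", 0x3C: "READ BUFFER",
    0x4C: "LOG SELECT", 0x4D: "LOG SENSE", 0x55: "MODE SELECT(10)",
    0x5A: "MODE SENSE(10)", 0xA5: "MOVE MEDIUM", 0xA6: "EXCHANGE MEDIUM",
    0xB8: "READ ELEMENT STATUS",
}

# CDB length implied by the group code (top three bits of the opcode).
# Group 3 is reserved or variable-length; groups 6 and 7 are vendor specific.
CDB_LENGTH_BY_GROUP = {0: 6, 1: 10, 2: 10, 4: 16, 5: 12}


def describe_cdb(cdb: bytes) -> str:
    """Return a label such as '12h INQUIRY (6-byte CDB)' for a raw CDB."""
    if not cdb:
        raise ValueError("empty CDB")
    opcode = cdb[0]
    name = COMMON_OPCODES.get(opcode, "UNLISTED COMMAND")
    length = CDB_LENGTH_BY_GROUP.get(opcode >> 5)
    size = f"{length}-byte CDB" if length else "non-standard length"
    return f"{opcode:02X}h {name} ({size})"


# An INQUIRY CDB requesting 96 (60h) bytes of inquiry data.
print(describe_cdb(bytes([0x12, 0x00, 0x00, 0x00, 0x60, 0x00])))
```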
     


Errors and Check Conditions

A check condition occurs when a SCSI command completes but the device returns a CHECK CONDITION status instead of GOOD, indicating an error or other exceptional condition. Detailed information about the condition is contained in the device's response, in a field known as the Sense Data. These events are marked as 'C #####' in the 'Check' column in tracer.
 
Error responses occur when SCSI commands do not complete their intended operation because of an error. Like check conditions, these errors normally include sense data describing the condition that caused the failure. These errors are marked as 'E' in the 'Check' column in tracer.
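Tracer decodes the sense data for you, but it can help to know where the sense key, Additional Sense Code (ASC), and Additional Sense Code Qualifier (ASCQ) sit in the raw bytes. The SCSI Primary Commands (SPC) standard defines two layouts: fixed format (response code 70h/71h) and descriptor format (72h/73h). The following Python sketch extracts those three fields from raw sense bytes; how you obtain the bytes (for example, from a hex dump) is an assumption, and the names Sense and parse_sense are hypothetical.

```python
from typing import NamedTuple

# Sense key values from the SCSI Primary Commands (SPC) standard.
SENSE_KEYS = {
    0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
    0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0x8: "BLANK CHECK",
    0x9: "VENDOR SPECIFIC", 0xA: "COPY ABORTED", 0xB: "ABORTED COMMAND",
    0xD: "VOLUME OVERFLOW", 0xE: "MISCOMPARE",
}


class Sense(NamedTuple):
    sense_key: int
    asc: int   # Additional Sense Code
    ascq: int  # Additional Sense Code Qualifier

    @property
    def key_name(self) -> str:
        return SENSE_KEYS.get(self.sense_key, f"RESERVED ({self.sense_key:X}h)")


def parse_sense(data: bytes) -> Sense:
    """Extract sense key, ASC and ASCQ from fixed- or descriptor-format sense data."""
    if not data:
        raise ValueError("no sense data returned")
    response_code = data[0] & 0x7F  # bit 7 is the VALID bit in fixed format
    if response_code in (0x70, 0x71):  # fixed format (current / deferred)
        if len(data) < 14:
            raise ValueError("fixed-format sense data too short")
        return Sense(data[2] & 0x0F, data[12], data[13])
    if response_code in (0x72, 0x73):  # descriptor format (current / deferred)
        if len(data) < 4:
            raise ValueError("descriptor-format sense data too short")
        return Sense(data[1] & 0x0F, data[2], data[3])
    raise ValueError(f"unsupported response code {response_code:02X}h")


# Fixed-format sense data: NOT READY, ASC 3Ah / ASCQ 00h (no tape in the drive).
sample = bytes([0xF0, 0, 0x02, 0, 0, 0, 0, 0x0A, 0, 0, 0, 0, 0x3A, 0x00, 0, 0, 0, 0])
sense = parse_sense(sample)
print(sense.key_name, f"ASC={sense.asc:02X}h ASCQ={sense.ascq:02X}h")
```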

 
Note: Not all check conditions and SCSI errors are caused by bad or faulty hardware; some errors and check conditions occur during normal tape operations. All hardware errors, however, are reported as check conditions within tracer.
 

The following is an example of an expected check condition (Figure 5):

Figure 5:
 

The driver result from the above SCSI command was STATUS_IO_DEVICE_ERROR. The Additional Sense Code (ASC) indicates that the drive is not ready because of the condition MEDIUM_NOT_PRESENT, which means there is no tape in the drive. Although this is considered a SCSI 'error,' it does not indicate any fault with the hardware.

The following is an example of a check condition that occurred due to faulty hardware (Figure 6):

Figure 6:
 

 
The Test Unit Ready command responded with a Sense Key of UNIT_ATTENTION and an Additional Sense Code of POWER_ON_RESET_OR_BUS_DEVICE_RESET_OCCURRED. This occurred because the SCSI bus was reset as a result of a hardware failure before or during the Test Unit Ready command.
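The two check conditions above are identified by an ASC/ASCQ pair: MEDIUM NOT PRESENT is ASC 3Ah / ASCQ 00h (reported with sense key NOT READY), and POWER ON, RESET, OR BUS DEVICE RESET OCCURRED is ASC 29h / ASCQ 00h (reported with sense key UNIT ATTENTION). The following Python sketch is a small lookup covering only a handful of standard codes from the SPC ASC/ASCQ assignments; the helper names are hypothetical, and real traces can contain many other codes.

```python
# Illustrative ASC/ASCQ lookup (a small subset of the SPC assignments).
ASC_ASCQ = {
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x28, 0x00): "NOT READY TO READY CHANGE, MEDIUM MAY HAVE CHANGED",
    (0x29, 0x00): "POWER ON, RESET, OR BUS DEVICE RESET OCCURRED",
    (0x29, 0x01): "POWER ON OCCURRED",
    (0x29, 0x02): "SCSI BUS RESET OCCURRED",
    (0x29, 0x03): "BUS DEVICE RESET FUNCTION OCCURRED",
    (0x3A, 0x00): "MEDIUM NOT PRESENT",
}


def describe_asc(asc: int, ascq: int) -> str:
    """Name an ASC/ASCQ pair, falling back to the raw hex values."""
    return ASC_ASCQ.get((asc, ascq), f"ASC {asc:02X}h / ASCQ {ascq:02X}h")


def is_reset_event(asc: int) -> bool:
    """ASC 29h covers power-on and reset conditions (reported as UNIT ATTENTION)."""
    return asc == 0x29


print(describe_asc(0x3A, 0x00))  # MEDIUM NOT PRESENT
print(describe_asc(0x29, 0x00))  # POWER ON, RESET, OR BUS DEVICE RESET OCCURRED
```

Whether a reset points to failing hardware depends on context: a UNIT ATTENTION with ASC 29h is normal right after a deliberate power cycle, whereas an unexpected reset during normal operation warrants a closer look at the hardware.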
 


The following is an example of a SCSI command that resulted in an error (Figure 7):

Figure 7:
 

 
The Test Unit Ready command did not complete, and the status was STATUS_DEVICE_NOT_CONNECTED.  This occurred after a device was disconnected. Since the device was not connected, there was no Sense Data returned.
 


Filtering Tracer Events
Tracer can be filtered to show you only events from a certain device, a certain command, or commands that resulted in a check condition. To enable filters, go to Tools > Filters (Figure 8):

Figure 8:
 

For example, to view all errors and check conditions, select the 'Command resulted in check condition' check box. If you have multiple devices connected to your server, you can also limit the view to the device you are troubleshooting by selecting 'Event is from one of these targets.'
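The two filters described above behave like simple predicates over the captured events. The following Python sketch applies equivalent predicates to a list of TraceEvent records; the record layout, its field names, and the 'C 00001' placeholder for the check-column value are assumptions made for illustration and do not describe tracer.exe's file or export format.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Set


@dataclass(frozen=True)
class TraceEvent:
    """Hypothetical record of one tracer event (not tracer.exe's file format)."""
    target: int          # SCSI target ID
    command: str         # e.g. "TEST UNIT READY"
    driver_result: str   # e.g. "STATUS_SUCCESS"
    check: str = ""      # 'Check' column: "" (none), "C ..." or "E"


def filter_events(
    events: Iterable[TraceEvent],
    targets: Optional[Set[int]] = None,
    only_errors_and_checks: bool = False,
) -> Iterator[TraceEvent]:
    """Mimic 'Event is from one of these targets' and
    'Command resulted in check condition' (which shows both C and E events)."""
    for event in events:
        if targets is not None and event.target not in targets:
            continue
        if only_errors_and_checks and not event.check:
            continue
        yield event


events = [
    TraceEvent(6, "TEST UNIT READY", "STATUS_IO_DEVICE_ERROR", "C 00001"),
    TraceEvent(6, "INQUIRY", "STATUS_SUCCESS"),
    TraceEvent(3, "TEST UNIT READY", "STATUS_DEVICE_NOT_CONNECTED", "E"),
]
for e in filter_events(events, targets={6}, only_errors_and_checks=True):
    print(e.target, e.command, e.driver_result)
```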

 
Detection issues:
 
To troubleshoot a hardware detection issue with tracer.exe, first verify that all of the required information is being presented to the operating system properly by displaying the SCSI discovery data. To do so, click Tools > Display Discovery Data (Figure 9):
 
Figure 9:
 
 
 
The above drive is an Archive Python 06408, connected on SCSI Port 6, SCSI BUS 0, SCSI ID 6, LUN 0, with Firmware Version 9100, and a serial number of HN0D594. All of this information should be present with a properly configured and operating drive.
 
The same information should also be present in the registry under DEVICEMAP. For the above example, the key would be:
 
HKEY_LOCAL_MACHINE\HARDWARE\DEVICEMAP\Scsi\Scsi Port 6\Scsi Bus 0\Target Id 6\Logical Unit Id 0\
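If you prefer to check this key from a script rather than with Registry Editor, the following Python sketch builds the subkey path from the port, bus, target ID, and LUN shown by Display Discovery Data, and reads it with the standard winreg module using read-only access. It assumes the device's key contains an Identifier value, as SCSI DEVICEMAP entries typically do; the helper names are hypothetical, and the sketch never writes to the registry.

```python
def devicemap_subkey(port: int, bus: int, target: int, lun: int) -> str:
    """Build the DEVICEMAP subkey (under HKEY_LOCAL_MACHINE) for one SCSI address."""
    return (
        r"HARDWARE\DEVICEMAP\Scsi"
        rf"\Scsi Port {port}\Scsi Bus {bus}\Target Id {target}\Logical Unit Id {lun}"
    )


def read_identifier(port: int, bus: int, target: int, lun: int) -> str:
    """Read (never write) the assumed Identifier value for the device; Windows only."""
    import winreg  # imported here so devicemap_subkey() also works off Windows

    subkey = devicemap_subkey(port, bus, target, lun)
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey, 0, winreg.KEY_READ) as key:
        value, _value_type = winreg.QueryValueEx(key, "Identifier")
    return value


if __name__ == "__main__":
    # SCSI Port 6, Bus 0, Target Id 6, LUN 0, as in the example above.
    print(devicemap_subkey(6, 0, 6, 0))
```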
 
NOTE: You should not, under any circumstances, edit the registry settings under DEVICEMAP. These keys should be automatically populated if the hardware is configured and functioning properly.
 
If any of the above information is incomplete or missing, or if the device is shown multiple times, power cycle the SCSI hardware and the server, and then display the discovery data again. If the data is still incomplete after performing those steps, consult your hardware documentation and verify that the hardware is connected properly to the server. If it is connected properly, contact your hardware vendor for support.
 
If your devices are properly shown during Discovery but not appearing in the Backup Exec GUI, stop the Backup Exec services. Launch tracer.exe and begin capturing data, then start the Backup Exec services.
 
Whenever the Backup Exec services are started, Backup Exec will issue Inquiry, Reserve, Release, and Test Unit Ready commands to all of the SCSI hardware attached to the SCSI bus. These commands will respond successfully on properly functioning and configured hardware.
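To review a trace captured while the services start, you can confirm that each target answered all four of these commands successfully. This Python sketch assumes the events have been reduced to hypothetical (target, opcode, succeeded) tuples, with opcodes taken from the SCSI commands table; the helper name is also hypothetical. Keep in mind that a Test Unit Ready can legitimately return NOT READY when no tape is loaded, so a reported TEST UNIT READY should be read alongside its sense data.

```python
from typing import Dict, Iterable, List, Tuple

# Commands Backup Exec issues at service startup, keyed by opcode.
STARTUP_COMMANDS = {
    0x12: "INQUIRY",
    0x16: "RESERVE UNIT",
    0x17: "RELEASE UNIT",
    0x00: "TEST UNIT READY",
}


def missing_startup_commands(
    events: Iterable[Tuple[int, int, bool]], targets: Iterable[int]
) -> Dict[int, List[str]]:
    """For each target, list startup commands that never completed successfully.

    events are hypothetical (target, opcode, succeeded) tuples taken from a trace.
    """
    targets = list(targets)
    succeeded = {t: set() for t in targets}
    for target, opcode, ok in events:
        if ok and target in succeeded and opcode in STARTUP_COMMANDS:
            succeeded[target].add(opcode)
    report = {}
    for t in targets:
        missing = sorted(
            name for op, name in STARTUP_COMMANDS.items() if op not in succeeded[t]
        )
        if missing:
            report[t] = missing
    return report


trace = [(6, 0x12, True), (6, 0x16, True), (6, 0x17, True), (6, 0x00, True),
         (3, 0x12, True), (3, 0x16, False)]
print(missing_startup_commands(trace, targets=[6, 3]))
```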
 
Read or Write Errors:
 
Read and write errors can be difficult to log with tracer unless the error can be reproduced easily in a short amount of time. This is because any basic operation issues a large number of SCSI commands, and reads and writes issue especially many. The high number of commands will eventually cause tracer to run out of virtual memory, which can cause the tracer application to hang.
 
Note: In Backup Exec version 12.0 and higher, SGMON.EXE can be configured to capture tracer data. Please see related documents for using SGMON.EXE.
 
On a properly functioning SCSI drive, there should be very few check conditions and no errors reported on a Read or Write command. An error received on a Read or Write command is in almost all cases due to failing hardware or faulty media.
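To see at a glance whether Read (08h) or Write (0Ah) commands are returning check conditions or errors, the events can be tallied per target. The following Python sketch assumes hypothetical (target, opcode, check) tuples in which check holds the contents of tracer's 'Check' column ("" when empty); the 'C 00001' values are placeholders, and the helper name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

READ_OPCODE, WRITE_OPCODE = 0x08, 0x0A


def tally_read_write_problems(events: Iterable[Tuple[int, int, str]]) -> Counter:
    """Count check conditions ('C ...') and errors ('E') on READ/WRITE per target."""
    counts = Counter()
    for target, opcode, check in events:
        if opcode not in (READ_OPCODE, WRITE_OPCODE) or not check:
            continue  # ignore other commands and clean completions
        command = "READ" if opcode == READ_OPCODE else "WRITE"
        kind = "error" if check.startswith("E") else "check condition"
        counts[(target, command, kind)] += 1
    return counts


trace = [(6, 0x08, ""), (6, 0x0A, "C 00001"), (6, 0x0A, "E"), (6, 0x00, "C 00002")]
for (target, command, kind), n in sorted(tally_read_write_problems(trace).items()):
    print(f"target {target}: {n} {kind}(s) on {command}")
```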
 
Example of a Read Error (Figure 10):
 
Figure 10:
 
 
 
Example of a Write Error (Figure 11):
 
Figure 11:
 
 
 
Other errors that indicate hardware failure include "Blank Check" errors and "Mover error."
If, after performing a trace with tracer, you are unsure how to read the results, save the trace file in BIN format and contact Veritas Technical Support.
 
Alternatively, you can export the tracer file as a text file and provide the data to your hardware vendor for further clarification.
 

 

 

 
