In certain circumstances, Windows Event Viewer may display errors regarding read / write issues with disks provided by Storage Foundation for Windows. Although the Windows Event Viewer lists "vxio" as the source of these events, vxio is actually not the origin of the errors.
During a read or write event, vxio sends the I/O request to the disk device drivers and waits for the outcome. If the outcome is an error, the error will be reported to vxio by the disk device drivers, and the event will be listed in the Windows Event Viewer with "vxio" as the source. However, vxio is not the original source of these messages.
The ability of Symantec to analyze these errors is limited since the origin of the messages lies outside the scope of Storage Foundation. However, to assist with determining the cause of these errors, common troubleshooting recommendations and best practices have been gathered below.
When Storage Foundation for Windows (SFW) is installed, the Microsoft Logical Disk Manager (LDM) drivers are replaced by Veritas Volume Manager (VxVM) equivalents. This means that messages that would normally be reported by dmio are now reported by vxio. Read or write errors reported by vxio can be interpreted as if they were being reported by the standard Microsoft dmio driver. The errors, along with their causes and solutions, are usually identical regardless of the driver present. In most cases, vxio and dmio messages originate from lower-level device drivers such as disk.sys, storport.sys, scsiport.sys or the host bus adapter (HBA) driver (Figure 1). These messages typically appear in response to hardware events or kernel memory resource shortages and do not indicate a problem with vxio or Storage Foundation.
Review the driver status returns
Some read or write errors will include a status that contains a hex code. This code is called a "driver status return" and can assist in determining the cause of the errors. These codes are not specific to Storage Foundation and will occur whether or not Storage Foundation is present. They can often be translated using an internet search engine.
Common driver status returns include:
0x80000010 STATUS_DEVICE_OFF_LINE - This status may indicate a disk outage; however, it is common for this code to be reported during cluster service group failovers. In many cases, a cluster node will attempt to online or offline a disk that is no longer available to that node. If this status is only reported during cluster service group failovers, it does not indicate a problem.
0x80000011 STATUS_DEVICE_BUSY - This status indicates that the disk is already reserved by another host. As with 0x80000010, it is common for this code to be reported during cluster service group failovers. If this status is only reported during cluster service group failovers, it does not indicate a problem. If this status appears when a cluster failover is not in progress, it may indicate a SCSI reservation conflict. Refer to related documents for more information.
0xc000009a STATUS_INSUFFICIENT_RESOURCES - This status indicates that the amount of memory available to the kernel was insufficient to process the read or write request. Note that the amount of memory available to drivers is limited, even if the server contains a large amount of physical memory. Possible causes for this status are discussed in Microsoft Knowledge Base Article ID 329075.
0xc000009d STATUS_DEVICE_NOT_CONNECTED - This status indicates that the disk is no longer being presented to the host. This usually appears in response to a hardware event.
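The four codes above can be kept in a small lookup table for quick reference while reading event logs. The following is a minimal sketch, not a Symantec tool: the function name describe_status and the table are illustrative assumptions. It relies on the fact that the top two bits of a Windows NTSTATUS value encode its severity, which is why the 0x8... codes decode as warnings and the 0xc... codes as errors.

```python
# Lookup table built only from the four codes listed in this article.
DRIVER_STATUS_RETURNS = {
    0x80000010: "STATUS_DEVICE_OFF_LINE",
    0x80000011: "STATUS_DEVICE_BUSY",
    0xC000009A: "STATUS_INSUFFICIENT_RESOURCES",
    0xC000009D: "STATUS_DEVICE_NOT_CONNECTED",
}

# The top two bits of an NTSTATUS value encode its severity.
SEVERITIES = ("success", "informational", "warning", "error")


def describe_status(code):
    """Return (name, severity) for a driver status return.

    'code' may be an int or a hex string such as "0xc000009a".
    """
    if isinstance(code, str):
        code = int(code, 16)
    name = DRIVER_STATUS_RETURNS.get(code, "unknown (not in this table)")
    severity = SEVERITIES[(code >> 30) & 0x3]
    return name, severity


if __name__ == "__main__":
    for raw in ("0x80000010", "0xc000009d"):
        print(raw, *describe_status(raw))
```

Codes that are not in the table are still assigned a severity, so an unfamiliar status can at least be classified before it is looked up elsewhere.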
Search for events generated by the Disk.sys driver in the Windows Event Viewer
Events reported by vxio are frequently accompanied by events from "disk," which is the source for messages from the disk.sys driver. This driver operates at a lower layer than vxio.sys and may report messages that are closer to the source of the problem. Errors from disk can easily be found by selecting Filter from the Windows Event Viewer (Figure 2) and choosing "disk" from the drop-down list (Figure 3).
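The same filter can be applied from the command line with the Windows wevtutil utility, which may be convenient when collecting events from several servers. The sketch below is an illustration under assumptions: it assumes the events are in the System log and that the provider name matches the source shown in Event Viewer ("disk" or "vxio"). The helper names build_event_query and run_event_query are hypothetical.

```python
import subprocess


def build_event_query(source, count=20, log="System"):
    """Build a wevtutil command that lists recent events from one source."""
    # XPath filter on the event provider name (assumed to equal the
    # source name shown in Event Viewer, for example "disk" or "vxio").
    xpath = "*[System[Provider[@Name='{}']]]".format(source)
    return [
        "wevtutil", "qe", log,
        "/q:" + xpath,
        "/c:" + str(count),  # maximum number of events to return
        "/rd:true",          # newest events first
        "/f:text",           # plain-text output
    ]


def run_event_query(source, count=20):
    """Run the query on a Windows host and return the text output."""
    result = subprocess.run(
        build_event_query(source, count),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the command here; call run_event_query("disk")
    # on the affected Windows server to execute it.
    print(" ".join(build_event_query("disk")))
```

Running the query once for "disk" and once for "vxio" makes it easier to compare the two sets of events side by side.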
Review logs for the HBAs, switches and disk arrays
It is recommended that the logs for all SAN components be reviewed. In many cases, one of the components in the Storage Area Network (SAN) may not detect the conditions that led to the errors reported in the Windows Event Viewer; however, other components may have logged relevant events during the same time frame. For this reason, checking multiple components is recommended. Doing so also allows the same events to be viewed from different vantage points. If the problem can be reproduced, the trace levels for the logs can often be increased for greater detail. In some cases, attaching a fiber analyzer to the SAN can be helpful for revealing activity that would otherwise be undetected.
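Once log entries from the HBAs, switches and disk arrays have been collected, entries logged around the time of a vxio error can be lined up on a single timeline. The sketch below shows one way this might be done. It assumes the timestamps have already been parsed into datetime values and that the clocks of the components are reasonably synchronized. The function name, the component names and the sample messages are hypothetical.

```python
from datetime import datetime, timedelta


def events_near(error_time, logs, window_seconds=60):
    """Return (timestamp, component, message) tuples logged within
    window_seconds of error_time, sorted by timestamp.

    'logs' maps a component name to a list of (timestamp, message) pairs.
    """
    window = timedelta(seconds=window_seconds)
    hits = []
    for component, entries in logs.items():
        for timestamp, message in entries:
            if abs(timestamp - error_time) <= window:
                hits.append((timestamp, component, message))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    vxio_error = datetime(2024, 1, 1, 12, 0, 0)
    sample_logs = {
        "switch": [(vxio_error - timedelta(seconds=30), "port offline")],
        "hba": [(vxio_error + timedelta(seconds=5), "link down")],
        "array": [(vxio_error + timedelta(hours=1), "scheduled scrub")],
    }
    for timestamp, component, message in events_near(vxio_error, sample_logs):
        print(timestamp.isoformat(), component, message)
```

Widening or narrowing window_seconds is a judgment call; a short window reduces noise, while a longer one can catch events that preceded the error by several minutes.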
Verify that the hardware configuration matches the Hardware Compatibility List (HCL)
The HCL contains a list of supported drivers and firmware for HBAs, disk arrays, and switches that are compatible with Storage Foundation. When troubleshooting disk errors, verifying that the hardware configuration matches the HCL is strongly recommended, even if this configuration was working before the errors appeared. The HCL also contains numerous footnotes regarding issues that are specific to certain hardware and configurations. For compatibility information, review the HCL found in the "Compatibility & Reference" section of the Support Web site for Storage Foundation for Windows: http://www.symantec.com/docs/TECH148533
Review the zone configuration
For World-Wide Name (WWN) zones, a generally accepted best practice is to include only one target and one initiator into each zone (in general, one zone per I/O path). This means that no HBA should be in the same zone as any other HBA and no disk array controller should be in the same zone as any other disk array controller (Figure 4 and Figure 5). The benefit of this configuration is that events occurring in one zone will not affect components that reside in other zones. Symantec Technical Support has observed many cases where this zoning configuration eliminated intermittent read and write issues.
Figure 5 - Legend
Note: The diagram is intended to illustrate a concept. It is not a specific recommendation from Symantec.
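The one-initiator, one-target rule described above can be checked mechanically against an exported zone list. The sketch below is an illustration only and is not tied to any switch vendor's tooling. It assumes the zone membership has already been exported into a dictionary, and the zone names and WWNs shown are hypothetical placeholders.

```python
def check_zones(zones, initiators, targets):
    """Return {zone name: problem} for zones that do not contain
    exactly one initiator (HBA) and exactly one target (array port).

    'zones' maps a zone name to a list of member WWNs.
    'initiators' and 'targets' are collections of known WWNs.
    """
    initiators = set(initiators)
    targets = set(targets)
    problems = {}
    for name, members in zones.items():
        members = set(members)
        n_init = len(members & initiators)
        n_tgt = len(members & targets)
        if n_init != 1 or n_tgt != 1:
            problems[name] = "{} initiator(s), {} target(s)".format(n_init, n_tgt)
    return problems


if __name__ == "__main__":
    # Hypothetical WWNs for illustration only.
    hbas = {"10:00:00:00:00:00:00:01", "10:00:00:00:00:00:00:02"}
    array_ports = {"50:00:00:00:00:00:00:0a", "50:00:00:00:00:00:00:0b"}
    zones = {
        "hba1_ctrlA": ["10:00:00:00:00:00:00:01", "50:00:00:00:00:00:00:0a"],
        "shared": ["10:00:00:00:00:00:00:01", "10:00:00:00:00:00:00:02",
                   "50:00:00:00:00:00:00:0b"],
    }
    for zone, problem in check_zones(zones, hbas, array_ports).items():
        print(zone, "->", problem)
```

A zone reported by this check is not necessarily misconfigured for every environment; it simply does not follow the single-initiator, single-target practice described in this section.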
Further information on troubleshooting disk read and write errors can be found in Microsoft Knowledge Base Articles ID 816004, 329075, 293842, 272569, and 321733.