Troubleshooting missing disks, foreign disks and unknown disk groups (Unknown Dg) in Veritas InfoScale and Storage Foundation for Windows
When Veritas Enterprise Administrator (VEA) displays missing disks, it is possible that the disks are actually present, but they are being detected as Foreign or Basic disks instead.
VEA displays a red X icon on a Dynamic Disk Group when that Disk Group contains a Missing Disk entry. When the Dynamic Disk Group is highlighted, an entry reading Missing Disk (harddisk #) is shown. The number in this entry is the harddisk number that the O/S provided to Storage Foundation for Windows.
Note: The harddisk number listed will NOT be the same as the one used by Storage Foundation for Windows (SFW) within VEA, because Storage Foundation enumerates disks differently once they are received from the O/S. For this reason, the Replace Disk command (available by right-clicking the Missing Disk itself) is not always a viable way to resolve the situation: if multiple disks are listed as missing, it is first necessary to determine which 'Missing Disk (harddisk #)' entry correlates to which LUN.
Missing Disks are usually caused by Private Region corruption, mismatched Disk IDs, or similar problems. They are often seen together with Foreign Disks, which are typically (but not always) found in the 'Unknown Disk Group' or 'Unknown DG' within VEA. Foreign Disks usually result from a condition where the disk itself is available, but either the Dynamic Disk Group it belongs to cannot be identified from the Private Region, or the disk cannot be read properly. Sections 3 and 4 below discuss Foreign Disks in greater detail.
Finally, Missing Disks can in fact be caused by the actual removal of the physical disk, or zoning the LUN away from the server(s) without properly removing it from the Dynamic Disk Group.
Note: There is a great deal of overlap for the initial troubleshooting steps of Missing and Foreign Disks. For this reason, this TechNote can be used as a general reference for troubleshooting both.
Before continuing, review the following:
1. Know the exact number of disks that should be present. This information will assist with determining the nature of the problem. Compare this number to the number of disks detected by the Microsoft Windows Device Manager, as well as by the Veritas Enterprise Administrator.
2. Consider any recent hardware or software changes that have been made. It is very common to dismiss this step, only to later discover (after much troubleshooting and effort) that a recent, minor change was the cause of the problem and the resolution was to simply revert this change. Some common updates and changes that can be potentially problematic include:
- Zoning changes: Verify that the HBAs (host bus adapters) for the affected server(s) have been included in the same zone as the disks.
- HBAs: Use the HBA management software to verify that the HBA settings are correct. Compare the settings for an affected server with a server that is not exhibiting problems (if present).
- Storage area network (SAN) hardware changes, such as HBAs, switches or disk arrays
- Changes to Multipathing settings
- Driver updates
- Firmware or Microcode updates
- Other Hardware changes
- Other Software installations
3. Verify that the configuration matches the Symantec Hardware Compatibility List (HCL). Information about reviewing the HCL can be found in the following TechNote:
How to read the Hardware Compatibility List for Veritas Storage Foundation for Windows
The HCL for SFW can be found at:
This TechNote has been divided into five sections:
1. Initial Troubleshooting Steps
2. Disks are not detected by Veritas
3. Disks are detected as Foreign
4. Disks are detected as Basic
5. Problem has been determined to be outside of the scope of Storage Foundation for Windows
For most situations, Section 1 should be reviewed first, even when troubleshooting a problem that is also covered by the later sections. This TechNote has been designed to cover more common situations in the first section and to cover less common issues as the document progresses.
Section 1 - Initial Troubleshooting Steps
1. Perform a rescan.
- From VEA select Actions.
- Choose Rescan.
2. Check that any installed multipathing solution is working correctly.
In many cases, Missing and Foreign Disks appear as the result of a problem with multipathing. If multipathing presents the same disk more than once, the Operating System (O/S), and thus Storage Foundation for Windows, sees that disk multiple times, which can cause Missing and Foreign Disks to appear in Disk Groups. Some basic checks for multipathing operations are:
- Check the OS system event log for events from "partmgr" (Windows 2003 and later) when the system restarts. The appearance of these messages indicates the OS is seeing the same disk more than once and multipathing isn't handling the disk correctly.
- Perform a rescan and check whether the representation of the disks changes. A change occurs because the disks' private regions are rescanned and the disks are placed in the disk group in different combinations of the available duplicated disks.
- Count the disks in the Storage Foundation for Windows GUI and the Windows Device Manager, and compare the result to the HBA tool view. It might be practical to count the LUNs from the HBA tool and manually collate the duplicate devices to compare with the count in the OS tools. The HBA tool will show the "raw" count of the disks, and it might be possible to filter the duplicates to obtain an expected OS count.
- To rule out the possibility of multipathing issues, disable all but one path to the disks. It is recommended that this step be taken early in the troubleshooting process, before more drastic troubleshooting steps are attempted.
i. Disable one of the paths. This can be done by disconnecting the fiber cable from one of the HBAs or by disabling one of the HBAs from the Windows Device Manager.
ii. Perform a rescan.
iii. Connect each path, one at a time, followed by a rescan. This will determine if the disks are being detected by the host down one path, but not the other paths.
3. Check for failed providers. The steps for checking for failed providers can be found in the following TechNote:
Checking and troubleshooting failed providers in Veritas Storage Foundation for Windows
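The disk-count comparison described in step 2 can be sketched in Python. This is only an illustration: it assumes the disk inventory has already been collected by hand (for example, from the Windows Device Manager and the HBA tool) as pairs of OS disk name and LUN identifier. The names and identifiers below are hypothetical sample data, not output from any Windows or Veritas tool.

```python
from collections import defaultdict

def find_duplicate_luns(disks):
    """Group OS disk names by LUN identifier and return LUNs seen more than once.

    `disks` is an iterable of (os_disk_name, lun_id) pairs gathered manually.
    """
    by_lun = defaultdict(list)
    for name, lun_id in disks:
        by_lun[lun_id].append(name)
    return {lun: names for lun, names in by_lun.items() if len(names) > 1}

# Hypothetical inventory: the same LUN is presented down two paths.
inventory = [
    ("Harddisk1", "LUN-A"),
    ("Harddisk2", "LUN-B"),
    ("Harddisk3", "LUN-A"),
]

duplicates = find_duplicate_luns(inventory)
expected_disks = len({lun_id for _, lun_id in inventory})

print(f"OS disks: {len(inventory)}, unique LUNs: {expected_disks}")
for lun, names in duplicates.items():
    print(f"{lun} appears as {', '.join(names)}")
```

If the number of OS disks is higher than the number of unique LUNs, the same LUN is being seen more than once, which points toward a multipathing problem rather than a problem within SFW.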
Section 2 - Disks are Not Detected by Veritas
This section may also be applied to cases where Missing Disks are listed, but they do not appear to correlate with any visible Foreign or Basic disks listed within VEA.
1. Verify that the disks are detected by the OS. This can be done from the Windows Device Manager or by running Diskpart from a Windows command prompt (Windows 2003 or later).
2. Verify that the disks are detected by the HBA Management Software. Examples of HBA management software include:
- Emulex HBAnyware
- QLogic SANsurfer
- EMC Navisphere
a. If the disks are not detected by either Windows or the HBA Management software, the issue more than likely lies outside the scope of Storage Foundation for Windows (SFW).
Both the HBA drivers and the Windows disk drivers reside below SFW in the stack. Because of this, if the disks are not detected by either the Windows disk drivers or the HBA drivers, SFW will not detect the disks either.
Note: In some cases, the HBA management software may detect the disks even though neither Windows nor SFW can detect them.
This is possible because the HBA drivers reside at a lower layer than either the Windows disk driver or SFW.
In this case, the HBA settings, firmware and drivers should be examined to determine why the disks are not being presented to the Windows disk drivers.
Review Section 5 for recommendations for situations where the problem has been determined to be outside the scope of SFW.
b. If both Windows and the HBAs detect the disks, but SFW cannot, review the following:
i. Check for failed providers (if this has not already been done). Further information on this can be found in the following TechNote:
Checking and troubleshooting failed providers in Veritas Storage Foundation for Windows
ii. If this is an EMC disk array, review the following TechNote:
EMC DMX microcode 5x71 and its effect on Veritas Storage Foundation (tm) by Symantec 4.2 and 4.3 for Windows
iii. Verify that the latest maintenance packs or roll-up patches have been installed for SFW. Further information can be found in the following TechNote:
How to identify which version of VERITAS Volume Manager (tm) for Windows 2000 or VERITAS Storage Foundation (tm) for Windows has been installed, using the product build number
iv. Uninstall and reinstall SFW.
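The layering logic of this section (HBA drivers below the Windows disk drivers, which are below SFW) can be summarized as a small decision function. The following Python sketch is only an illustration of the reasoning above; the function name and the returned messages are hypothetical and are not part of any Veritas tool.

```python
def next_step(seen_by_hba, seen_by_windows, seen_by_sfw):
    """Suggest where to look next, based on which layers detect the disks."""
    if seen_by_sfw:
        # SFW sees the disks, so the Foreign/Basic sections apply instead.
        return "Disks are visible to SFW: review Sections 3 and 4"
    if seen_by_hba and seen_by_windows:
        # Both lower layers are healthy, so the problem is likely within SFW.
        return "Check SFW: failed providers, maintenance packs, reinstall"
    if seen_by_hba:
        # The HBA sees the disks but Windows does not.
        return "Examine HBA settings, firmware and drivers, then Section 5"
    # Not seen by the HBA at all: SFW cannot see what lower layers cannot.
    return "Outside the scope of SFW: review Section 5"

print(next_step(seen_by_hba=True, seen_by_windows=True, seen_by_sfw=False))
print(next_step(seen_by_hba=True, seen_by_windows=False, seen_by_sfw=False))
print(next_step(seen_by_hba=False, seen_by_windows=False, seen_by_sfw=False))
```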
Section 3 - Disks are detected as Foreign
Note: This section may also be applied to cases where missing disks are listed, but the missing disks appear to correspond with disks that have been marked as foreign or basic.
1. Attempt to reactivate the disk
Sometimes a foreign status is caused by a STALE or BAD-STATE status that has been set in the private region. Reactivating the disk can sometimes clear this flag and restore the disk to a healthy status.
a. Right-click on the disk.
b. Select Reactivate Disk.
2. Attempt a Merge Foreign disk operation.
This operation changes the disksetid of the foreign disk to match the disksetid of the other disks in the disk group.
a. Right-click on the disk.
b. Select Merge Foreign Disk.
Note: If it is necessary to merge multiple foreign disks, the vxdisk command line interface with the merge option can be used in a for loop, using the following syntax for each disk:
vxdisk -g<diskgroup> merge harddisk<n>
3. Determine if there are any other disks in the disk group that are still healthy (have a valid copy of the private region). If healthy disks still exist in the disk group, a replace disk operation may be attempted. A replace disk operation points a "missing" disk record to a disk that is actually present. This is useful in cases where a disk has been marked as "missing" even though it is clearly present.
Note: If multiple disks are missing, a replace disk operation is not recommended unless it is clearly understood which disk should be associated with which missing disk record. This operation is only possible if the disk(s) that are missing are in the basic group. It is rare for a disk signature to be reset other than manually, and foreign disks, for example, may require their signature to be manually reset for this operation to succeed. In these scenarios, because of the potential for data corruption, it is recommended to contact Symantec Technical Support for assistance in confirming that the replace disk operation is using the correct disks.
To perform a replace disk operation, perform the following steps:
a. Right-click on the missing disk.
b. Select Replace Disk.
c. Choose the disk that correlates with the missing disk record.
Note: If the replace disk operation returns "no new disks exist in the system," the affected disk will need to be modified to allow the replace disk operation to succeed. Contact Symantec Enterprise Technical Support for further information.
4. Restore the private region with VxCBR.
How to back up and restore the private region of a dynamic disk group from a command line with VxCBR in Veritas Storage Foundation for Windows
Note: If the VxCBR restore operation fails with the message "Invalid input! Diskgroup 'diskgroup' will be skipped," review the following TechNote:
The messages "Invalid input! Diskgroup 'diskgroup' will be skipped" and "You must specify a valid target diskgroup(s)" appear when attempting a restore using VxCBR in Veritas Storage Foundation for Windows
5. Verify that the disk group version is at the same level or lower than the installed version of SFW. For example, SFW 4.2 is unable to read a disk group that is at version "43" or higher.
To determine the version of the disk group, perform the following steps:
a. Right-click on the disk group.
b. Select Properties. The disk group version will be listed under "Version."
6. If none of the above steps have resolved the "foreign" status, it is possible that there is a problem with the private region of the disk(s). Contact Symantec Enterprise Technical Support for further information.
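For the note under step 2, which mentions running the vxdisk merge operation in a loop, the following Python sketch builds one command line per foreign disk from the syntax shown in that note (vxdisk -g<diskgroup> merge harddisk<n>). The disk group name and disk numbers are hypothetical. The sketch only prints the commands so that they can be reviewed against the correct disks before anything is run; it does not execute them.

```python
def build_merge_commands(disk_group, disk_numbers):
    """Return one vxdisk merge command line per foreign disk number."""
    return [f"vxdisk -g{disk_group} merge harddisk{n}" for n in disk_numbers]

# Hypothetical disk group name and foreign disk numbers.
for command in build_merge_commands("DG1", [3, 4, 7]):
    print(command)
```

Printing rather than executing is a deliberate choice here: the harddisk numbers must be confirmed in VEA first, because merging the wrong disk into a disk group is difficult to undo.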
Section 4 - Disks are detected as Basic
This issue may occur under the following conditions:
1. The size of the disk on which the volume resides has been expanded beyond 2 terabytes (2 TB). If the size of an MBR disk is expanded beyond 2 TB, the private region of the disk will no longer be accessible to Windows. As a result, all information about the volumes will be lost.
Disks that are larger than 2 terabytes revert to basic disks after a rescan occurs in Veritas Storage Foundation for Windows
2. The partition table of the disk has been lost. In some rare cases, the partition table may be lost while the private region remains intact. In this case, the disk will still appear as "basic" with no volumes. If this occurs, it may be possible to manually recreate the entries for the dynamic volumes in the partition table to regain access to the volumes. Contact Symantec Enterprise Technical Services for further information.
Note: Both of these issues are specific to MBR (Master Boot Record) disks.
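The 2 TB figure follows from the MBR format, which stores sector addresses and sector counts in 32-bit fields. As a worked example, the Python sketch below computes the resulting limit. It assumes the traditional sector size of 512 bytes; the helper function name is hypothetical.

```python
SECTOR_SIZE_BYTES = 512    # assumption: traditional 512-byte sectors
MBR_SECTOR_FIELD_BITS = 32 # MBR partition entries use 32-bit sector fields

max_sectors = 2 ** MBR_SECTOR_FIELD_BITS
max_bytes = max_sectors * SECTOR_SIZE_BYTES

def exceeds_mbr_limit(disk_size_bytes):
    """Return True if a disk of this size is beyond what MBR can address."""
    return disk_size_bytes > max_bytes

print(max_bytes)              # 2199023255552
print(max_bytes // 2 ** 40)   # 2 (binary terabytes)
print(exceeds_mbr_limit(3 * 2 ** 40))
```

A LUN that is grown past this size on the array can therefore no longer be fully addressed through its MBR partition table, which is consistent with the behavior described in condition 1.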
Section 5 - Problem has been determined to be outside of the scope of InfoScale or Storage Foundation for Windows
The following are recommendations for situations where the problem has been determined to be outside the scope of SFW:
1. Shut down the other nodes and (if possible) reboot this node.
2. Reset the SCSI bus. This can be performed from VEA.
Warning: A reset SCSI bus operation will break the SCSI reservations for all devices on the bus. Do not perform this step without a full understanding of which devices are connected to the same SCSI bus and how they will be affected by this operation.
3. Uninstall any multipathing software.
Warning: Before uninstalling multipathing software, ensure that there is only one path to the disks.
4. Check the disk array to verify that there is not a formatting problem with the actual LUNs (Logical Unit Numbers).
5. Review the HBA settings using the HBA management software.
6. Review zoning. In particular, ensure that the HBAs for the affected servers are included in the same zones as the disks.
7. Review LUN Masking. Ensure that the LUNs are being presented to the HBAs of the affected servers.