Veritas InfoScale™ 8.0 Troubleshooting Guide - AIX
- Introduction
- Section I. Troubleshooting Veritas File System
- Section II. Troubleshooting Veritas Volume Manager- Recovering from hardware failure- About recovery from hardware failure
- Listing unstartable volumes
- Displaying volume and plex states
- The plex state cycle
- Recovering an unstartable mirrored volume
- Recovering an unstartable volume with a disabled plex in the RECOVER state
- Forcibly restarting a disabled volume
- Clearing the failing flag on a disk
- Reattaching failed disks
- Recovering from a failed plex attach or synchronization operation
- Failures on RAID-5 volumes
- Recovering from an incomplete disk group move
- Restarting volumes after recovery when some nodes in the cluster become unavailable
- Recovery from failure of a DCO volume
 
- Recovering from instant snapshot failure- Recovering from the failure of vxsnap prepare
- Recovering from the failure of vxsnap make for full-sized instant snapshots
- Recovering from the failure of vxsnap make for break-off instant snapshots
- Recovering from the failure of vxsnap make for space-optimized instant snapshots
- Recovering from the failure of vxsnap restore
- Recovering from the failure of vxsnap refresh
- Recovering from copy-on-write failure
- Recovering from I/O errors during resynchronization
- Recovering from I/O failure on a DCO volume
- Recovering from failure of vxsnap upgrade of instant snap data change objects (DCOs)
 
- Recovering from failed vxresize operation
- Recovering from boot disk failure
- Managing commands, tasks, and transactions
- Backing up and restoring disk group configurations
- Troubleshooting issues with importing disk groups
- Recovering from CDS errors
- Logging and error messages
- Troubleshooting Veritas Volume Replicator- Recovery from RLINK connect problems
- Recovery from configuration errors- Errors during an RLINK attach
- Errors during modification of an RVG
 
- Recovery on the Primary or Secondary- About recovery from a Primary-host crash
- Recovering from Primary data volume error
- Primary SRL volume error cleanup and restart
- Primary SRL volume error at reboot
- Primary SRL volume overflow recovery
- Primary SRL header error cleanup and recovery
- Secondary data volume error cleanup and recovery
- Secondary SRL volume error cleanup and recovery
- Secondary SRL header error cleanup and recovery
- Secondary SRL header error at reboot
 
 
 
- Recovering from hardware failure
- Section III. Troubleshooting Dynamic Multi-Pathing
- Section IV. Troubleshooting Storage Foundation Cluster File System High Availability- Troubleshooting Storage Foundation Cluster File System High Availability- About troubleshooting Storage Foundation Cluster File System High Availability
- Troubleshooting CFS
- Troubleshooting fenced configurations
- Troubleshooting Cluster Volume Manager in Veritas InfoScale products clusters- CVM group is not online after adding a node to the Veritas InfoScale products cluster
- Shared disk group cannot be imported in Veritas InfoScale products cluster
- Unable to start CVM in Veritas InfoScale products cluster
- Removing preexisting keys
- CVMVolDg not online even though CVMCluster is online in Veritas InfoScale products cluster
 
 
 
- Troubleshooting Storage Foundation Cluster File System High Availability
- Section V. Troubleshooting Cluster Server- Troubleshooting and recovery for VCS- VCS message logging- Log unification of VCS agent's entry points
- Enhancing First Failure Data Capture (FFDC) to troubleshoot VCS resource's unexpected behavior
- GAB message logging
- Enabling debug logs for agents
- Enabling debug logs for IMF
- Enabling debug logs for the VCS engine
- About debug log tags usage
- Gathering VCS information for support analysis
- Gathering LLT and GAB information for support analysis
- Gathering IMF information for support analysis
- Message catalogs
 
- Troubleshooting the VCS engine
- Troubleshooting Low Latency Transport (LLT)
- Troubleshooting Group Membership Services/Atomic Broadcast (GAB)
- Troubleshooting VCS startup
- Troubleshooting Intelligent Monitoring Framework (IMF)
- Troubleshooting service groups- VCS does not automatically start service group
- System is not in RUNNING state
- Service group not configured to run on the system
- Service group not configured to autostart
- Service group is frozen
- Failover service group is online on another system
- A critical resource faulted
- Service group autodisabled
- Service group is waiting for the resource to be brought online/taken offline
- Service group is waiting for a dependency to be met.
- Service group not fully probed.
- Service group does not fail over to the forecasted system
- Service group does not fail over to the BiggestAvailable system even if FailOverPolicy is set to BiggestAvailable
- Restoring metering database from backup taken by VCS
- Initialization of metering database fails
 
- Troubleshooting resources
- Troubleshooting I/O fencing- Node is unable to join cluster while another node is being ejected
- The vxfentsthdw utility fails when SCSI TEST UNIT READY command fails
- Manually removing existing keys from SCSI-3 disks
- System panics to prevent potential data corruption
- Cluster ID on the I/O fencing key of coordinator disk does not match the local cluster's ID
- Fencing startup reports preexisting split-brain
- Registered keys are lost on the coordinator disks
- Replacing defective disks when the cluster is offline
- The vxfenswap utility exits if rcp or scp commands are not functional
- Troubleshooting CP server
- Troubleshooting server-based fencing on the Veritas InfoScale products cluster nodes
- Issues during online migration of coordination points
 
- Troubleshooting notification
- Troubleshooting and recovery for global clusters
- Troubleshooting the steward process
- Troubleshooting licensing- Validating license keys
- Licensing error messages- [Licensing] Insufficient memory to perform operation
- [Licensing] No valid VCS license keys were found
- [Licensing] Unable to find a valid base VCS license key
- [Licensing] License key cannot be used on this OS platform
- [Licensing] VCS evaluation period has expired
- [Licensing] License key can not be used on this system
- [Licensing] Unable to initialize the licensing framework
- [Licensing] QuickStart is not supported in this release
- [Licensing] Your evaluation period for the feature has expired. This feature will not be enabled the next time VCS starts
 
 
- Verifying the metered or forecasted values for CPU, Mem, and Swap
 
- VCS message logging
 
- Troubleshooting and recovery for VCS
- Section VI. Troubleshooting SFDB
Cleaning up the system configuration
After reinstalling VxVM, you must clean up the system configuration.
To clean up the system configuration
- After recovering the VxVM configuration, you must determine which volumes need to be restored from backup. The volumes to be restored include those with all mirrors (all copies of the volume) residing on disks that have been reinstalled or removed. These volumes are invalid and must be removed, recreated, and restored from backup. If only some mirrors of a volume exist on reinitialized or removed disks, these mirrors must be removed. The mirrors can be re-added later.Establish which VM disks have been removed or reinstalled using the following command: # vxdisk list This command displays a list of system disk devices and the status of these devices. For example, for a reinstalled system with three disks and a reinstalled root disk, the output of the vxdisk list command is similar to this: DEVICE TYPE DISK GROUP STATUS hdisk1 simple - - online invalid hdisk2 simple mydg01 mydg online hdisk3 simple mydg02 mydg online hdisk4 simple mydg03 mydg online The display shows that the reinstalled root device, hdisk1, is not associated with a VM disk and is marked with a status of online invalid. The disks mydg01 , mydg02 and mydg03 were not involved in the reinstallation, and are recognized by VxVM and associated with their devices (hdisk2, hdisk3 and hdisk4). If other disks (with volumes or mirrors on them) had been removed or replaced during reinstallation, those disks would also have a disk device listed in an online invalid state and a VM disk listed as not associated with a device. 
- Once you know which disks have been removed or replaced, locate all the mirrors on failed disks using the following command:# vxprint -sF "%vname" -e'sd_disk = "disk"' where disk is the name of a disk with a failed status. Be sure to enclose the disk name in quotes in the command. Otherwise, the command returns an error message. The vxprint command returns a list of volumes that have mirrors on the failed disk. Repeat this command for every disk with a failed status. 
- Check the status of each volume and print volume information using the following command:# vxprint -th volume where volume is the name of the volume to be examined. The vxprint command displays the status of the volume, its plexes, and the portions of disks that make up those plexes. For example, a volume named v01 with only one plex resides on the reinstalled disk named mydg01. The vxprint -th v01 command produces the following output: V NAME USETYPE KSTATE STATE LENGTH READPOL PREFPLEX PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE v v01 fsgen DISABLED ACTIVE 24000 SELECT - pl v01-01 v01 DISABLED NODEVICE 24000 CONCAT - RW sd mydg01-06 v0101 mydg01 245759 24000 0 hdisk2 ENA The only plex of the volume is shown in the line beginning with pl. The STATE field for the plex named v01-01 is NODEVICE. The plex has space on a disk that has been replaced, removed, or reinstalled. The plex is no longer valid and must be removed. 
- Because v01-01 was the only plex of the volume, the volume contents are irrecoverable except by restoring the volume from a backup. The volume must also be removed. If a backup copy of the volume exists, you can restore the volume later. Keep a record of the volume name and its length, as you will need it for the backup procedure.Remove irrecoverable volumes (such as v01) using the following command: # vxedit -rf rm v01 Assess whether you need to use the force option (-f) with this command. 
- It is possible that only part of a plex is located on the failed disk. If the volume has a striped plex associated with it, the volume is divided between several disks. For example, the volume named v02 has one striped plex striped across three disks, one of which is the reinstalled disk mydg01. The vxprint -th v02 command produces the following output:V NAME USETYPE KSTATE STATE LENGTH READPOL PREFPLEX PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE v v02 fsgen DISABLED ACTIVE 30720 SELECT v02-01 pl v02-01 v02 DISABLED NODEVICE 30720 STRIPE 3/128 RW sd mydg02-02v02-01 mydg01 424144 10240 0/0 hdisk3 ENA sd mydg01-05v02-01 mydg01 620544 10240 1/0 hdisk2 DIS sd mydg03-01v02-01 mydg03 620544 10240 2/0 hdisk4 ENA The display shows three disks, across which the plex v02-01 is striped (the lines starting with sd represent the stripes). One of the stripe areas is located on a failed disk. This disk is no longer valid, so the plex named v02-01 has a state of NODEVICE. Since this is the only plex of the volume, the volume is invalid and must be removed. If a copy of v02 exists on the backup media, it can be restored later. Keep a record of the volume name and length of any volume you intend to restore from backup. Remove invalid volumes (such as v02) using the following command: # vxedit -rf rm v02 Assess whether you need to use the force option (-f) with this command. 
- A volume that has one mirror on a failed disk can also have other mirrors on disks that are still valid. In this case, the volume does not need to be restored from backup, since the data is still valid on the valid disks.The output of the vxprint -th command for a volume with one plex on a failed disk (mydg01) and another plex on a valid disk (mydg02) is similar to the following: V NAME USETYPE KSTATE STATE LENGTH READPOL PREFPLEX PL NAME VOLUME KSTATE STATE LENGTH LAYOUT NCOL/WID MODE SD NAME PLEX DISK DISKOFFS LENGTH [COL/]OFF DEVICE MODE v v03 fsgen DISABLED ACTIVE 0720 SELECT - pl v03-01 v03 DISABLED ACTIVE 30720 CONCAT - RW sd mydg02-01 v03-01 mydg01 620544 30720 0 hdisk3 ENA pl v03-02 v03 DISABLED NODEVICE 30720 CONCAT - RW sd mydg01-04 v03-02 mydg03 262144 30720 0 hdisk2 DIS This volume has two plexes, v03-01 and v03-02. The first plex (v03-01) does not use any space on the invalid disk, so it can still be used. The second plex (v03-02) uses space on invalid disk mydg01 and has a state of NODEVICE. Plex v03-02 must be removed. However, the volume still has one valid plex containing valid data. If the volume needs to be mirrored, another plex can be added later. Note the name of the volume to create another plex later. To remove an invalid plex, use the vxplex command to dissociate and then remove the plex from the volume. For example, to dissociate and remove the plex v03-02, use the following command: # vxplex -o rm dis v03-02 
- Once all invalid volumes and plexes have been removed, the disk configuration can be cleaned up. Each disk that was removed, reinstalled, or replaced (as determined from the output of the vxdisk list command) must be removed from the configuration.To remove the disk, use the vxdg command. To remove the failed disk mydg01, use the following command: # vxdg rmdisk mydg01 If the vxdg command returns an error message, invalid mirrors exist. Repeat step 1 through step 6 until all invalid volumes and mirrors are removed. 
- Any other disks that were replaced should now be added using the vxdiskadm command.
- Once all the disks have been added to the system, any volumes that were completely removed as part of the configuration cleanup can be recreated and their contents restored from backup. The volume recreation can be done by using the vxassist command or the graphical user interface.For example, to recreate the volumes v01 and v02, use the following command: # vxassist make v01 24000 # vxassist make v02 30720 layout=stripe nstripe=3 Once the volumes are created, they can be restored from backup using normal backup/restore procedures. 
- Recreate any plexes for volumes that had plexes removed as part of the volume cleanup. To replace the plex removed from volume v03, use the following command:# vxassist mirror v03 Once you have restored the volumes and plexes lost during reinstallation, recovery is complete and your system is configured as it was prior to the failure. 
- Start up hot-relocation, if required, by either rebooting the system or manually start the relocation watch daemon, vxrelocd (this also starts the vxnotify process).Warning: Hot-relocation should only be started when you are sure that it will not interfere with other reconfiguration procedures. To determine if hot-relocation has been started, use the following command to search for its entry in the process table: # ps -ef | grep vxrelocd See the Storage Foundation Administrator's Guide. See the vxrelocd(1M) manual page.