Veritas InfoScale™ 8.0 Troubleshooting Guide - AIX
Replacing defective disks when the cluster is offline
If a disk in the coordinator disk group becomes defective or inoperable and you want to switch to a new disk group in a cluster that is offline, perform the following procedure.
In a cluster that is online, you can replace the disks using the vxfenswap utility.
Review the following information before you replace a coordinator disk in the coordinator disk group or destroy a coordinator disk group.
Note the following about the procedure:
- When you add a disk, add the disk to the disk group vxfencoorddg and retest the group for support of SCSI-3 persistent reservations. 
- You can destroy the coordinator disk group such that no registration keys remain on the disks. The disks can then be used elsewhere. 
To replace a disk in the coordinator disk group when the cluster is offline
- Log in as superuser on one of the cluster nodes.
- If VCS is running, shut it down:
  # hastop -all
  Make sure that port h is closed on all the nodes. Run the following command to verify that port h is closed:
  # gabconfig -a
- Stop the VCSMM driver on each node:
  # /etc/init.d/vcsmm.rc stop
- Stop I/O fencing on each node:
  # /etc/init.d/vxfen.rc stop
  This removes any registration keys on the disks.
- Import the coordinator disk group. The file /etc/vxfendg includes the name of the disk group (typically, vxfencoorddg) that contains the coordinator disks, so use the command:
  # vxdg -tfC import `cat /etc/vxfendg`
  where:
  -t specifies that the disk group is imported only until the node restarts.
  -f specifies that the import is to be done forcibly, which is necessary if one or more disks are not accessible.
  -C specifies that any import locks are removed.
- To remove disks from the disk group, use the VxVM disk administrator utility, vxdiskadm. You may also destroy the existing coordinator disk group. For example:
  - Verify whether the coordinator attribute is set to on.
    # vxdg list vxfencoorddg | grep flags: | grep coordinator
  - Destroy the coordinator disk group.
    # vxdg -o coordinator destroy vxfencoorddg
- Add the new disk to the node and initialize it as a VxVM disk. Then, add the new disk to the vxfencoorddg disk group:
  - If you destroyed the disk group in the previous step, then create the disk group again and add the new disk to it.
  - If the disk group already exists, then add the new disk to it.
    # vxdg -g vxfencoorddg -o coordinator adddisk disk_name
- Test the recreated disk group for SCSI-3 persistent reservations compliance.
- After replacing disks in a coordinator disk group, deport the disk group:
  # vxdg deport `cat /etc/vxfendg`
- On each node, start the I/O fencing driver:
  # /etc/init.d/vxfen.rc start
- On each node, start the VCSMM driver:
  # /etc/init.d/vcsmm.rc start
- Verify that the I/O fencing module has started and is enabled.
  # gabconfig -a
  Make sure that port b and, where applicable, port o memberships exist in the output for all nodes in the cluster.
  # vxfenadm -d
  Make sure that I/O fencing mode is not disabled in the output.
- If necessary, restart VCS on each node:
  # hastart
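
The membership and fencing-mode checks in the procedure above can be scripted rather than read by eye. The following is a minimal sketch, not part of the product. The function names gab_port_open and fencing_enabled are hypothetical. The parsing assumes that gabconfig -a prints one line per open port of the form "Port b gen ... membership ..." and that vxfenadm -d prints a "Fencing Mode:" line; confirm both layouts against the output on your own systems before relying on it.

```shell
#!/bin/sh
# Sketch only: helpers that inspect command output supplied on standard input.
# Function names are hypothetical; the output layouts are assumptions.

# Succeeds if the given GAB port letter has a membership line in
# `gabconfig -a` output read from standard input.
# $1: single port letter, for example h, b, or o
gab_port_open() {
    grep -q "^Port $1 .*membership"
}

# Succeeds if `vxfenadm -d` output on standard input contains a
# "Fencing Mode:" line whose value is anything other than "disabled"
# (compared case-insensitively). Fails if no such line is found.
fencing_enabled() {
    mode=$(awk -F': *' '/Fencing Mode:/ {
        gsub(/[ \t\r]+$/, "", $2)   # drop trailing whitespace
        print tolower($2)
        exit
    }')
    [ -n "$mode" ] && [ "$mode" != "disabled" ]
}
```

Under these assumptions, "gabconfig -a | gab_port_open h" should fail (non-zero exit status) once VCS has stopped, while "gabconfig -a | gab_port_open b" and "vxfenadm -d | fencing_enabled" should both succeed after the fencing driver has been restarted. The helpers only read output, so they can be run on each node without changing cluster state.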