Symptom of Veritas Cluster Server needing to be restarted: error: VCS WARNING V-16-1-10367 Dump already in progress

Problem

The symptom was that no entries were being written to the engine log on one or more nodes. Attempts to dump the configuration failed with an error, rebooted nodes would not rejoin the cluster, and 'hastop -local -force' would hang. Recovery required stopping had, unconfiguring GAB, and re-forming the cluster.

Error Message

# haconf -dump
VCS WARNING V-16-1-10367 Dump already in progress

 

After a node was rebooted, it was seen in the following state:
 

adelscott  SysState           CURRENT_DISCOVER_WAIT

(seen in the 'hasys -state' output on another node of the cluster and in the engine log)

Cause

Unknown

Solution

1)  Use 'ps -aef' to find the process IDs (PIDs) of the had and hashadow processes. Repeat steps 1 and 2 on every node in the cluster.

 

# ps -aef|grep ha
    root  4135     1   0 14:24:57 ?      0:00 /opt/VRTSvcs/bin/hashadow
    root  4019     1   0 14:24:55 ?      0:08 /opt/VRTSvcs/bin/had
    root  4283     1   0 14:25:05 ?      0:08 /opt/VRTSvcs/bin/Phantom/PhantomAgent -type Phantom
    root  5527  2459   0 14:26:10 ?      0:02 /opt/VRTSsfmh/bin/hareg -all -group -resource -clus -sys -rclus -rsys -rgroup -
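
Because a plain 'grep ha' also matches agents and other unrelated processes, the two PIDs can be picked out by exact binary path instead. The following is a minimal sketch and not part of the original procedure; the function name vcs_engine_pids is hypothetical, and it assumes the binaries are installed under /opt/VRTSvcs/bin as in the listing above.

```shell
# Hypothetical helper: read `ps -aef` output on stdin and print the PIDs
# of had and hashadow only. Assumes the install path shown in the listing.
vcs_engine_pids() {
    awk '{
        # Scan every field, since the start-time column may span one or two fields.
        for (i = 1; i <= NF; i++)
            if ($i == "/opt/VRTSvcs/bin/had" || $i == "/opt/VRTSvcs/bin/hashadow") {
                print $2    # the PID is the second column of ps -aef
                break
            }
    }'
}

# Usage on a cluster node:
#   ps -aef | vcs_engine_pids
```

Matching the full path keeps PhantomAgent and hareg, which also contain "ha", out of the result.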

2)  Kill both PIDs in a single command so that neither process can restart the other.

(this aborts the VCS engine but leaves production services running)

 

# kill 4135 4019

 

Use 'ps -aef|grep ha' to verify that both processes have been stopped.

 

3)  Determine whether I/O fencing is running and, if it is, unconfigure it on all nodes of the cluster.

 

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   286101 membership 01
Port b gen   286105 membership 01    <===
Port h gen   286104 membership 01

(Port b is the I/O fencing port. "01" in the last column lists the node IDs, here 0 and 1, on which the port is running.)

 

# vxfenconfig -U

 

Run 'gabconfig -a' to validate that port b has been dropped from the output.
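
The port checks in this and the following steps can be scripted by reading the membership column of the 'gabconfig -a' output. The sketch below is an illustration only; the function name gab_port_membership is hypothetical, and it assumes the "Port <letter> gen <number> membership <nodes>" layout shown above.

```shell
# Hypothetical helper: read `gabconfig -a` output on stdin and print the
# membership string for one port letter. Prints nothing if the port is absent.
gab_port_membership() {
    while read -r word port gen_label gen kind members rest; do
        if [ "$word" = "Port" ] && [ "$port" = "$1" ] && [ "$kind" = "membership" ]; then
            echo "$members"
        fi
    done
}

# Usage on a cluster node:
#   gabconfig -a | gab_port_membership b
```

An empty result for port b after 'vxfenconfig -U' corresponds to the port having been dropped from the output.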

 

4)  Unconfigure gab on all nodes of the cluster

 

# gabconfig -U

 

Run 'gabconfig -a' to validate that no ports are listed in the output.

 

5)  Restart gab on all nodes.

 

# gabconfig -c -n<# of nodes>

 

After all nodes have been seeded, validate that gab has started on all nodes.

 

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   286101 membership 01
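
The value passed to -n is the number of nodes that must be present before GAB seeds. As an illustration only, the sketch below derives that count from host-table lines of the form "<node ID> <node name>", which is the format of /etc/llthosts. The function name gab_seed_command is hypothetical, it only prints the command rather than running it, and the resulting count should be checked against the cluster's own configuration before use.

```shell
# Hypothetical helper: read llthosts-style lines ("<id> <name>") on stdin
# and print the gabconfig seed command for that many nodes.
gab_seed_command() {
    # Count only lines that start with a numeric node ID followed by a name.
    count=$(awk 'NF >= 2 && $1 ~ /^[0-9]+$/ { n++ } END { print n + 0 }')
    echo "gabconfig -c -n$count"
}

# Usage on a cluster node (assumes the standard LLT host table location):
#   gab_seed_command < /etc/llthosts
```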

 

6)  Restart I/O fencing on all nodes if it was determined to be configured in step 3.

 

# vxfenconfig -c

 

After starting I/O fencing on all nodes, validate that it has started on all nodes.

 

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   286101 membership 01
Port b gen   286109 membership 01

 

7)  Restart had (VCS engine) on all nodes

 

# hastart

 

After starting had on all nodes, validate that it has started on all nodes.

 

# gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   286101 membership 01
Port b gen   286109 membership 01
Port h gen   286106 membership 01

 

After the cluster and service groups have started and been processed, use 'hastatus -sum' to view a summary of the cluster status.
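
As a complement to the summary, the 'hasys -state' output mentioned in the Error Message section can be filtered for nodes that have not reached the RUNNING state. This is a minimal sketch that assumes the three-column layout shown there (system name, SysState, value); the function name vcs_nodes_not_running is hypothetical.

```shell
# Hypothetical helper: read `hasys -state` output on stdin and print each
# system whose SysState is anything other than RUNNING, with that state.
vcs_nodes_not_running() {
    awk '$2 == "SysState" && $3 != "RUNNING" { print $1 " " $3 }'
}

# Usage on a cluster node:
#   hasys -state | vcs_nodes_not_running
```

No output suggests that every listed system reports RUNNING. A node stuck in a state such as CURRENT_DISCOVER_WAIT would be printed by name.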


Applies To

A failover cluster running Veritas Cluster Server (VCS) version 5.0MP1RP5 on Solaris 10 systems.

 

Similar symptoms of commands hanging and no logging taking place have been reported for other VCS versions and other supported Unix Operating Systems.

Terms of use for this information are found in Legal Notices.
