"Agent is calling clean for resource(cvm_clus) because the resource is not up even after online completed" appears while starting CVM

Article: 100017092
Last Published: 2023-10-27
Product(s): InfoScale & Storage Foundation

Problem

CVM (Cluster Volume Manager) fails to start because incorrect node IDs are defined in the CVMNodeId attribute.

 

Solution

Inconsistent node IDs in the VCS/CVM (Veritas Cluster Server/Cluster Volume Manager) configuration cause CVM to fail to start.


Scenario 1

The /etc/llthosts files on each VCS node are the same, but the node IDs set in the CVMNodeId attribute differ from the node IDs defined in /etc/llthosts. The node IDs defined in the CVMCluster agent (default resource name: cvm_clus) attribute CVMNodeId must match the node IDs defined in the /etc/llthosts file. CVM will not start if the CVMNodeId attribute is not set to match /etc/llthosts.
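For example, assuming the /etc/llthosts file below (the hostnames and IDs are illustrative and match the examples later in this article), the CVMNodeId attribute in main.cf must carry the same IDs:

# cat /etc/llthosts
2 vcslab-15
3 vcslab-16

A matching CVMCluster resource definition in main.cf would then look similar to this sketch (the cluster name is a placeholder):

CVMCluster cvm_clus (
        CVMClustName = example_clus
        CVMNodeId = { vcslab-15 = 2, vcslab-16 = 3 }
        CVMTransport = gab
        CVMTimeout = 200
        )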

This configuration problem is not reported by the hacf -verify command, nor is it recorded in engine_A.log or CVMCluster_A.log.
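For completeness, the syntax check that misses this problem is typically run against the default configuration directory:

# hacf -verify /etc/VRTSvcs/conf/config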

The vxfenadm -g all -f /etc/vxfentab output will show the registration of each node on the fencing disk as normal, and the vxdisk command will show that each node can see all the shared storage devices and that DMP (Dynamic Multi-Pathing) is normal. However, VCS will not be able to bring the CVM resource online, and an attempt to start CVM manually by running the vxclustadm startnode command will not work:

# /opt/VRTS/bin/vxclustadm -m vcs -t gab startnode

VxVM vxclustadm INFO V-5-2-0 initialization completed
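For reference, the fencing and disk checks mentioned above can be performed with commands such as the following; output is omitted here because it varies by environment:

# vxfenadm -g all -f /etc/vxfentab

# vxdisk list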

In addition, gabconfig -a will show GAB port membership as below:

# gabconfig -a

GAB Port Memberships

===============================================================

Port a gen 22e7d04 membership ; 23

Port b gen 22e7d72 membership ; 23

Port u gen 22e7d74 membership ; 3

Port u gen 22e7d74 visible ; 2

Port v gen 22e7d76 membership ; 3

Port v gen 22e7d76 visible ; 2

The VCS engine_A.log will typically contain the logs below when the cluster tries to bring the CVM resource online.

2005/07/13 14:57:32 VCS INFO V-16-2-13068 (vcslab-16) Resource(cvm_clus) - clean completed successfully.

2005/07/13 14:57:32 VCS INFO V-16-2-13072 (vcslab-16) Resource(cvm_clus): Agent is retrying online (attempt number 3 of 3).

2005/07/13 14:57:42 VCS WARNING V-16-10001-1002 (vcslab-15) CVMCluster:cvm_clus:online:CVMCluster start failed on this node.

2005/07/13 14:57:42 VCS INFO V-16-2-13001 (vcslab-15) Resource(cvm_clus): Output of the completed operation (online)ERROR:

2005/07/13 14:59:43 VCS ERROR V-16-2-13066 (vcslab-15) Agent is calling clean for resource(cvm_clus) because the resource is not up even after online completed.

2005/07/13 14:59:43 VCS INFO V-16-2-13068 (vcslab-15) Resource(cvm_clus) - clean completed successfully.

2005/07/13 14:59:43 VCS INFO V-16-2-13072 (vcslab-15) Resource(cvm_clus): Agent is retrying online (attempt number 3 of 3).

2005/07/13 14:59:53 VCS WARNING V-16-10001-1002 (vcslab-16) CVMCluster:cvm_clus:online:CVMCluster start failed on this node.

2005/07/13 14:59:54 VCS INFO V-16-2-13001 (vcslab-16) Resource(cvm_clus): Output of the completed operation (online)ERROR:

2005/07/13 15:01:54 VCS ERROR V-16-2-13066 (vcslab-16) Agent is calling clean for resource(cvm_clus) because the resource is not up even after online completed.

2005/07/13 15:01:55 VCS INFO V-16-2-13068 (vcslab-16) Resource(cvm_clus) - clean completed successfully.

2005/07/13 15:01:55 VCS INFO V-16-2-13071 (vcslab-16) Resource(cvm_clus): reached OnlineRetryLimit(3).

2005/07/13 15:01:56 VCS ERROR V-16-1-10303 Resource cvm_clus (Owner: unknown, Group: cvm) is FAULTED (timed out) on sys vcslab-16

2005/07/13 15:01:56 VCS ERROR V-16-1-10205 Group cvm is faulted on system vcslab-16

2005/07/13 15:01:56 VCS NOTICE V-16-1-10446 Group cvm is offline on system vcslab-16

The errors logged above are not actually caused by a Cluster Volume Manager start timing issue, so increasing the resource's CVMTimeout attribute will not resolve the problem.
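The current timeout can be checked for reference with the hares -value command; the 200-second figure shown below is only the common default, not a recommendation:

# hares -value cvm_clus CVMTimeout

200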

To check whether CVM fails to start due to incorrect CVMNodeId settings, run the following commands and compare their outputs:

1.   /sbin/lltstat -vvn  
2.   /opt/VRTSvcs/bin/hares -display cvm_resource_name | grep NodeId


Example:

# /sbin/lltstat -vvn | more

LLT node information:

Node            State     Link   Status   Address

   0            CONNWAIT
                          bge1   DOWN
                          bge2   DOWN
   1            CONNWAIT
                          bge1   DOWN
                          bge2   DOWN
 * 2 vcslab-15  OPEN
                          bge1   UP       00:03:00:84:36:F2
                          bge2   UP       00:03:00:84:36:F3
   3 vcslab-16  OPEN
                          bge1   UP       00:03:00:55:85:1A
                          bge2   UP       00:03:00:55:85:1B
   4            CONNWAIT
                          bge1   DOWN
                          bge2   DOWN

... snip ...

# /opt/VRTSvcs/bin/hares -display cvm_clus | grep NodeId

cvm_clus CVMNodeId global vcslab-15 2 vcslab-16 3

The outputs above show that the node IDs are consistent: vcslab-15 is node 2 and vcslab-16 is node 3 in both /etc/llthosts (as reported by lltstat) and the CVMNodeId attribute. If they do not match, correct them by running the commands below:

1. haconf -makerw
2. hares -modify cvm_clus CVMNodeId <node_name_1> <ID_1> <node_name_2> <ID_2>
3. haconf -dump -makero


Example:

# hares -modify cvm_clus CVMNodeId vcslab-15 2 vcslab-16 3

# /opt/VRTSvcs/bin/hares -display cvm_clus | grep NodeId

cvm_clus CVMNodeId global vcslab-15 2 vcslab-16 3

Once the CVMNodeId values are corrected, start Cluster Volume Manager again. A cluster reboot may be required.
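If a reboot is not needed, the cvm service group (the group name shown in the logs above) can typically be brought online on each node with hagrp, for example:

# hagrp -online cvm -sys vcslab-15

# hagrp -online cvm -sys vcslab-16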

Scenario 2

Node IDs in /etc/llthosts on each host are inconsistent.

If the node IDs in the /etc/llthosts files are not the same on every node in the cluster, the node joining the cluster panics, and messages similar to the following are recorded in the /var/adm/messages file:

Jul 13 03:25:25 vcslab-15 llt: [ID 516570 kern.warning] Warning: LLT ERROR V-14-1-10030 duplicate cluster 234 node 3 detected , link 0 (bge1), address 00:03:BA:55:85:1A

Jul 13 03:25:25 vcslab-15 llt: [ID 464203 kern.notice] LLT WARNING V-14-1-10099 LLT is now disabled

Jul 13 03:25:25 vcslab-15 unix: [ID 836849 kern.notice]

Jul 13 03:25:25 vcslab-15 ^Mpanic[cpu0]/thread=2a1003ebd40:

Jul 13 03:25:26 vcslab-15 unix: [ID 286223 kern.notice] GAB: Port a halting system due to node id conflict

The node will otherwise remain stuck in a reboot cycle. To fix the problem:

  1. Boot into single user mode.
  2. Check /etc/llthosts file.
  3. Correct the incorrect node IDs (see the example below).
  4. Reboot.
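For example (the hostnames and IDs follow the earlier scenario; actual values vary), an inconsistent pair of /etc/llthosts files could look like this:

On vcslab-15:

# cat /etc/llthosts
2 vcslab-15
3 vcslab-16

On vcslab-16 (node IDs swapped, conflicting with vcslab-15's file):

# cat /etc/llthosts
3 vcslab-15
2 vcslab-16

After the correction, the /etc/llthosts file must be identical on every node in the cluster.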
 

References

Etrack: 392687, UMI: V-16-2-13066
