On a system running the AIX operating system, the monitor of a VCS IP resource returns online even after the resource is taken offline and fails over to the second node.

Article: 100027735
Last Published: 2012-09-26
Product(s): InfoScale & Storage Foundation

Problem

On a system running the AIX operating system (OS), the monitor of a Veritas Cluster Server (VCS) IP resource reports the resource as online even after the resource is taken offline and the service group fails over to the second node. This causes a concurrency violation on the service group.

Error Message

From engine_A.log:

==> IP resource is being brought offline on Node1:

2012/08/01 01:08:17 VCS ERROR V-16-2-13067 (Node1) Agent is calling clean for resource(rvg_ip_1) because the resource became OFFLINE unexpectedly, on its own.
2012/08/01 01:08:18 VCS WARNING V-16-10011-3304 (Node1) IP:rvg_ip_1:clean:The value of NetMask attribute and netmask configured for interface [en9] does not match.
2012/08/01 01:08:18 VCS INFO V-16-2-13068 (Node1) Resource(rvg_ip_1) - clean completed successfully.
2012/08/01 01:08:18 VCS INFO V-16-1-10307 Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) is offline on Node1 (Not initiated by VCS)
2012/08/01 01:08:18 VCS NOTICE V-16-1-10300 Initiating Offline of Resource rvg_logowner_1 (Owner: Unspecified, Group: RVG_LOGOWNER) on System Node1
2012/08/01 01:08:19 VCS INFO V-16-6-15015 (Node1) hatrigger:/opt/VRTSvcs/bin/triggers/resfault is not a trigger scripts directory or can not be executed
2012/08/01 01:08:21 VCS INFO V-16-1-10305 Resource rvg_logowner_1 (Owner: Unspecified, Group: RVG_LOGOWNER) is offline on Node1 (VCS initiated)
2012/08/01 01:08:21 VCS ERROR V-16-1-10205 Group RVG_LOGOWNER is faulted on system Node1
2012/08/01 01:08:21 VCS NOTICE V-16-1-10446 Group RVG_LOGOWNER is offline on system Node1


==> The resource is then brought online by VCS on Node2:

2012/08/01 01:08:21 VCS NOTICE V-16-1-10301 Initiating Online of Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) on System Node2
2012/08/01 01:08:21 VCS INFO V-16-6-15002 (Node1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/triggers/postoffline Node1 RVG_LOGOWNER   successfully
2012/08/01 01:08:26 VCS INFO V-16-10011-0 (Node2) IP:rvg_ip_1:online:tcpdump is running with pid [66912374].
2012/08/01 01:08:27 VCS INFO V-16-1-10298 Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) is online on Node2 (VCS initiated)


==> The resource is then detected online by VCS on Node1, causing a concurrency violation on the service group:

2012/08/01 01:18:19 VCS INFO V-16-1-10299 Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) is online on Node1 (Not initiated by VCS)
2012/08/01 01:18:19 VCS ERROR V-16-1-10214 Concurrency Violation:CurrentCount increased above 1 for failover group RVG_LOGOWNER


==> Later, when the resource is manually brought offline on Node1, the ifconfig command reports an error:

2012/08/01 01:53:41 VCS NOTICE V-16-1-10167 Initiating manual offline of group RVG_LOGOWNER on system Node1
2012/08/01 01:53:41 VCS NOTICE V-16-1-10300 Initiating Offline of Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) on System Node1
2012/08/01 01:53:41 VCS INFO V-16-6-15002 (Node1) hatrigger:hatrigger executed /opt/VRTSvcs/bin/internal_triggers/violation Node1 RVG_LOGOWNER  successfully
2012/08/01 01:53:41 VCS WARNING V-16-10011-3304 (Node1) IP:rvg_ip_1:offline:The value of NetMask attribute and netmask configured for interface [en9] does not match.
2012/08/01 01:53:43 VCS INFO V-16-2-13716 (Node1) Resource(rvg_ip_1): Output of the completed operation (offline)
==============================================
ifconfig: ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address
==============================================

2012/08/01 01:53:43 VCS ERROR V-16-2-13064 (Node1) Agent is calling clean for resource(rvg_ip_1) because the resource is up even after offline completed.
2012/08/01 01:53:44 VCS INFO V-16-2-13068 (Node1) Resource(rvg_ip_1) - clean completed successfully.
2012/08/01 01:53:44 VCS INFO V-16-1-10305 Resource rvg_ip_1 (Owner: Unspecified, Group: RVG_LOGOWNER) is offline on Node1 (VCS initiated)
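
To check whether a cluster has hit the same condition, the engine log can be searched for the concurrency violation message ID shown above (V-16-1-10214). The following is a minimal sketch, not a Veritas-supplied tool: the function name find_concurrency_violations is hypothetical, and it assumes the log line layout seen in the excerpts above (date, time, then the message, with the group name as the last word).

```shell
#!/bin/sh
# Hypothetical helper: reads VCS engine log text on stdin and prints
# "<date> <time> <group>" for every concurrency violation (V-16-1-10214).
# Assumes the group name is the last whitespace-separated field on the line,
# as in the log excerpt in this article.
find_concurrency_violations() {
    awk '/V-16-1-10214/ { print $1, $2, $NF }'
}
```

Typical use, assuming the engine log is in its usual location, would be: find_concurrency_violations < /var/VRTSvcs/log/engine_A.log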

 

Cause

The VCS IP agent pings the assigned IP address to verify whether the resource is online. Further debugging confirmed that the ping succeeds even after the IP address is unplumbed, because the reply to the ping comes from another IP address. The problem was reproducible even after clearing the ARP cache as suggested by IBM. The issue was also reproduced without VCS in the picture, so it is related to the AIX operating system (OS).
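
The symptom described above (a ping reply arriving from a different address) can be checked outside VCS by comparing the source address in each reply line with the address that was pinged. The sketch below is illustrative only: the function name unexpected_reply_sources and the 192.0.2.x addresses are hypothetical, and it assumes reply lines of the form "64 bytes from <address>: ...". If the local ping prints a host name before the address, the parsing needs adjusting.

```shell
#!/bin/sh
# Hypothetical helper: reads ping output on stdin and prints each distinct
# reply source that differs from the expected IP address given as $1.
# Prints nothing when every reply came from the expected address.
unexpected_reply_sources() {
    expected_ip=$1
    awk -v want="$expected_ip" '
        / bytes from / {
            for (i = 1; i < NF; i++) {
                if ($i == "from") {
                    src = $(i + 1)
                    sub(/:$/, "", src)          # drop the trailing colon
                    if (src != want && !(src in seen)) {
                        seen[src] = 1
                        print src
                    }
                }
            }
        }'
}
```

For example, after the IP has been unplumbed: ping -c 3 192.0.2.10 | unexpected_reply_sources 192.0.2.10. Any address printed is a host other than the target answering the ping.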

Solution

The problem is related to the AIX operating system, where the reply to a ping for an IP address is returned successfully from another IP address. Contact IBM to resolve this issue. On the VCS side, it is recommended to increase OnlineRetryLimit to 1 for the IP agent (if it is set to the default value of 0).

# hatype -modify IP OnlineRetryLimit 1
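
The hatype command above changes the configuration, so the cluster configuration normally has to be writable first and saved afterwards. The following sketch shows one possible sequence using the standard haconf and hatype commands. The wrapper function set_ip_online_retry_limit and the DRY_RUN variable are hypothetical conveniences, not part of VCS. Run it as root on a cluster node.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented command.
# Set DRY_RUN=echo to print the commands instead of running them.
set_ip_online_retry_limit() {
    limit=${1:-1}
    # Open the cluster configuration for writing.
    ${DRY_RUN:-} haconf -makerw
    # Raise OnlineRetryLimit for the IP resource type.
    ${DRY_RUN:-} hatype -modify IP OnlineRetryLimit "$limit"
    # Show the new value for verification.
    ${DRY_RUN:-} hatype -display IP -attribute OnlineRetryLimit
    # Save the configuration and make it read-only again.
    ${DRY_RUN:-} haconf -dump -makero
}
```

Running DRY_RUN=echo with the function first prints the four commands without touching the cluster, which allows the sequence to be reviewed before it is applied.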

Applies To

This issue applies only to systems running VCS on the AIX operating system.

References

Etrack: 2869021
