For InfoScale 8.0.2 and 9.0 for Windows, when using Volume Replicator (VVR), the heartbeat fails to switch from the hostname or admin IP to the virtual IP, leaving replication stuck in the Activating state
Problem
For InfoScale 8.0.2 and 9.0 for Windows, when using Volume Replicator (VVR), the heartbeat fails to switch from the hostname or admin IP to the virtual IP, leaving replication stuck in the Activating state.
Error Message
No errors appear in the GUI or in the logs for this issue. The only symptom is that the replication state never transitions from Activating to Active.
Cause
When a VVR secondary is configured in an InfoScale cluster with a virtual IP address for the local and remote Rlinks, and the initial connection is made using the secondary's hostname, VVR can fail to move the replication heartbeat from that hostname to the selected virtual IP address. This is seen more often when replication is not started immediately in the Add Secondary wizard.
Solution
Ensure that VVR requirements are met and that all required ports are open prior to attempting the workaround below. If assistance is needed to validate the VVR requirements and ports, please open a case with Arctera Support.
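As a rough aid for the port check mentioned above, the following Python sketch tests whether TCP ports on the remote replication host accept connections. It is not an InfoScale tool. The function names tcp_port_open and check_ports are hypothetical, and the port numbers to test must be taken from the InfoScale documentation for your release. A successful TCP connection says nothing about UDP traffic, so this does not replace a full validation of the VVR requirements.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port number to True (reachable) or False (not reachable)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, check_ports("192.168.10.216", [4145, 8199]) would test the remote VVR IP used in the example output in this article. The two port numbers here are assumptions for illustration only and should be confirmed against the product documentation.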
Ensure that the following patch is installed on all cluster nodes at both sites:
For InfoScale 8.0.2: Patch_8_0_20009_0_4190108
For InfoScale 9.0: Patch_9_0_00001_0_4190913
Once the patch is installed, check the configuration in the vxprint -VPl output and follow the steps for the scenario that matches.
SCENARIO #1: A single IP is listed in the vxprint -VPl output for the assoc flag, and it is the correct VVR IP.
For example, in the vxprint -VPl output below, the VVR IP is listed as a single entry:
assoc : rvg=TEST_RVG
        remote_host=192.168.10.216    <==== DR VVR IP
        remote_dg=EVDG
        remote_rlink=rlk_xxx_xxxx
        local_host=192.168.10.116     <==== PR VVR IP
The same IP is listed in VEA->Replication Network.
SOLUTION: Once the patch is applied, stop replication and then start it again.
SCENARIO #2: Two IP entries (with one enclosed in braces) are listed in the vxprint -VPl output for the assoc flag.
For example, in the vxprint -VPl output below, a second, incorrect IP appears for the assoc flag:
assoc : rvg=TEST_RVG
        remote_host=192.168.10.216                 <==== DR VVR IP
        remote_dg=EVDG
        remote_rlink=rlk_xxx_xxxx
        local_host=192.168.10.116 {10.10.30.1}     <==== Additional IP reported
The corresponding view in VEA->Replication Network (screenshot not reproduced):
SOLUTION: Once the patch is applied, perform the following steps in order:
o Run hastop -local -force on the secondary node.
o Delete only the secondary; leave the rest of the replication configuration in place.
o Re-add the secondary.
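The scenario check described above can be sketched in Python as follows. This is a minimal illustration, not an InfoScale utility: it assumes the vxprint -VPl output has been saved as text and that the local_host entry has the layout shown in the examples in this article. The function name classify_rlink_output is hypothetical. The sketch only inspects the first local_host entry it finds, and it cannot tell whether a single listed IP is the correct VVR IP; that still has to be confirmed manually.

```python
import re

# Matches a local_host entry and captures the first address plus an optional
# second address in braces, e.g. "local_host=192.168.10.116 {10.10.30.1}".
LOCAL_HOST_RE = re.compile(r"local_host=([^\s{]+)(?:\s*\{([^}]*)\})?")


def classify_rlink_output(vxprint_text):
    """Return (scenario, local_host, extra_ip) for the first local_host entry.

    scenario is 1 when a single IP is listed and 2 when an additional IP
    in braces follows it; extra_ip is None in scenario 1.
    """
    match = LOCAL_HOST_RE.search(vxprint_text)
    if match is None:
        raise ValueError("no local_host entry found in the supplied text")
    local_host, extra_ip = match.group(1), match.group(2)
    scenario = 2 if extra_ip else 1
    return scenario, local_host, extra_ip
```

Given the Scenario #2 example above, the function reports scenario 2 with 10.10.30.1 as the additional IP, which points to the delete and re-add procedure rather than a simple stop and start of replication.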