For InfoScale 8.0.2 and 9.0 for Windows, when using Volume Replicator (VVR), the heartbeat fails to switch from the hostname or admin IP to the virtual IP, causing replication to be stuck in the Activating state
Problem
For InfoScale 8.0.2 and 9.0 for Windows, when using Volume Replicator (VVR), the heartbeat fails to switch from the hostname or admin IP to the virtual IP, causing replication to be stuck in the Activating state.
Error Message
No errors appear in the GUI or in the logs to account for this issue. The only symptom is that the replication state does not transition from Activating to Active.
Cause
When a VVR Secondary is configured in an InfoScale cluster with a virtual IP address for the local and remote Rlinks, and the connection is made using the Secondary's hostname, VVR can fail to switch the replication heartbeat from that hostname to the selected virtual IP address. This is seen more often when replication is not started right away during the Add Secondary wizard.
Solution
Ensure that VVR requirements are met and that all required ports are open prior to attempting the workaround below. If assistance is needed to validate the VVR requirements and ports, please open a case with Arctera Support.
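As a rough aid for the port check mentioned above, the following Python sketch tests whether a TCP port on a remote node accepts connections. It is only an illustration: the function name tcp_port_reachable is hypothetical, the host and port values must be taken from the VVR requirements for your release, and a TCP connect test says nothing about UDP traffic, so it cannot by itself confirm that every VVR port is usable.

```python
import socket


def tcp_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. It checks TCP reachability
    and does not validate UDP ports or VVR itself.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

For example, calling tcp_port_reachable("192.168.10.216", port) from the Primary, with port set to each required TCP port in turn, gives a quick first indication of a blocked port. A False result can also mean that nothing is listening on that port yet, so treat it as a prompt for further checking rather than proof of a firewall problem.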
For InfoScale 8.0.2, ensure that patch Patch_8_0_20009_0_4190108 is installed on both the Primary and Secondary servers (all cluster nodes at both sites).
Note: The above patch is applicable to InfoScale 8.0.2 only. A similar patch for InfoScale 9.0 is not currently available.
Once the patch is installed, check the configuration with vxprint -VPl and follow the steps for the scenario that matches the output.
SCENARIO #1:
A single IP is listed in the vxprint -VPl output for the assoc field, and it is the correct VVR IP.
For example, in the vxprint -VPl output below, the VVR IPs are listed as:
assoc : rvg=TEST_RVG
remote_host=192.168.10.216 remote_dg=EVDG <==== DR VVR IP
remote_rlink=rlk_xxx_xxxx
local_host=192.168.10.116 <==== PR VVR IP
The same IPs are listed when viewed from VEA->Replication Network.
SOLUTION: Once the patch is applied, perform Stop Replication followed by Start Replication.
SCENARIO #2:
In the vxprint -VPl output below, a dual or incorrect IP is listed for the assoc field:
assoc : rvg=TEST_RVG
remote_host=192.168.10.216 remote_dg=EVDG <==== DR VVR IP
remote_rlink=rlk_xxx_xxxx
local_host=192.168.10.116 {10.10.30.1} <==== Additional IP reported
While the view in VEA->Replication Network shows below:
SOLUTION: Once the patch is applied, perform the steps below.
o Run hastop -local -force on the Secondary node.
o Delete only the Secondary.
o Re-add the Secondary.
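The check that separates the two scenarios above can be sketched in Python. This is a minimal illustration, not a supported tool: the function name classify_rlink_hosts is hypothetical, it works on vxprint -VPl text that you have already captured, and it assumes the remote_host= and local_host= layout shown in the examples above. It only detects a second or braced IP. It cannot tell whether a single listed IP is the correct VVR IP, which still has to be confirmed manually.

```python
import re

# Loose IPv4 pattern, sufficient for counting addresses in a field value.
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

# Capture the value after local_host= or remote_host=, stopping at the
# next key=value token on the same line or at the end of the line.
HOST_FIELD = re.compile(
    r"\b(local_host|remote_host)=(.*?)(?=\s+\w+=|$)", re.MULTILINE
)


def classify_rlink_hosts(vxprint_text):
    """Classify captured vxprint -VPl text as 'scenario 1', 'scenario 2',
    or 'unknown' (hypothetical helper, for illustration only)."""
    matches = HOST_FIELD.findall(vxprint_text)
    names = {name for name, _ in matches}
    if names != {"local_host", "remote_host"}:
        return "unknown"  # expected fields are missing
    values = [value.strip() for _, value in matches]
    # Scenario 2: an extra IP, shown in braces in the example above.
    if any("{" in v or len(IPV4.findall(v)) > 1 for v in values):
        return "scenario 2"
    # Scenario 1: exactly one IP in every host field.
    if all(len(IPV4.findall(v)) == 1 for v in values):
        return "scenario 1"
    return "unknown"  # for example, a hostname instead of an IP
```

Passing the text of the Scenario #1 example returns "scenario 1", and the Scenario #2 example with local_host=192.168.10.116 {10.10.30.1} returns "scenario 2". An "unknown" result, such as when a hostname appears instead of an IP, means the output should be reviewed by hand.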
The Product Engineering team currently plans to address this issue through a future patch or hotfix in 9.0 versions of the software. Please note that our company reserves the right to withdraw any fix from the targeted release if it fails quality assurance tests. Development plans are subject to change, and any actions you take based on this information, or your reliance on it, are at your own risk.