Replication stops when the replicator log overflows in Veritas Storage Foundation for Windows - Volume Replicator Option

Article: 100029354
Last Published: 2021-12-28
Product(s): InfoScale & Storage Foundation

Problem

Replication stops when the replicator log overflows in Veritas Storage Foundation for Windows - Volume Replicator Option

 

Cause

If the Storage Replicator Log (SRL) overflows, the Replicated Data Set (RDS) begins tracking writes in a Data Change Map (DCM). While the DCM is in use, replication stops and all new writes on the primary are only tracked in the DCM.

 

Solution
 

To determine whether a DCM is in use and how to restart replication, follow the steps at the end of this section.

 
To understand why replication stops when the DCM is in use, the differences between the SRL and the DCM must be clarified.

The SRL is a large transaction log that records each individual write, so the order of the writes is preserved (write-order fidelity). Because of this, replication can continue even when the secondary is behind the primary: at any point, the secondary holds data that matches an earlier state of the primary. Without write-order fidelity, the blocks on the secondary could become inconsistent.
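The ordering guarantee can be illustrated with a toy model. This is a minimal Python sketch, not how VVR implements the SRL: the `ReplicatorLog` class and its `capacity` limit are hypothetical. Writes are queued in arrival order and applied to the secondary strictly in that order, so any partial replication leaves the secondary matching an earlier state of the primary. A `False` return from `log_write` stands in for the overflow condition described under Cause.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatorLog:
    """Toy stand-in for the SRL: an ordered queue of writes awaiting replication."""

    capacity: int  # hypothetical maximum number of queued writes
    entries: list = field(default_factory=list)

    def log_write(self, block: int, data: str) -> bool:
        """Queue one write in arrival order; return False if the log is full (overflow)."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((block, data))
        return True

    def replicate(self, secondary: dict, count: int) -> None:
        """Apply the oldest `count` queued writes to the secondary, strictly in order."""
        for block, data in self.entries[:count]:
            secondary[block] = data
        del self.entries[:count]
```

For example, writing A to block 1, B to block 2, and then C to block 1, and replicating only the first two entries, leaves the secondary as {1: 'A', 2: 'B'}, which is exactly what the primary held after its second write.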

A DCM is too small to track individual writes. Instead, it divides the volume into a number of regions. When a write occurs anywhere within a region, the entire region is marked as "dirty," even if the region is very large and the amount of data written is very small. The DCM does not maintain write-order fidelity, so every dirty region must finish replicating to the secondary site before the secondary volume is considered "consistent." Until all the dirty regions have been replicated, the secondary volume is "inconsistent" and is not usable.
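The region-based tracking can also be illustrated with a toy model. In this minimal Python sketch, `DataChangeMap` and the `REGION_SIZE` of 1024 blocks are hypothetical illustration values, not the DCM's real on-disk layout. The map records only which regions are dirty, so resynchronization must copy each dirty region in full, and the secondary is treated as consistent only when no dirty regions remain.

```python
REGION_SIZE = 1024  # hypothetical number of blocks covered by one DCM region


class DataChangeMap:
    """Toy DCM: remembers which regions are dirty, but not which writes or in what order."""

    def __init__(self, volume_blocks: int, region_size: int = REGION_SIZE):
        self.region_size = region_size
        self.num_regions = -(-volume_blocks // region_size)  # ceiling division
        self.dirty = set()  # indexes of dirty regions

    def mark_write(self, block: int) -> int:
        """Mark the whole region containing `block` dirty, however small the write."""
        region = block // self.region_size
        self.dirty.add(region)
        return region

    def resync_region(self, primary: dict, secondary: dict, region: int) -> None:
        """Copy every written block of one region from primary to secondary, then clear it."""
        start = region * self.region_size
        for block in range(start, start + self.region_size):
            if block in primary:
                secondary[block] = primary[block]
        self.dirty.discard(region)

    def is_consistent(self) -> bool:
        """The secondary is only usable once no regions remain dirty."""
        return not self.dirty
```

Writes to blocks 10 and 20 dirty the same region 0, so the map cannot tell which came first. If a new write lands in region 0 after that region has been copied, region 0 becomes dirty again, which is the situation described in the Note below that can keep an RDS in DCM mode.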

In a VVR (Veritas Volume Replicator) migration or takeover, the secondary volumes are promoted to primary volumes. If either operation is performed while the secondary volumes are inconsistent, the volumes will be corrupt and unusable. To avoid this, replication stops automatically when the SRL overflows and the DCM is in use. As a result, the secondary volume is not up to date, but its data is "consistent" and usable.

Replication can be restarted manually with the "Resynchronize Secondaries" command. Until all "dirty" regions have been replicated to the secondary, the secondary volume is inconsistent, and a VVR migration or takeover is not possible.

Note: If a "dirty" region finishes replicating to the secondary but a new write is then made to that same region, the entire region is marked as "dirty" again. As a result, a replicated data set may be unable to get out of DCM mode. If this occurs, either of the following options can be used to resynchronize the secondary with the primary:

1. Temporarily stop all writes to the primary, allowing the DCM time to "drain."
2. Perform a block-level backup of the replicated volume on the primary site and restore it on the secondary site. For more information, see the following article: https://www.veritas.com/support/en_US/article.000088685

If an RVG (Replicated Volume Group) is in DCM mode, replication remains stopped until the following steps are performed:

1. Expand the Replication Network node.
2. Right-click the primary RVG.
3. Select Resynchronize Secondaries.

If needed, use the vxprint command to determine whether the RVGs are in DCM mode, as follows:

1. Run the following command on both sites:

vxprint -VPl

Note: That is an upper-case V, an upper-case P and a lower-case L.

This will return results that are similar to the following (Figure 1):


Figure 1: Sample vxprint output

Diskgroup = Test_DG

Rvg : Test_RVG
state : state=ACTIVE kernel=ENABLED
assoc : datavols=F:
 srl=\Device\HarddiskDmVolumes\Test_DG\RepLog
 rlinks=rlk_24432
att : rlinks=rlk_24432
checkpoint :
flags : primary enabled attached dcm_logging clustered
Rlink : rlk_24432
info : timeout=500 packet_size=8400
 latency_high_mark=10000 latency_low_mark=9950
 bandwidth_limit=none
state : state=ACTIVE
 synchronous=off latencyprot=off srlprot=autodcm
assoc : rvg=Test_RVG
 remote_host=x.x.x.x
 remote_dg=Test_DG
 remote_rlink=rlk__14382
 local_host=x.x.x.x
protocol : UDP/IP
flags : write attached consistent connected dcm_logging


2. Locate the flags attributes.

Note: There is one flags attribute for the RVG and one for each Rlink, so the sample above contains two.

3. If the RVG or an Rlink is in DCM mode, its flags attribute includes the value dcm_logging.
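Checking the flags attributes by hand works well for a single RVG; with several, the output could be scanned with a script. This is a minimal Python sketch that assumes the output layout shown in Figure 1 (an "Rvg :" or "Rlink :" line introduces each object, and a later "flags :" line lists its space-separated flags). The function names `run_vxprint` and `find_dcm_logging` are hypothetical, and the output layout may differ between product versions.

```python
import subprocess


def run_vxprint() -> str:
    """Run `vxprint -VPl` on the local host and return its text output."""
    result = subprocess.run(
        ["vxprint", "-VPl"], capture_output=True, text=True, check=True
    )
    return result.stdout


def find_dcm_logging(vxprint_output: str) -> dict:
    """Map each RVG and Rlink to True if its flags attribute contains dcm_logging.

    Assumes the Figure 1 layout: "Rvg : name" / "Rlink : name" lines introduce
    an object, and that object's later "flags : ..." line lists its flags.
    """
    status = {}
    current = None
    for line in vxprint_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines such as "Diskgroup = Test_DG" or indented continuations
        key = key.strip()
        if key in ("Rvg", "Rlink"):
            current = f"{key} {value.strip()}"
            status[current] = False
        elif key == "flags" and current is not None:
            status[current] = "dcm_logging" in value.split()
    return status
```

For the output in Figure 1, `find_dcm_logging` would return {'Rvg Test_RVG': True, 'Rlink rlk_24432': True}, because both flags lines include dcm_logging. As in step 1, the check should be run on both sites.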