Replication stops when the replicator log overflows in Veritas Storage Foundation for Windows - Volume Replicator Option
Article: 100029354
Last Published: 2021-12-28
Product(s): InfoScale & Storage Foundation
Problem
Replication stops when the replicator log overflows in Veritas Storage Foundation for Windows - Volume Replicator Option
Cause
If the replicator log (SRL) overflows, the RDS (Replicated Data Set) begins to track writes using a DCM (Data Change Map). While the DCM is in use, replication stops and all new writes are only tracked in the DCM on the primary.
Solution
To determine whether a DCM is in use and to restart replication, follow the steps at the end of this section.
To understand why replication stops when the DCM is in use, the differences between the SRL and the DCM must be clarified.
The SRL is a large transaction log that tracks each individual write. Because of this, the order of the writes is maintained (Write-order fidelity). This makes it possible for replication to continue, even if the secondary is not up-to-date. Without write-order fidelity, the blocks on the secondary would become inconsistent.
A DCM is too small to track individual writes. Instead, it divides the volume into a number of regions. When a write occurs within a region, the entire region is marked as "dirty," even if the region is very large and the amount of data written is very small. The DCM does not maintain write-order fidelity. Because of this, every dirty region must finish replicating to the secondary site before the secondary volume is considered "consistent." Until all the dirty regions have been replicated, the secondary volume is considered "inconsistent" and is not usable.
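The region-marking behavior described above can be illustrated with a small model. This is only a sketch: the class name, the 64 KB region size, and the 1 MB volume size are hypothetical values chosen for illustration, not structures or defaults used by VVR.

```python
# Illustrative model only. DirtyRegionMap, the region size, and the volume
# size are hypothetical and are not taken from VVR.
class DirtyRegionMap:
    def __init__(self, volume_size, region_size):
        self.region_size = region_size
        # Ceiling division: number of regions needed to cover the volume.
        self.region_count = -(-volume_size // region_size)
        self.dirty = [False] * self.region_count

    def record_write(self, offset, length):
        """Mark every region touched by the write as dirty."""
        first = offset // self.region_size
        last = (offset + length - 1) // self.region_size
        for region in range(first, last + 1):
            self.dirty[region] = True

    def bytes_to_resync(self):
        """Whole regions are resent, regardless of how little was written."""
        return sum(self.dirty) * self.region_size


dcm = DirtyRegionMap(volume_size=1024 * 1024, region_size=64 * 1024)
dcm.record_write(offset=10, length=1)      # 1-byte write inside region 0
dcm.record_write(offset=65535, length=2)   # 2-byte write straddling regions 0 and 1
print(dcm.bytes_to_resync())
```

In this model, three bytes of writes leave two whole regions (131072 bytes) to resynchronize, and the map keeps no record of the order in which the writes arrived.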
In the event of a VVR (Veritas Volume Replicator) migration or take-over, the secondary volumes are promoted to primary volumes. If these operations are performed at a time when the secondary volumes are inconsistent, the volumes will be corrupt and will not be usable. To avoid this, replication automatically stops when an SRL overflows and the DCM is in use. The result is that the secondary volume is not up-to-date, but the data is "consistent" and usable.
Replication can be restarted manually by using the "Resynchronize Secondaries" command. Until all "dirty" regions have been replicated to the secondary, the volume will be inconsistent. During this time, a VVR migration or take-over will not be possible.
Note: If a "dirty" region finishes replicating to the secondary, but a new transaction is subsequently written to that same region, the entire region will again be marked as "dirty." This may result in situations where a replicated data set is unable to get out of DCM mode. If this occurs, two options may be used to resynchronize the secondary with the primary:
1. Temporarily stop all writes to the primary, allowing the DCM time to "drain."
2. Perform a block-level backup and restore of the replicated volume from the primary to the secondary sites. Further information on this can be found in the following article: https://www.veritas.com/support/en_US/article.000088685
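The situation in the note above, and the effect of option 1, can be sketched with a toy simulation. The function name, the tick-based timing, and the rate of one region resynchronized per tick are assumptions made purely for illustration and do not reflect how VVR schedules resynchronization.

```python
# Toy simulation only. ticks_to_drain and its one-region-per-tick rate are
# hypothetical and do not describe VVR internals.
def ticks_to_drain(initial_dirty, writes_at, max_ticks=100):
    """Return how many ticks it takes to clear all dirty regions.

    initial_dirty: iterable of dirty region indexes
    writes_at: function mapping a tick number to the regions written in that tick
    Returns None if regions are still dirty after max_ticks.
    """
    dirty = set(initial_dirty)
    for tick in range(max_ticks):
        if not dirty:
            return tick
        dirty.discard(min(dirty))       # one region finishes replicating
        dirty.update(writes_at(tick))   # new writes mark regions dirty again
    return max_ticks if not dirty else None


# Writes stopped on the primary: the map drains.
quiet = ticks_to_drain({0, 1, 2}, lambda tick: [])
# Ongoing writes keep hitting regions 0 and 1: the map never drains.
busy = ticks_to_drain({0, 1, 2}, lambda tick: [0, 1])
print(quiet, busy)
```

With writes stopped, the three dirty regions clear in three ticks. With continuing writes to the same regions, the function gives up and returns None, which corresponds to an RDS that cannot leave DCM mode.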
If an RVG is in DCM mode, replication will remain stopped until the following steps are performed:
1. Expand Replication Network.
2. Right-click on the primary RVG.
3. Select Resynchronize Secondaries.
If needed, use vxprint to determine if the RVGs are in DCM mode. This can be done by following the steps below.
1. Run the following command on both sites:
vxprint -VPl
Note: That is an upper-case V, an upper-case P and a lower-case L.
This will return results that are similar to the following (Figure 1):
Figure 1: Sample Vxprint Output
Diskgroup = Test_DG
Rvg : Test_RVG
state : state=ACTIVE kernel=ENABLED
assoc : datavols=F:
srl=\Device\HarddiskDmVolumes\Test_DG\RepLog
rlinks=rlk_24432
att : rlinks=rlk_24432
checkpoint :
flags : primary enabled attached dcm_logging clustered
Rlink : rlk_24432
info : timeout=500 packet_size=8400
latency_high_mark=10000 latency_low_mark=9950
bandwidth_limit=none
state : state=ACTIVE
synchronous=off latencyprot=off srlprot=autodcm
assoc : rvg=Test_RVG
remote_host=x.x.x.x
remote_dg=Test_DG
remote_rlink=rlk__14382
local_host=x.x.x.x
protocol : UDP/IP
flags : write attached consistent connected dcm_logging
2. Locate the flags attribute.
Note: There will be two instances of flags.
3. If the RVG or Rlink is in DCM mode, dcm_logging appears among the values of its flags attribute.
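When many RVGs need to be checked, the flags lookup in steps 2 and 3 can be scripted. The following is a minimal sketch that parses saved vxprint -VPl output, assuming the output follows the layout shown in Figure 1. The function name dcm_status is hypothetical, and the sample text is abbreviated from Figure 1.

```python
import re


def dcm_status(vxprint_output):
    """Map each Rvg/Rlink record to True if its flags line lists dcm_logging.

    Assumes the layout in Figure 1: a record starts with "Rvg : <name>" or
    "Rlink : <name>", and is followed by one "flags : ..." line.
    """
    status = {}
    current = None
    for line in vxprint_output.splitlines():
        header = re.match(r"\s*(Rvg|Rlink)\s*:\s*(\S+)", line)
        if header:
            current = f"{header.group(1)} {header.group(2)}"
            continue
        flags = re.match(r"\s*flags\s*:\s*(.*)", line)
        if flags and current is not None:
            status[current] = "dcm_logging" in flags.group(1).split()
    return status


# Abbreviated from the sample output in Figure 1.
sample = """\
Diskgroup = Test_DG
Rvg : Test_RVG
state : state=ACTIVE kernel=ENABLED
flags : primary enabled attached dcm_logging clustered
Rlink : rlk_24432
state : state=ACTIVE
flags : write attached consistent connected dcm_logging
"""

print(dcm_status(sample))
```

For the sample text, both the RVG and the Rlink are reported as being in DCM mode. A record whose flags line does not list dcm_logging is reported as False.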