Write-order infidelity at the VVR DR/Secondary site may, in rare cases, result in data inconsistency
Problem
Data on the VVR Secondary (Disaster Recovery) site may become corrupted or inconsistent because Secondary Logging can reorder I/Os.
Applies To:
Versions: All Veritas Volume Replicator (VVR) versions 6.0 and above. VVR versions 5.x and below are not impacted.
Feature: VVR secondary logging
OS: Cross-Platform (Linux, AIX, Solaris, HP-UX)
Workaround: Disable VVR secondary logging
Solution Vehicle: The VVR secondary logging feature is being redesigned and will be available in a future release.
What is Secondary Logging, and what is the impact of disabling it?
Secondary logging is an advanced feature available with product versions 6.x and higher that improves replication performance and throughput. It uses the SRL at the Secondary (DR) site to stage data before writing it to the data volumes.
When Secondary Logging is disabled, the boost in network transfer performance (i.e. the rate at which data is transferred from the Primary to the Secondary) gained by logging data to the SRL at the Secondary site(s) is no longer available.
What is Bulk Transfer, and what is the impact of disabling it?
With disk group version 190 and above, the Bulk Transfer feature is automatically enabled to make effective use of network bandwidth for replication; data is replicated to a disaster recovery (DR) site in bulk, 256 KB at a time. Bulk Transfer reduces the CPU-related overhead of Volume Replicator (VVR) and increases overall replication throughput.
The Bulk Transfer feature requires Secondary Logging to be enabled.
Note: With Secondary Logging and Bulk Transfer disabled, replication performance is comparable to that of pre-6.x versions.
Error Message
Sample error message recorded in the system log:
VxVM VVR vxio V-5-0-2001vol_rv_dvstart_start: not expected seqno [seq number + 128 or 256 or 384...] [seq number]
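To find out whether a host has logged this message, the system log can be searched for its fixed part. The following is a minimal sketch; the function name count_seqno_messages is hypothetical, and the log path is supplied by the caller because it differs by platform (/var/log/messages on Linux is only an assumed example).

```shell
#!/bin/sh
# Count the lines in a log file that contain the out-of-order sequence message.
# The function name is illustrative and not part of any VVR tooling.
count_seqno_messages() {
    logfile=$1
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function usable under "set -e".
    grep -c 'vol_rv_dvstart_start: not expected seqno' "$logfile" || true
}

# Example call (path is an assumption; adjust it for the platform):
#   count_seqno_messages /var/log/messages
```

A non-zero count only shows that out-of-order writes were delivered; as the Cause section explains, it does not by itself prove corruption.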
Cause
During the processing of secondary data, the following message may appear in system logs:
VxVM VVR vxio V-5-0-2001vol_rv_dvstart_start: not expected seqno [seq number + 128 or 256 or 384...] [seq number]
The occurrence of the above message does not necessarily indicate corruption. It indicates that an out-of-order write is being delivered for processing.
Corruption results only if one of the preceding writes in the skipped set (whose size is given by the difference between the two sequence numbers in the message) targets the same block as the write that was delivered early. In that case the latest write is applied first (out of sequence), the older write is applied after it, and the block ends up holding the old value.
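The condition described above can be illustrated with a toy replay. This sketch is not VVR code: it models each write as a hypothetical "block=value" pair, applies the writes in the order given, and prints the final content of one block. The function name final_value and the block numbers are made up for the illustration.

```shell
#!/bin/sh
# Apply "block=value" writes in the order given and print the final value
# of the target block. Purely illustrative; not how VVR stores data.
final_value() {
    target=$1
    shift
    result=
    for write in "$@"; do
        block=${write%%=*}   # text before the first "="
        value=${write#*=}    # text after the first "="
        if [ "$block" = "$target" ]; then
            result=$value
        fi
    done
    printf '%s\n' "$result"
}

# In order: the latest write to block 7 is applied last.
final_value 7 7=old 9=x 7=new     # prints "new"

# Latest write applied first, and a preceding write hits the same block:
# block 7 is left holding the old value.
final_value 7 7=new 7=old 9=x     # prints "old"

# Latest write applied first, but no preceding write touches block 7:
# the result is still correct.
final_value 7 7=new 8=a 9=x       # prints "new"
```

Only the second case leaves stale data behind, which is why the message alone does not necessarily indicate corruption.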
Solution
Disable Secondary Logging and Bulk Transfer via the tunables outlined below. The tunable settings are persistent across reboots.
Because a full synchronization of the data is performed as part of disabling secondary logging, this procedure also corrects any inconsistent data at the Secondary (DR) sites, provided the Primary and DR roles were never swapped.
However, if the Primary and DR roles were swapped at any time after these messages occurred (as described in the Error Message section), the new Primary may hold inconsistent data. In that case, validate and, where possible, correct the data at the new Primary before disabling secondary logging and performing a full sync/autosync; otherwise both sites may end up with incorrect data.
Procedure to Disable Secondary Logging for Versions 6.0.x:
Replication must be paused and resumed for the tunables to take effect. The Secondary’s data should be fully synchronized once (i.e. by autosync) after disabling secondary logging.
Set the related secondary logging and bulk transfer tunables to ‘0’.
Detailed steps:
1) Pause the replication on both Primary & Secondary Sites of the VVR pair:
vxrlink -g [dgname] pause [rlink_name]
Note: Repeat the above operation at both sites (Primary & DR) for all the RVGs/Rlinks configured in the system.
2) Execute the following commands on all the servers at both Primary & Secondary Sites of the VVR/CVR pair (to disable the tunables):
# vxtune vol_rv_do_secondary_logging 0
# vxtune vol_rv_sec_logging_enabled 0
Note: If any of the VVR pairs are in CVM/CVR/FSS, then these tunables should be turned off on all the hosts of the cluster (i.e. the master and all slaves at the Primary and DR sites).
To check the current value of these tunables, use the following commands:
(a) Variable values before modification:
# vxtune vol_rv_do_secondary_logging
Tunable                           Current Value   Default Value Reboot
--------------------------------- --------------- ------------- ------
vol_rv_do_secondary_logging       1               1             N
# vxtune vol_rv_sec_logging_enabled
Tunable                           Current Value   Default Value Reboot
--------------------------------- --------------- ------------- ------
vol_rv_sec_logging_enabled        1               1             N
(b) Variable values after modification:
# vxtune vol_rv_do_secondary_logging
Tunable                           Current Value   Default Value Reboot
--------------------------------- --------------- ------------- ------
vol_rv_do_secondary_logging       0               1             N
# vxtune vol_rv_sec_logging_enabled
Tunable                           Current Value   Default Value Reboot
--------------------------------- --------------- ------------- ------
vol_rv_sec_logging_enabled        0               1             N
3) Now, resume the replication from both Primary & Secondary Sites of the VVR pair:
vxrlink -g [dgname] resume [rlink_name]
Note: Repeat the above operation across all sites (all Primary & Secondary servers) for all the RVGs/Rlinks configured in the RDS (Replicated Data Set).
4) Resynchronize all the RVGs on the systems where these tunables were turned off by starting an autosync on each RVG and letting it complete. Please refer to the Veritas SF/HA Solution Replication Administrator's Guide for details.
Note: These are system-wide tunables; therefore, an autosync must be performed for all the RVGs present in the setup.
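Because steps 1 and 3 must be repeated for every Rlink on every host, it can help to generate the command list from an inventory and review it before running anything. The sketch below only prints commands and does not execute them. The function name print_vxrlink_cmds and the inventory format (one "diskgroup rlink" pair per line) are assumptions made for this example, and dg1, dg2, rlk_a, and rlk_b are placeholder names.

```shell
#!/bin/sh
# Read "diskgroup rlink" pairs on stdin and print the vxrlink command for
# the requested action (pause or resume). Nothing is executed here.
print_vxrlink_cmds() {
    action=$1
    while read -r dg rlink; do
        # Skip blank lines and lines that start with "#".
        case $dg in
            ''|'#'*) continue ;;
        esac
        echo "vxrlink -g $dg $action $rlink"
    done
}

# Example with placeholder names:
print_vxrlink_cmds pause <<'EOF'
dg1 rlk_a
dg2 rlk_b
EOF
```

The same function with the argument resume produces the commands for step 3. Printing first, rather than executing directly, leaves room to check the list against the actual RDS configuration on each host.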
Procedure to Disable Secondary Logging for Versions 6.1 and above:
For versions 6.1 and above, bulk transfer must also be disabled along with Secondary Logging.
Replication must be paused and resumed for the tunables to take effect. The Secondary’s data should be fully synchronized once (i.e. by autosync) after disabling secondary logging/bulk transfer.
Set the related secondary logging and bulk transfer tunables to ‘0’.
Detailed steps:
1) Pause the replication on both Primary & Secondary Sites of the VVR pair:
vxrlink -g [dgname] pause [rlink_name]
Note: Repeat the above operation at both sites (Primary & DR) for all the RVGs/Rlinks configured in the system.
2) Execute the following commands on all the servers at both Primary & Secondary Sites of the VVR/CVR pair (to disable the tunables):
# vxtune vol_rv_do_secondary_logging 0
# vxtune vol_rv_sec_logging_enabled 0
# vxtune vol_rv_bulk_transfer 0
Note: If any of the VVR pairs are in CVM/CVR/FSS, then these tunables should be turned off on all the hosts of the cluster (i.e. the master and all slaves at the Primary and Secondary sites).
To check the current value of these tunables, use the following commands:
(a) Variable values before modification:
# vxtune vol_rv_do_secondary_logging
Tunable                         Current Value Default Value Reboot Clusterwide
------------------------------- ------------- ------------- ------ -----------
vol_rv_do_secondary_logging     1             1             N      N
# vxtune vol_rv_sec_logging_enabled
Tunable                         Current Value Default Value Reboot Clusterwide
------------------------------- ------------- ------------- ------ -----------
vol_rv_sec_logging_enabled      1             1             N      N
# vxtune vol_rv_bulk_transfer
Tunable                         Current Value Default Value Reboot Clusterwide
------------------------------- ------------- ------------- ------ -----------
vol_rv_bulk_transfer            1             1             N      N
(b) Variable values after modification:
# vxtune vol_rv_do_secondary_logging
Tunable                         Current Value Default Value Reboot Clusterwide
------------------------------- ------------- ------------- ------ -----------
vol_rv_do_secondary_logging     0             1             N      N
# vxtune vol_rv_sec_logging_enabled
Tunable                         Current Value Default Value Reboot Clusterwide
------------------------------- ------------- ------------- ------ -----------
vol_rv_sec_logging_enabled      0             1             N      N
# vxtune vol_rv_bulk_transfer
Tunable                         Current Value Default Value Reboot Clusterwide
------------------------------- ------------- ------------- ------ -----------
vol_rv_bulk_transfer            0             1             N      N
3) Now, resume the replication from both the Primary & Secondary Sites of the VVR pair:
vxrlink -g [dgname] resume [rlink_name]
Note: Repeat the above operation across all sites (all Primary & Secondary servers) for all the RVGs/Rlinks configured in the RDS (Replicated Data Set).
4) Resynchronize all the RVGs on the systems where these tunables were turned off by starting an autosync on each RVG and letting it complete. Please refer to the Veritas SF/HA Solution Replication Administrator's Guide for details.
Note: These are system-wide tunables; therefore, an autosync must be performed for all the RVGs present in the setup.
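The check in step 2 has to be repeated on every host, so it may be convenient to script it. The following sketch assumes vxtune output in the column layout shown in step 2 (tunable name in the first column, current value in the second) and prints the name of any of the three tunables that is still non-zero. The function name tunables_still_enabled is hypothetical.

```shell
#!/bin/sh
# Read vxtune output on stdin and print each of the three tunables named in
# this article whose "Current Value" column is not 0. Header and separator
# lines are ignored because their first field is not a tunable name.
tunables_still_enabled() {
    awk '
        ($1 == "vol_rv_do_secondary_logging" ||
         $1 == "vol_rv_sec_logging_enabled" ||
         $1 == "vol_rv_bulk_transfer") && $2 != 0 { print $1 }
    '
}

# Example use on a host where vxtune is available (on 6.0.x, leave out the
# vol_rv_bulk_transfer query, since that procedure does not set it):
#   { vxtune vol_rv_do_secondary_logging
#     vxtune vol_rv_sec_logging_enabled
#     vxtune vol_rv_bulk_transfer; } | tunables_still_enabled
```

Empty output means every queried tunable reports 0 on that host; any name printed identifies a tunable that still needs to be turned off before replication is resumed.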