How to do capacity planning (sizing) for Resiliency Platform replication appliances (VMware vSphere environment)
Description
The Veritas Resiliency Platform replication appliance sits in the replication data path, so its sizing depends directly on the amount and rate of data it is expected to transfer. Because a single replication appliance serves multiple VMs, the aggregate data transfer rate from those VMs must be factored into the sizing of the appliance. Alternatively, if the size of the appliance is fixed, the number of VMs it can support is limited by their aggregate data transfer rate.
If more VMs need to be protected, either the capacity (CPU and memory) of the existing replication appliances must be increased, or additional replication appliances must be deployed in the primary and recovery data centers.
Because replication transfers data over the network, the overall network characteristics, including latency, bandwidth, and quality (packet loss) of the network components (LAN, WAN), play a crucial role in determining the aggregate data transfer rate and hence the number of VMs that can be supported in a given environment.
The number of supported VMs, or the aggregate data transfer rate from those VMs, is directly proportional to the WAN bandwidth. For example, a 5 Gbps WAN link can support five times the data transfer rate of a 1 Gbps WAN link.
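The proportionality can be illustrated with a short calculation. The following is a minimal sketch, not a product tool: the function name is hypothetical, the 80% usable fraction is taken from the assumptions listed later in this article, and decimal units (1 Gbps = 125 MB/s) are assumed. It ignores compression, protocol overhead, and latency effects.

```python
def usable_wan_mb_per_s(link_gbps: float, usable_fraction: float = 0.8) -> float:
    """Approximate usable WAN throughput in MB/s for a link rated in Gbps.

    Assumes decimal units (1 Gbps = 1000 Mbps = 125 MB/s) and that only
    `usable_fraction` of the rated bandwidth is available for replication.
    """
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    link_mb_per_s = link_gbps * 1000 / 8  # bits to bytes
    return link_mb_per_s * usable_fraction


one_gbps = usable_wan_mb_per_s(1)
five_gbps = usable_wan_mb_per_s(5)

print(f"1 Gbps link: about {one_gbps:.0f} MB/s usable")
print(f"5 Gbps link: about {five_gbps:.0f} MB/s usable")
print(f"Scaling factor: {five_gbps / one_gbps:.1f}x")
```

The scaling factor depends only on the ratio of the link speeds, so the same usable fraction on both links gives a linear increase in supportable aggregate transfer rate.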
Specific sizing attributes for a replication appliance are described below along with the factors that they depend on:
- Number of virtual CPUs: Aggregate data transfer rate from protected VMs
- Memory: Aggregate data transfer rate from protected VMs
- Disk Size (used as staging area): Number of VMs and average data transfer rate from each VM
- Number of replication appliances: Aggregate data transfer rate from protected VMs, number of disks attached to each of the protected VMs and the number of disks that can be attached to a replication appliance VM.
NOTE:
The last parameter is technology specific. For VMware, the limit is 128 disks for ESXi 6.7 and later and 60 disks for earlier ESXi versions. That is the maximum number of disks that can be attached to a gateway appliance when it acts as the replication target for the protected virtual machines.
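The effect of the disk limit on the appliance count can be sketched as follows. This is an illustrative calculation only: the function and dictionary names are hypothetical, and it considers the disk-attachment limit alone. Replication throughput may require more appliances than this lower bound.

```python
import math

# Disk-attachment limits quoted in the note above.
MAX_DISKS_PER_APPLIANCE = {
    "esxi_6_7_or_later": 128,
    "esxi_before_6_7": 60,
}


def appliances_for_disk_limit(num_vms: int, disks_per_vm: int, max_disks: int) -> int:
    """Lower bound on appliances per site, from the disk-attachment limit only."""
    if num_vms < 0 or disks_per_vm < 0 or max_disks <= 0:
        raise ValueError("invalid input")
    total_disks = num_vms * disks_per_vm
    # At least one appliance is always needed; round up when disks overflow.
    return max(1, math.ceil(total_disks / max_disks))


limit = MAX_DISKS_PER_APPLIANCE["esxi_6_7_or_later"]
print(appliances_for_disk_limit(31, 4, limit))  # 124 disks fit on one appliance
print(appliances_for_disk_limit(31, 5, limit))  # 155 disks need a second appliance
```

With 31 VMs, moving from 4 to 5 disks per VM pushes the total from 124 to 155 disks, which crosses the 128-disk limit and calls for a second appliance.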
The following assumptions are made while arriving at the sizing recommendations:
- Source and target data centers are based on VMware hypervisor technology.
- The LAN in both data centers is at least twice as fast as the WAN.
- The oversubscription ratio for virtual CPUs is 2:1.
- 80% of the WAN bandwidth is usable; in other words, there is minimal packet loss, if any, over the WAN.
The performance of the staging area should match the storage performance of the replicated production workload. In general, the gateway mandates a minimum of 6 GB per virtual machine. Large virtual machines or workloads with a high change rate may warrant additional staging area per virtual machine to absorb a spike in change rate or a momentary drop in WAN throughput. This is essential to minimize the performance impact on the workload in these situations. The staging area per virtual machine can be computed from the maximum change rate of the workload and the duration of the spike.
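One way to turn that guidance into numbers is sketched below. The formula is an assumption made for illustration, not a documented product formula: the staging area is sized to hold the backlog that builds up while the change rate exceeds the rate at which data drains over the WAN, with the 6 GB minimum as a floor. The function name, the `drain_mb_per_s` parameter, and the 1 GB = 1024 MB conversion are all hypothetical choices.

```python
MIN_STAGING_GB = 6.0  # minimum per virtual machine stated above


def staging_gb_per_vm(
    peak_change_mb_per_s: float,
    spike_seconds: float,
    drain_mb_per_s: float = 0.0,
) -> float:
    """Estimate staging area (GB) per VM to absorb a change-rate spike.

    Assumed model: backlog = (peak change rate - drain rate) * spike duration.
    The default drain rate of 0 is the conservative case in which nothing
    is sent over the WAN for the duration of the spike.
    """
    if peak_change_mb_per_s < 0 or spike_seconds < 0 or drain_mb_per_s < 0:
        raise ValueError("inputs must be non-negative")
    backlog_mb = max(0.0, peak_change_mb_per_s - drain_mb_per_s) * spike_seconds
    return max(MIN_STAGING_GB, backlog_mb / 1024)


# A 20 MB/s spike lasting 10 minutes with no WAN drain.
print(f"{staging_gb_per_vm(20, 600):.2f} GB")
# A small 2 MB/s spike stays within the 6 GB minimum.
print(f"{staging_gb_per_vm(2, 600):.2f} GB")
```

Setting the drain rate to zero models a complete WAN outage for the duration of the spike; a non-zero value models a partial drop in throughput.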
The following examples give sizing guidance based on internal testing:
Product version: Veritas Resiliency Platform 4.0, VMware ESXi 6.7.0
Storage class: HDD for datastore and gateway
Network bandwidth: 1Gbps
Table: Recommended appliance configuration for customer scenarios
The first three columns describe the workload characteristics; the last three columns give the gateway configuration needed.

| Number of Protected VMs | Data transfer per VM (MB/s) | Disks per VM | Appliance virtual CPUs | Appliance Memory (GB) | Number of Appliance VMs per site |
|---|---|---|---|---|---|
| 20 | 10 | 4 | 4 | 16 | 1 |
| 10 | 20 | 4 | 4 | 16 | 1 |
| 31 | 2 | 4 | 4 | 16 | 1 |
| 31 | 5 | 5 | 4 | 16 | 2 |
NOTE: In the above table, the number of protected virtual machines is limited either by replication throughput or by the maximum number of disks that can be attached to the gateway. Faster storage (such as SSD) and a faster network can allow more virtual machines to be protected per gateway pair.