Many IT pros today believe that merely selecting the correct Azure Blob Storage redundancy level provides a backup of the data stored in Azure Blob Storage.
The belief is that by selecting Geo-Redundant Storage (GRS) as the replication level, one will achieve a geographically-segregated backup copy of the data.
To be blunt, that’s an extremely dangerous misconception.
Let me be clear:
None of the Azure Blob Storage replication levels, Read-Access Geo-Redundant Storage (RA-GRS) included, give you backup. Replication levels exist only to increase the resiliency of the solution and the availability of data.
There are several key attributes of backup which are entirely missing in the replication models provided in Azure (and other clouds); this post will discuss those in detail, and then offer a simple solution for Azure Blob Storage backup.
One of the critical components of a backup is that even if, and especially when, a blob is changed or deleted, the blob is still recoverable, as it existed at a desired previous point in time.
Think about it: With GRS, you get no such capability, since any modifications or deletions are immediately replicated to the secondary region.
If an object is deleted, accidentally or maliciously, GRS provides absolutely no recovery option, since all changes (and a deletion counts as a change) are replicated to all copies, including those in the secondary region. On this basis alone, GRS and every other replication level provided by Azure Blob Storage fail to meet the most basic requirement of a backup.
The bottom line: replication is not a real backup!
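To illustrate the difference, here is a minimal Python sketch. It is a toy model, not Azure's implementation: the class names `ReplicatedStore` and `PointInTimeBackup` are hypothetical, and plain dictionaries stand in for storage regions. It only shows why mirroring every change cannot bring back a deleted blob, while a point-in-time copy can.

```python
class ReplicatedStore:
    """Toy model of geo-replication: every change is mirrored to the secondary."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}

    def put(self, name, data):
        self.primary[name] = data
        self.secondary[name] = data  # the write is replicated

    def delete(self, name):
        self.primary.pop(name, None)
        self.secondary.pop(name, None)  # the deletion is replicated too


class PointInTimeBackup:
    """Toy model of a backup: keeps independent, timestamped copies."""

    def __init__(self):
        self.snapshots = []  # (timestamp, blobs) pairs, appended in time order

    def snapshot(self, timestamp, blobs):
        self.snapshots.append((timestamp, dict(blobs)))  # copy, not a reference

    def restore(self, name, as_of):
        # Return the blob from the latest snapshot taken at or before `as_of`.
        for timestamp, blobs in reversed(self.snapshots):
            if timestamp <= as_of:
                return blobs.get(name)
        return None


store = ReplicatedStore()
backup = PointInTimeBackup()

store.put("report.docx", b"v1")
backup.snapshot(1, store.primary)

store.delete("report.docx")  # accidental or malicious deletion
backup.snapshot(2, store.primary)

print("secondary copy:", store.secondary.get("report.docx"))
print("backup as of t=1:", backup.restore("report.docx", as_of=1))
```

After the deletion, the secondary region has nothing to offer, while the backup can still return the blob as it existed at the earlier point in time.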
Thinking at a higher level than the individual blobs, any backup solution for Azure Blob Storage would need to protect against any deletion, accidental or otherwise, of a container, or an entire storage account.
Once again, none of the replication levels, GRS included, provide any such protection: If you delete a container, or the entire storage account, all copies of the affected blobs in all regions are deleted.
So, the simple act of deleting a container (accidentally or maliciously), an operation that takes only a couple of clicks in the Azure portal, can result in permanent data loss.
In general, when dealing with backup, the goal is to have the maximum retention and granularity for recovery at the lowest possible cost.
We’ve already talked about how replication levels like GRS do not provide any guaranteed retention and lack point-in-time recovery. But more than that, replication with Azure Blob Storage GRS is expensive. Storing blobs with GRS will more than double the cost, since you have to pay the equivalent of LRS storage twice. Storage operation costs also increase, and you incur geo-replication bandwidth charges on top of that.
In total, this adds up even at modest data volumes. Consider the additional expense of storing 500 TB of data on the Cool tier with GRS for three years:
As you can see, the cost of enabling GRS or RA-GRS replication is a significant jump, especially when you consider this much higher storage cost is just getting you replication, NOT a dependable backup!
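The arithmetic behind such a comparison is simple enough to sketch in Python. The prices below are hypothetical placeholders, not Azure's published rates, so the result will not match the figures in this post. The sketch also models GRS as exactly twice the LRS capacity price and ignores the extra operation and bandwidth charges mentioned above, so it understates the real premium. The function names are illustrative.

```python
def storage_cost(tb, price_per_gb_month, months):
    """Capacity cost only: volume x unit price x duration."""
    gb = tb * 1000  # assumption: decimal terabytes
    return gb * price_per_gb_month * months


def grs_premium(tb, lrs_price, grs_price, months):
    """Extra capacity cost of GRS over LRS for the same data and period."""
    return storage_cost(tb, grs_price, months) - storage_cost(tb, lrs_price, months)


# Hypothetical placeholder prices in $/GB-month; substitute current list prices.
LRS_COOL_PRICE = 0.010
GRS_COOL_PRICE = 0.020  # modeled as exactly 2x LRS

premium = grs_premium(tb=500, lrs_price=LRS_COOL_PRICE,
                      grs_price=GRS_COOL_PRICE, months=36)
print(f"GRS premium over LRS for 500 TB, 3 years: ${premium:,.0f}")
```

Plugging in the current Cool tier prices for your region, plus estimates for operations and geo-replication bandwidth, gives a figure you can compare directly against the cost of a real backup.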
Now, what if I told you that there was a way to achieve an actual point-in-time backup of the same 500 TB for three years, at an additional cost of only $70K, versus the $335K premium for GRS? If you’re interested, keep reading because we’ll delve into this at the end of the post.
Another issue with the way Azure Blob Storage GRS replication works is that you have no control over which region your data is replicated to. Microsoft predetermines the Azure regional pairing for storage replication, meaning that blobs in a given Azure region will always go to the paired region when using GRS or RA-GRS. You can view the region pairings in Microsoft's Azure documentation.
So, what happens if you want to store data using GRS in a region other than the default paired region? Unfortunately, you are entirely out of luck, and you cannot adjust the replication region with GRS or RA-GRS options.
In most cases, I don’t think this fixed pairing is a huge deal, but it can be in some scenarios. It really depends on what you want to protect against.
Take Brazil, for example. The Azure region of ‘Brazil South’ is in São Paulo State, and its regional pair for storage replication is South Central US, which is in Texas. An organization in Brazil may not want its data stored in the US, perhaps either due to the PATRIOT Act or because of a requirement to maintain a safe copy of its data outside of the Northern Hemisphere.
In many of the other regional pairings, the distance between the two Azure regions may be less than desirable. If you are paranoid about your valuable data, then presumably, you are worried about a real region-wide failure event. In this case you probably want more distance between your primary and remote copy than some of the default pairings.
With the GRS replication level, you are entirely at Microsoft’s mercy for when a fail-over is performed. In the event of a temporary regional outage, Microsoft will not perform the fail-over, meaning that your data is inaccessible during the blackout (even for reads)!
Now, you can use RA-GRS to provide read accessibility to your data, but this, as we showed above, is prohibitively expensive in most cases, and comes with a slew of restrictions and caveats.
So, when does Microsoft perform a fail-over? Virtually never, given the complexity and cost involved. Instead, Microsoft allows temporary outages to take data offline (even GRS data) for the duration of the outage. In severe circumstances, after making attempts to recover the primary storage account, Microsoft may elect to open up the replica storage account for you, but it is entirely at their discretion when this happens.
Even if one were willing to pay exorbitant costs for a non-backup copy of data, as is the case when attempting to use GRS for backup, one would at least hope to achieve a clear RPO (recovery point objective), know how long it takes for data to replicate, and be proactively informed of any issues.
With GRS, this is also not the case. While Microsoft states a GRS replication RPO of 15 minutes, they are careful not to commit to that, meaning you have no guarantees about how long it will take for your data to replicate.
And, because of how GRS obfuscates the replication process, you have no way of monitoring or knowing when replication stalls or fails.
Think about any true backup solution: One of the critical aspects is that you have some capability to know precisely how up-to-date your backup is, and control the process and granularity as needed, receiving automated real-time notifications of any issues.
Given that Azure Blob Storage GRS or RA-GRS options are in no way, shape, or form a backup, and also that they are costly, I am happy to report that there is a simple solution for Azure Blob Storage backup – Veritas Alta™ SaaS Protection.
This is Veritas’ market-leading cloud data platform built on Microsoft Azure. With Veritas Alta SaaS Protection, you can back up any blob storage account into the region and tier of your choice.
The backup copy can be moved to the Azure Archive tier if you want to make it very affordable. You also benefit from the compression and deduplication functionality built into Veritas Alta SaaS Protection, which reduces the volume of data in the backup.
Let’s extend our Azure Blob Storage cost example above with the cost of backing up the 500 TB of Cool tier blobs for three years with the Veritas solution. For this example, we assume a modest 30% reduction due to compression and deduplication, with the backup copy on the Archive tier:
As you can see, adding true backup with the ability for point-in-time recovery costs only slightly more than LRS alone, and is significantly less costly than GRS which doesn’t provide true backup functionality.
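As a rough sketch of how the 30% data reduction feeds into the backup storage figure, the Python below computes the capacity cost of the reduced backup copy. The Archive price is a hypothetical placeholder, not Azure's published rate, the function name is illustrative, and the calculation covers storage capacity only; any other charges for the backup are not modeled, so the output will not match the figures quoted in this post.

```python
def backup_storage_cost(source_tb, reduction, price_per_gb_month, months):
    """Capacity cost of a backup copy after compression and deduplication.

    `reduction` is the fraction of data removed (0.30 means 30% smaller).
    """
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    stored_gb = source_tb * 1000 * (1 - reduction)  # assumption: decimal TB
    return stored_gb * price_per_gb_month * months


# Hypothetical placeholder price in $/GB-month; substitute the current list price.
ARCHIVE_PRICE = 0.002

cost = backup_storage_cost(source_tb=500, reduction=0.30,
                           price_per_gb_month=ARCHIVE_PRICE, months=36)
print(f"Archive-tier backup storage for 500 TB, 3 years: ${cost:,.0f}")
```

The two levers are visible in the formula: data reduction shrinks the volume stored, and the Archive tier lowers the unit price applied to it.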