For all the attention enterprise leaders give to IT service resiliency in their own data centres, too little emphasis is placed on risk mitigation for workloads running on public cloud platforms. According to a recent Veritas study, 91% of enterprises have a multi-cloud strategy in place, highlighting that the use of multiple clouds is becoming the reality for the majority of businesses.
The public cloud has attracted strong growth among enterprise users due to a number of highly valued features: increased agility, near-unlimited scale, high levels of security and lower costs. In addition to these benefits, public cloud platforms are highly resilient. However, outages in public cloud do happen, and organisations need to take responsibility for cloud disaster recovery planning (DRP) if they want to use these platforms for mission-critical workloads. With increasing reliance on public cloud services for business-critical operations, executives need to be clear on which capabilities are provided as part of their Cloud Service Agreements and which remain their own responsibility.
DRP is one important area for which the customer is accountable. This requires a strategy for implementing, maintaining, and testing technology and procedures that ensure resiliency in the event of a cloud outage or incident as a part of standard enterprise data management practices. In this blog I will explain what DRP is in a multi-cloud context and explore some common myths that are holding organisations back from putting proper protections in place. In part two I will share some steps you can take to build your disaster recovery (DR) plan.
DRP refers to measures taken to enable an organisation to recover its IT services in a timely manner in the event of a major outage, whatever the cause. With the increase in ransomware attacks, the likelihood of IT service downtime has increased across all platforms, including public cloud. With an actionable DR plan and set of procedures in place, the organisation can either maintain or quickly resume mission-critical functions following a major incident without incurring significant losses in business operations or revenues.
Organisations will typically have clearly documented procedures to deal with major incidents that take place in their own facilities, such as a fire or a flood in a data centre. This provides management and, implicitly, employees and customers the assurance that if a major outage occurred, operations could be resumed quickly and with minimal service interruption.
However, it is often the case that when organisations are using public cloud for operations of similar importance, little consideration is given to putting the same assurances in place. This can result in significant operational risks, as business-critical services are moved from data centres covered by strong DR plans to public cloud environments with no such plans in place.
While the narrative has become commonplace, the notion that moving to public cloud gives you built-in resiliency simply does not hold true. If you are to plan appropriately for multi-cloud DR, there are several prevalent myths that first need to be dispelled. Let us explore some of these common misconceptions.
Myth – Public cloud platforms have built-in resiliency, so workloads running on them will not suffer outages.
Reality – While the major cloud service providers (CSPs) offer cloud platforms with a very high degree of resilience, they can and do suffer outages. In 2022, one of the largest CSPs had over 25 documented outages on their platforms globally. And the CSPs typically offer an SLA of 99.9% uptime for a single virtual machine. While this is very good – and probably better than most organisations can deliver in their own data centres – it still represents downtime of around 9 hours a year. And this is just for a single component – in a complex, multi-server environment, the risk of an outage is compounded.
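The 9-hours-a-year figure, and the compounding effect across components, can be sketched with a back-of-envelope calculation. This is illustrative only: the 99.9% figure and the five-component stack are assumptions, not any specific provider's SLA terms.

```python
# Rough downtime maths for a 99.9% single-VM SLA, and how availability
# compounds when an application depends on several such components.
# Figures are illustrative; real SLAs vary by provider and service tier.

HOURS_PER_YEAR = 365 * 24  # 8,760

def downtime_hours(availability: float) -> float:
    """Expected annual downtime for a given availability fraction."""
    return HOURS_PER_YEAR * (1 - availability)

# A single VM at 99.9% uptime:
single_vm = downtime_hours(0.999)           # ~8.8 hours a year

# A hypothetical application that needs 5 such components up at once
# (serial dependency: availabilities multiply):
stack_availability = 0.999 ** 5             # ~99.5%
stack = downtime_hours(stack_availability)  # ~43.7 hours a year

print(f"Single VM: {single_vm:.1f} h/yr; "
      f"5-component stack: {stack:.1f} h/yr")
```

The point of the second calculation is that availabilities of serial dependencies multiply, so a stack of individually "good" components can still fall well short of the headline SLA.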
Myth – Cloud applications built using modern, scale-out architectures have built-in resiliency and will not suffer outages.
Reality – It is true that modern, scale-out applications can be designed with built-in resiliency to certain failure conditions. For example, processing in a scale-out database such as MongoDB can be distributed across multiple nodes, and the loss of any one node will not result in an outage. However, only a relatively small portion of workloads that enterprises are running in public cloud today are designed this way. Many are simply legacy applications that have been “lifted and shifted” to the cloud. Such applications are just as vulnerable to downtime in the cloud as they are in your data centre. And even if an application has been architected to withstand the loss of various components, few have built-in tolerance for a major outage, such as the loss of a complete availability zone.
Myth – Cloud providers spend billions each year on security and are immune to ransomware incidents.
Reality – The increased dependency on public cloud during lockdown has made these platforms a primary target for ransomware. Ransomware attacks increased by 105% globally in 2021. While public cloud can offer high levels of information security compared to enterprises’ own data centres, many workloads running in the cloud are, by necessity, internet-facing, and therefore the risk of being exploited by hackers remains. Consequently, data can be encrypted, leading to IT service outages.
As these cloud myths highlight, it is tempting to defer responsibility for IT resiliency to the cloud service provider. In practice, companies may find that in moments of stress this illusory safety net quickly disappears. This makes DRP for all data and workloads running in the cloud a crucial exercise. In part two I will outline the components of a robust plan for multi-cloud disaster recovery and explain how you can align your business to put the right protection in place.
In the meantime, to find out more about the approach and principles you can apply to prepare for unexpected situations, read our comprehensive Multi-Cloud Disaster Recovery guide.