Disaster recovery (DR) is the area of security planning that aims to protect your organization from the negative effects of significant adverse events. It allows an organization to either maintain or quickly resume its mission-critical functions following a disaster without incurring significant losses in business operations or revenue.
Disasters come in different shapes and sizes. The term covers not only catastrophic events such as earthquakes, tornadoes, or hurricanes, but also incidents such as equipment failures, cyber-attacks, or even terrorism.
In preparation, organizations and companies create DR plans detailing processes to follow and actions to take to resume their mission-critical functions.
Disaster recovery focuses on the IT systems that support an organization’s critical business functions. It is often associated with the term business continuity, but the two are not interchangeable. DR is one part of business continuity, which focuses more broadly on keeping all aspects of the business running despite disasters.
Since IT systems have become critical to business success, disaster recovery is now a primary pillar within the business continuity process.
Most business owners do not consider that they may become victims of a disaster until an unforeseen crisis happens, which ends up costing their company a lot of money in operational and economic losses. These events can be unpredictable, and as a business owner, you cannot risk not having a disaster preparedness plan in place.
Business disasters can be natural, technological, or human-made. Examples of natural disasters include floods, tornadoes, hurricanes, landslides, earthquakes, and tsunamis. Human-made and technological disasters, by contrast, involve things like hazardous material spills, power or infrastructure failures, chemical and biological weapon threats, nuclear power plant blasts or meltdowns, cyberattacks, acts of terrorism, explosions, and civil unrest.
Potential disasters to plan for include:
Regardless of size or industry, when unforeseen events take place, causing daily operations to come to a halt, your company needs to recover quickly to ensure that you continue providing your services to customers and clients.
Downtime is perhaps among the biggest IT expenses that a business faces. Based on 2014-2015 disaster recovery statistics from Infrascale, one hour of downtime can cost small businesses as much as $8,000, mid-size companies $74,000, and large organizations $700,000.
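To make these figures concrete, the short sketch below multiplies the hourly costs quoted above by an outage length. It assumes cost grows linearly with time, which is a simplification, and the function name `estimate_downtime_cost` is only illustrative.

```python
# Hourly downtime costs quoted above (Infrascale, 2014-2015), in US dollars.
# These are the "as much as" figures, so results are rough upper bounds.
HOURLY_DOWNTIME_COST = {
    "small": 8_000,
    "mid-size": 74_000,
    "large": 700_000,
}


def estimate_downtime_cost(company_size: str, hours_down: float) -> float:
    """Estimate outage cost, assuming cost scales linearly with duration."""
    if hours_down < 0:
        raise ValueError("hours_down must be non-negative")
    return HOURLY_DOWNTIME_COST[company_size] * hours_down


if __name__ == "__main__":
    for size in HOURLY_DOWNTIME_COST:
        cost = estimate_downtime_cost(size, 4)
        print(f"{size}: ${cost:,.0f} for a 4-hour outage")
```

Even a single afternoon of downtime at these rates can exceed what many organizations spend on DR planning in a year, which is the point the statistics are meant to illustrate.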
For small and mid-sized businesses (SMBs), extended loss of productivity can lead to the reduction of cash flow through lost orders, late invoicing, missed delivery dates, and increased labor costs due to extra hours resulting from downtime recovery efforts.
If you do not anticipate major disruptions to your business and address them appropriately, you risk long-term negative consequences when unexpected disasters occur.
Having a DR plan in place can save your company from multiple risks, including:
As businesses become more reliant on high availability, their tolerance for downtime has decreased. Therefore, many have a DR plan in place to prevent disasters from disrupting their daily operations.
The two critical measurements in DR and downtime are:
Once you identify your RPO and RTO, your administrators can use the two measures to choose optimal disaster recovery strategies, procedures, and technologies.
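As a rough illustration of that selection step, the sketch below reads RPO as the maximum tolerable window of lost data and RTO as the maximum tolerable time to restore service, then filters candidate strategies against both. The strategy names and all durations are hypothetical, and treating the backup interval as the worst-case data loss is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service


@dataclass(frozen=True)
class Strategy:
    name: str
    backup_interval: timedelta    # assumed worst-case data loss
    estimated_restore: timedelta  # assumed time to bring systems back


def meets_objectives(strategy: Strategy, objectives: RecoveryObjectives) -> bool:
    """A strategy qualifies only if it satisfies both RPO and RTO."""
    return (
        strategy.backup_interval <= objectives.rpo
        and strategy.estimated_restore <= objectives.rto
    )


# Hypothetical objectives and candidates, for illustration only.
OBJECTIVES = RecoveryObjectives(rpo=timedelta(hours=1), rto=timedelta(hours=4))
CANDIDATES = [
    Strategy("nightly backup", timedelta(hours=24), timedelta(hours=12)),
    Strategy("hourly snapshots", timedelta(hours=1), timedelta(hours=2)),
    Strategy("continuous replication", timedelta(minutes=5), timedelta(minutes=15)),
]

if __name__ == "__main__":
    for candidate in CANDIDATES:
        verdict = "meets" if meets_objectives(candidate, OBJECTIVES) else "misses"
        print(f"{candidate.name}: {verdict} the objectives")
```

In practice the qualifying strategies would then be compared on cost, since tighter objectives generally require more expensive technology.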
To recover operations within tighter RTO windows, your organization needs to position its secondary data so that it is easily and quickly accessible. One method used to restore data quickly is recovery-in-place, which moves all backup data files to a live state and so eliminates the need to move them across a network. It can protect against server and storage system failure.
Before using recovery-in-place, your organization needs to consider three things:
Also, since recovery-in-place can sometimes take up to 15 minutes, replication may be necessary if you want a quicker recovery time. Replication refers to the periodic electronic refreshing or copying of a database from computer server A to server B, which ensures that all users in the network always share the same information level.
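The idea of periodic replication can be sketched with two in-memory SQLite databases standing in for server A and server B. This is only a toy model: it uses Python's `sqlite3.Connection.backup` to copy the whole database on each cycle, whereas a production system would normally rely on the database's own replication feature. The `orders` table and the helper names are hypothetical.

```python
import sqlite3


def replicate(primary: sqlite3.Connection, replica: sqlite3.Connection) -> None:
    """Run one refresh cycle: copy the primary's full contents to the replica."""
    primary.backup(replica)


def count_orders(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def demo_counts() -> tuple:
    """Return the replica's row count at three points in time."""
    primary = sqlite3.connect(":memory:")  # stand-in for server A
    replica = sqlite3.connect(":memory:")  # stand-in for server B

    primary.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    primary.execute("INSERT INTO orders (item) VALUES ('widget')")
    primary.commit()

    replicate(primary, replica)
    after_first_cycle = count_orders(replica)

    # A new write lands on the primary between replication cycles.
    primary.execute("INSERT INTO orders (item) VALUES ('gadget')")
    primary.commit()
    before_second_cycle = count_orders(replica)  # replica lags behind

    replicate(primary, replica)
    after_second_cycle = count_orders(replica)

    primary.close()
    replica.close()
    return after_first_cycle, before_second_cycle, after_second_cycle


if __name__ == "__main__":
    print(demo_counts())
```

The middle value shows why the replication interval matters: any write made after the last cycle exists only on the primary until the next refresh, so the interval effectively bounds how much data could be lost.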
A disaster recovery plan refers to a structured, documented approach with instructions put in place to respond to unplanned incidents. It’s a step-by-step plan that consists of the precautions put in place to minimize a disaster’s effects so that your organization can quickly resume its mission-critical functions or continue to operate as usual.
Typically, DRP involves an in-depth analysis of all business processes and continuity needs. What’s more, before generating a detailed plan, your organization should perform a risk analysis (RA) and a business impact analysis (BIA). It should also establish its RTO and RPO.
A recovery strategy should begin at the business level, which allows you to determine the most critical applications to run your organization. Recovery strategies define your organization’s plans for responding to incidents, while DRPs describe in detail how you should respond.
When determining a recovery strategy, you should consider issues such as:
Management must approve all recovery strategies, which should align with organizational objectives and goals. Once the recovery strategies are developed and approved, you can then translate them into DRPs.
The DRP process involves a lot more than only writing the document. A business impact analysis (BIA) and risk analysis (RA) help determine areas to focus resources in the DRP process.
The BIA is useful in identifying the impacts of disruptive events, which makes it the starting point for risk identification within the DR context. It also helps generate the RTO and RPO.
The risk analysis identifies vulnerabilities and threats that could disrupt the normal operations of processes and systems highlighted in the BIA. The RA also assesses the likelihood of the occurrence of a disruptive event and helps outline its potential severity.
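One common way to combine likelihood and severity is a simple risk matrix, in which each threat is scored as likelihood multiplied by severity. The sketch below assumes 1-to-5 scales for both; the scales, threats, and ratings are hypothetical and not taken from any particular standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    threat: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    severity: int    # assumed scale: 1 (minor) to 5 (catastrophic)

    @property
    def score(self) -> int:
        """Risk-matrix score: likelihood multiplied by severity."""
        return self.likelihood * self.severity


def rank_risks(risks: list) -> list:
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


# Hypothetical ratings, for illustration only.
RISKS = [
    Risk("power failure", likelihood=4, severity=3),
    Risk("cyberattack", likelihood=3, severity=5),
    Risk("earthquake", likelihood=1, severity=5),
]

if __name__ == "__main__":
    for risk in rank_risks(RISKS):
        print(f"{risk.threat}: score {risk.score}")
```

A ranking like this is only a starting point for discussion, since a rare but catastrophic event may still deserve attention despite a low score.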
A DR plan checklist has the following steps:
An organization can start its DRP with a summary of all the vital action steps required and a list of essential contacts, which ensures that crucial information is easily and quickly accessible.
The plan should also define the roles and responsibilities of team members while also outlining the criteria to launch the action plan. It must then specify, in detail, the response and recovery activities. The other essential elements of a DRP template include:
A DRP can range in scope (i.e., from basic to comprehensive). Some can be upward of 100 pages.
DR budgets can vary significantly and fluctuate over time. Therefore, your organization can take advantage of any free resources available such as online DR plan templates from the Federal Emergency Management Agency. There is also a lot of free information and how-to articles online.
A DRP checklist of goals includes:
The plan should, at the very least, minimize any adverse effects on daily business operations. Your employees should also know the necessary emergency steps to follow in the event of unforeseen incidents.
Distance, though important, is often overlooked during the DRP process. A DR site located close to the primary data center is ideal in terms of convenience, cost, testing, and bandwidth. However, since outages differ in scope, a severe regional event may destroy both the primary data center and its DR site when the two are located close together.
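A quick way to sanity-check site separation is to compute the great-circle distance between the two locations with the haversine formula. In the sketch below, the coordinates and the minimum-separation threshold are hypothetical; the right threshold depends on the regional hazards you are planning for.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def sites_far_enough(primary: tuple, dr_site: tuple, min_km: float) -> bool:
    """Check whether two (latitude, longitude) sites are at least min_km apart."""
    return haversine_km(*primary, *dr_site) >= min_km


if __name__ == "__main__":
    primary_site = (40.7128, -74.0060)  # hypothetical primary data center
    dr_site = (41.8781, -87.6298)       # hypothetical DR site
    min_separation_km = 150.0           # assumed threshold, not a recommendation
    distance = haversine_km(*primary_site, *dr_site)
    print(f"Separation: {distance:.0f} km")
    print("Far enough:", sites_far_enough(primary_site, dr_site, min_separation_km))
```

Distance alone does not guarantee independence: two distant sites that share a power grid or network provider can still fail together.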
You can tailor a DRP for a given environment.
Testing substantiates all DRPs. It identifies deficiencies in the plan and provides opportunities to fix any problems before a disaster occurs. Testing can also offer proof that the plan is effective and meets its RPOs.
IT technologies and systems are continually changing. Therefore, testing ensures that your DRP is up to date.
Some reasons for not testing DRPs include budget restrictions, lack of management approval, or resource constraints. DR testing also takes time, planning, and resources, and it can itself cause an incident if it involves live data. However, testing is an essential part of DR planning that you should never ignore.
DR testing ranges from simple to complex:
Your organization should schedule testing in its DR policy, but be wary of how intrusive it is. Testing too frequently is counter-productive and draining on your personnel, while testing too infrequently is also risky. Additionally, always test your DR plan after making any significant system changes.
To get the most out of testing:
Disaster recovery-as-a-service (DRaaS) is a cloud-based DR method that has gained popularity over the years. This is because DRaaS lowers cost, is easier to deploy, and allows regular testing.
Cloud-based DR services save your company money because they run on shared infrastructure. They are also quite flexible, allowing you to sign up for only the services you need, and you can complete your DR tests by spinning up only temporary instances.
DRaaS expectations and requirements are documented and contained in a service-level agreement (SLA). The third-party vendor then provides failover to their cloud computing environment, either on a pay-per-use basis or through a contract.
However, cloud-based DR may not be available after large-scale disasters since the DR site may not have enough room to run every user’s applications. Also, since cloud DR increases bandwidth needs, the addition of complex systems could degrade the entire network’s performance.
Perhaps the biggest disadvantage of cloud DR is that you have little control over the process; you must trust your service provider to implement the DRP in the event of an incident while meeting the defined recovery point and recovery time objectives.
Costs vary widely among vendors and can add up quickly if the vendor charges based on storage consumption or network bandwidth. Therefore, before selecting a provider, you need to conduct a thorough internal assessment to determine your DR needs.
Some questions to ask a potential provider include:
A DR site allows you to recover and restore your technology infrastructure and operations when your primary data center is unavailable. These sites can be internal or external.
As an organization, you are responsible for setting up and maintaining an internal DR site. These sites are necessary for companies with aggressive RTOs and large information requirements. Some considerations to make when building your internal recovery site are hardware configuration, power maintenance, support equipment, layout design, heating and cooling, location, and staff.
Though much more expensive compared to an external site, an internal DR site allows you to control all aspects of the DR process.
External sites are owned and operated by third-party vendors. They can either be:
During the 1980s, two entities, the SHARE Technical Steering Committee and International Business Machines (IBM), came up with a tier system for describing DR service levels. The system ranked off-site recoverability, with tier 0 representing the least and tier 6 the most.
A seventh tier was later added to include DR automation. Today, it represents the highest availability level in DR scenarios. Generally, as the ability to recover improves with each tier, so does the cost.
Preparing for a disaster is not easy. It requires a comprehensive approach that encompasses software, hardware, networking equipment, connectivity, power, and testing to ensure that disaster recovery is achievable within RPO and RTO targets. Although implementing a thorough and actionable DR plan takes real effort, its potential benefits are significant.
Everyone in your company must be aware of any disaster recovery plan put in place, and during implementation, effective communication is essential. It is imperative that you not only develop a DR plan but also test it, train your personnel, document everything correctly, and improve it regularly. Finally, be careful when hiring the services of any third-party vendor.