Achieving Efficient Disaster Recovery as a Service

Introducing Disaster Recovery as a Service (DRaaS)

In today's interconnected multicloud world, one topic customers are increasingly bringing up is the ability to leverage the efficiencies and scalability of public cloud providers to recover from a disaster. It's easy to see why. For every dollar spent on workload infrastructure, customers tend to spend an equal amount on business continuity. This is exacerbated by the fact that most disaster recovery (DR) infrastructure remains unused, waiting for the day everyone hopes never arrives.

When you consider its potential for massive capital savings, it's no surprise that Disaster Recovery as a Service (DRaaS) is a leading topic of interest for organizations with critical data and workloads in their data centers.

DRaaS from WWT

During a typical DRaaS engagement, we'll assess the current state of your organization's business continuity objectives, evaluate solutions and map out a strategy to ensure workloads and data remain available should a disaster hit your data center.

An effective strategy requires the classification of workloads and their data dependencies in relation to business requirements.

Understanding the business impact of application unavailability or data loss will help us define your Recovery Time Objective (RTO) and Recovery Point Objective (RPO). Put more simply, we'll help you specify the maximum acceptable time to bring services and data back online as well as the maximum acceptable data loss during recovery.
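One way to make this classification concrete is to record each workload's recovery time and recovery point objectives alongside its data dependencies, then tighten every dependency to the strictest objective of anything that relies on it. The Python sketch below is illustrative only: the Workload class, the effective_objectives function and the sample values are hypothetical, and it assumes a dependency must be recovered at least as quickly as the workload that uses it.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class Workload:
    name: str
    rto: timedelta  # maximum acceptable time to bring the service back online
    rpo: timedelta  # maximum acceptable window of lost data
    depends_on: list[str] = field(default_factory=list)


def effective_objectives(workloads: list[Workload]) -> dict[str, tuple[timedelta, timedelta]]:
    """Return (rto, rpo) per workload after pushing objectives down to dependencies."""
    by_name = {w.name: w for w in workloads}
    result = {w.name: (w.rto, w.rpo) for w in workloads}

    def tighten(name: str, rto: timedelta, rpo: timedelta, seen: set[str]) -> None:
        for dep in by_name[name].depends_on:
            if dep in seen:  # guard against dependency cycles
                continue
            dep_rto, dep_rpo = result[dep]
            tightened = (min(dep_rto, rto), min(dep_rpo, rpo))
            result[dep] = tightened
            tighten(dep, tightened[0], tightened[1], seen | {dep})

    for w in workloads:
        tighten(w.name, w.rto, w.rpo, {w.name})
    return result


# Hypothetical example: a web frontend that reads from a SQL database.
workloads = [
    Workload("web-frontend", timedelta(minutes=15), timedelta(minutes=5), ["sql-db"]),
    Workload("sql-db", timedelta(hours=4), timedelta(hours=1)),
]
objectives = effective_objectives(workloads)
# The database inherits the frontend's stricter 15-minute / 5-minute objectives.
```

The point of the sketch is that a database declared as "four hours is fine" may in practice need a much tighter objective once the applications that depend on it are taken into account.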

Using backups as a recovery mechanism

One common DR strategy is to leverage object storage in public cloud (AWS S3, Azure Blob Storage, Google Cloud Storage) as a backup target and dynamically provision or power on static resources in the event of a disaster. This approach has the cost-saving advantage of not involving standby hardware, which often goes unused. The tradeoff is that retrieving data from object storage and converting it to a native format typically requires a significant amount of time.

Additionally, the latest point-in-time recovery may be hours or days prior thanks to the retention policies typically leveraged by backup software. The maximum data loss in this scenario is the timeframe between the last successful backup and the moment failover is initiated. This may be acceptable for some workloads, but application downtime and data loss for workloads that are essential to your business translate to lost revenue.
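The data-loss arithmetic described above can be sketched in a few lines of Python. The function names and timestamps are hypothetical and serve only to illustrate the calculation: the loss window is the gap between the last successful backup and the moment failover begins, and it is then compared against the workload's recovery point objective.

```python
from datetime import datetime, timedelta


def data_loss_window(last_successful_backup: datetime, failover_started: datetime) -> timedelta:
    """Time span of changes that exist only at the failed site."""
    if failover_started < last_successful_backup:
        raise ValueError("failover cannot start before the last successful backup")
    return failover_started - last_successful_backup


def exceeds_rpo(loss: timedelta, rpo: timedelta) -> bool:
    """True when the data lost is more than the objective allows."""
    return loss > rpo


# Hypothetical nightly backup at 01:00, disaster declared at 14:30 the same day.
loss = data_loss_window(datetime(2024, 3, 5, 1, 0), datetime(2024, 3, 5, 14, 30))
print(loss)                                     # 13:30:00
print(exceeds_rpo(loss, timedelta(hours=24)))   # False: acceptable for a 24-hour objective
print(exceeds_rpo(loss, timedelta(minutes=5)))  # True: unacceptable for a 5-minute objective
```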

Essential applications demand recovery objectives in seconds or minutes, not hours or days. Traditionally, this required a dedicated recovery environment with servers, storage and networking lying dormant, waiting for an event to initiate failover. Replication of data between the primary data center and the secondary recovery site was either synchronous (real-time) or asynchronous (near real-time).

DRaaS demo in the ATC

Check out this video where Eric Becker, WWT National Solutions Architect, explains the latest DRaaS demo capabilities available in the Advanced Technology Center (ATC).

DRaaS demo by Eric Becker

This demo is based on a simple two-tier application common to many customer environments. A web frontend running in a Docker container pulls data from an instance of Microsoft SQL Server. The production SQL database runs on a VM and is stored within our data center on a NetApp ONTAP iSCSI LUN. This lets us demonstrate both the quick deployment of containerized applications and the failover of an enterprise application (SQL Server).

When evaluating storage platforms supporting the highest requirements of business continuity, we chose to leverage the efficiencies and scalability of NetApp ONTAP. Native replication allows customers to do fan-in, fan-out and even cascading replication relationships. The environment supports asynchronous replication intervals as low as one minute as well as options for synchronous replication.
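To reason about what fan-out and cascading relationships mean for data currency, the following Python sketch models replication relationships as a small graph and sums the replication intervals along the path to a given copy. It is a simplified model, not NetApp tooling: the site names and intervals are hypothetical, a synchronous relationship would be modeled as an interval of zero, and transfer time is ignored.

```python
from datetime import timedelta

# (source, destination) -> replication interval. All names are hypothetical.
relationships = {
    ("primary", "colo"): timedelta(minutes=1),          # one-minute asynchronous
    ("colo", "archive"): timedelta(hours=1),            # cascade: primary -> colo -> archive
    ("primary", "branch-copy"): timedelta(minutes=15),  # fan-out from primary
}


def worst_case_lag(relationships, origin, target):
    """Sum the intervals along the replication path from origin to target.

    Returns None if target is not reachable. If several paths exist, the
    smallest total is returned, since the freshest copy arrives that way.
    """
    best = None
    stack = [(origin, timedelta(0), {origin})]
    while stack:
        node, lag, seen = stack.pop()
        if node == target:
            best = lag if best is None else min(best, lag)
            continue
        for (src, dst), interval in relationships.items():
            if src == node and dst not in seen:
                stack.append((dst, lag + interval, seen | {dst}))
    return best


print(worst_case_lag(relationships, "primary", "colo"))     # 0:01:00
print(worst_case_lag(relationships, "primary", "archive"))  # 1:01:00
```

Under these assumptions, each hop in a cascade adds its own interval, so a copy two hops away can trail production by the sum of both schedules.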

SnapMirror is native to both ONTAP and Element OS (SolidFire), so no external software is required. This enables customers to use a cloud-based virtual instance of Data ONTAP (Cloud Volumes ONTAP), an appliance-based storage controller in a colocation facility, NetApp Hybrid Cloud Infrastructure or even a software-defined instance of Data ONTAP (ONTAP Select).

The flexibility of physical, virtual and cloud capabilities enables multiple replication and recovery options. Most importantly, NetApp's vision of a unified data fabric minimizes the amount of traffic required to fail back to the production environment after a disaster is mitigated. Simply put, the data is replicated back without requiring a full re-baseline.
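A rough calculation shows why avoiding a re-baseline matters. The Python sketch below compares the time to move a full copy of the data with the time to move only what changed while running at the recovery site. The data sizes and link speed are hypothetical, and the estimate ignores protocol overhead, compression and deduplication.

```python
def transfer_seconds(num_bytes: int, link_bits_per_second: int) -> float:
    """Idealized time to move num_bytes over a link of the given speed."""
    return num_bytes * 8 / link_bits_per_second


TB = 10**12
GB = 10**9
LINK_1_GBPS = 10**9

full_baseline = transfer_seconds(10 * TB, LINK_1_GBPS)   # hypothetical 10 TB data set
changes_only = transfer_seconds(200 * GB, LINK_1_GBPS)   # hypothetical 200 GB of changes

print(f"full re-baseline: {full_baseline / 3600:.1f} hours")   # 22.2 hours
print(f"changes only:     {changes_only / 60:.1f} minutes")    # 26.7 minutes
```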

Equinix ExpressRoute

Many DR architectures rely on provisioning workloads and data within public cloud infrastructure. However, an increasing number of organizations demand both data sovereignty and the cost efficiencies and scalability of a public cloud strategy.

Our DRaaS demo environment fulfills both requirements by leveraging storage in an Equinix colocation facility with VMs in Microsoft Azure waiting for an event to be powered on. Additionally, Equinix gives us low-latency connectivity to Azure via ExpressRoute. In a failover event, we can power on VM and container workloads in Azure while providing less than 2 ms of latency to the storage in our Equinix colocation facility.

From a network perspective, the DR scenario described above depends heavily on the web servers in the Azure public cloud being able to reach the database, which is not located in the cloud, both reliably and with very low latency.

Equinix carrier-neutral facilities are able to address both of these requirements. Equinix achieves this via proximity to public cloud and by using native dedicated cloud connectivity solutions. Equinix data centers are typically located less than 2 ms from almost all public cloud providers.
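To see why proximity matters for a web tier that is separated from its database, consider how latency accumulates when a page issues its queries one after another. The Python sketch below is a back-of-the-envelope estimate only. It assumes the 2 ms figure is a round-trip time, which the text does not specify, and the query count and the 30 ms comparison value are hypothetical.

```python
def added_latency_ms(sequential_queries: int, round_trip_ms: float) -> float:
    """Network time added to one page render when queries run back to back."""
    return sequential_queries * round_trip_ms


QUERIES_PER_PAGE = 50  # hypothetical number of sequential database calls

near = added_latency_ms(QUERIES_PER_PAGE, 2.0)   # storage adjacent to the cloud region
far = added_latency_ms(QUERIES_PER_PAGE, 30.0)   # hypothetical distant data center

print(f"adjacent facility: {near:.0f} ms per page")  # 100 ms
print(f"distant facility:  {far:.0f} ms per page")   # 1500 ms
```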

Equinix data centers also address reliable cloud connectivity by providing cloud-native dedicated circuits such as ExpressRoute. These circuits can be dedicated or software-defined using Cloud Exchange, so they can be brought up and used as needed.

ExpressRoute circuits automatically provide two active connections for high availability and deliver an uptime SLA of 99.99 percent. Equinix provides >99.9999 percent uptime data center reliability across its more than 180 data centers on five continents in 44 top international business markets.
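Availability percentages are easier to compare when converted into permitted downtime. The short Python sketch below performs that conversion for the two figures quoted above. The function name is hypothetical and a 365-day year is assumed.

```python
def allowed_downtime_seconds(availability_percent: float, period_days: int = 365) -> float:
    """Downtime permitted over the period at the given availability level."""
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (1 - availability_percent / 100)


circuit = allowed_downtime_seconds(99.99)      # roughly 52.6 minutes per year
facility = allowed_downtime_seconds(99.9999)   # roughly 31.5 seconds per year

print(f"99.99%:   about {circuit / 60:.1f} minutes per year")
print(f"99.9999%: about {facility:.1f} seconds per year")
```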

NetApp automation & orchestration

Automation and orchestration of the environment are glossed over in most DR scenarios. This includes everything from automation of day 0 provisioning to orchestration of disaster failover and failback. NetApp provides integrations and support for some of the most common automation platforms, including VMware vRealize Automation (vRA) and Ansible.

In our demo, we use Ansible Tower for everything from provisioning on-prem and cloud infrastructure to failover and failback orchestration of the environment. NetApp is one of only six Red Hat Ansible-certified partners. What this means for customers is NetApp can accelerate user deployment times by providing improved security, reliability and consistency of content.

Coupling NetApp-certified modules and Red Hat Ansible Tower orchestration provides the enterprise features and support required for multicloud disaster recovery. Certified modules cover both NetApp Hybrid Cloud Infrastructure (ElementSW) and NetApp ONTAP. As long as you're running a current version of ONTAP, automation of physical FAS and AFF systems, ONTAP Select and NetApp Cloud Volumes ONTAP is covered by NetApp-provided Ansible modules and roles.
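The demo itself is orchestrated with Ansible Tower, but the underlying idea of an ordered failover runbook that halts at the first failed step can be sketched in plain Python. Everything below is hypothetical: the step names and placeholder functions are illustrative assumptions and do not correspond to actual NetApp modules or to the playbooks used in the demo.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]


def run_runbook(steps: list[Step]) -> list[str]:
    """Run steps in order, stop at the first failure, return the completed step names."""
    completed = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            print(f"FAILED {step.name}: {exc}")
            break
        completed.append(step.name)
        print(f"ok     {step.name}")
    return completed


# Hypothetical placeholders; real steps would call out to storage and cloud APIs.
def make_replica_writable() -> None: ...
def power_on_cloud_vms() -> None: ...
def start_web_container() -> None: ...
def verify_application() -> None: ...


failover_steps = [
    Step("make replicated storage writable", make_replica_writable),
    Step("power on standby VMs", power_on_cloud_vms),
    Step("start web frontend container", start_web_container),
    Step("verify application health", verify_application),
]

completed = run_runbook(failover_steps)
```

Keeping the order explicit and stopping on the first error means a partial failover is visible immediately rather than discovered later during verification.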

What we've demonstrated is that applications can easily be protected in hybrid cloud with minimal disruption in services and minimal investment in infrastructure. As is the case with traditional business continuity, the key to low recovery times will be the level of automation you can build around the process.

Book a DRaaS demo today

If you'd like to experience our DRaaS capabilities for yourself, book your lab session today or reach out to your local WWT account team.