Any kind of interruption to your business can be catastrophic, whether it is a power failure, a natural disaster, or a security breach. Most companies these days have recovery strategies in place. I’m sure yours does. But how comprehensive is it? And have you been able to fully test it? According to a 2018 Spiceworks survey, 95% of companies polled said they had a disaster recovery plan, but 23% admitted that they had never tested it, even though 27% of respondents reported losing revenue as the result of an outage.
Most disaster plans include many procedures, so today I’d like to specifically talk about the data center component—and how the features of VMware Site Recovery Manager (SRM) can make your data center recovery more efficient and secure. If you are still working on developing your overall disaster recovery plan, take a look at Liz Alton’s post Business Continuity and Disaster Recovery Strategies for tips.
Four Disaster Recovery Misconceptions
1. If you’ve got a backup strategy, you’re all set. That’s a good starting point, of course, but you need to be aware of its limitations. You might be thinking that if you’ve got Veeam or any other backup solution, and you back up your data regularly, you should be able to execute an instant recovery. And it’s probably sufficient if you need to recover some corrupted files or restore someone’s desktop image.
But what if your entire building loses power for over 24 hours? Would your solution allow you to reboot and restore your whole infrastructure all at once? Can your backup storage handle that kind of load? Do you have hosts that can connect to the backup device quickly? Do you have enough hosts to run everything—and are they already configured to access everything necessary? How will users connect to the new environment once it’s up and running?
When deployed correctly, VMware SRM addresses each of these questions and lets you easily test each part of the solution so that your end users have an opportunity to make sure that it works as expected.
2. Disaster recovery solutions are expensive and require you to buy identical hardware for two separate sites. In fact, the requirements for an SRM solution are pretty simple: you need a network connection between two VMware vCenters, a host, and a datastore. The two sites don’t need identical hardware or even identical storage. You just need to make sure that you have enough space for replication and snapshots of the source VMs, and that you either have enough hardware or can add more quickly. SRM can even connect to and use a VMware cluster on AWS or any other third-party hosting solution, which means your disaster recovery site can easily be hundreds of miles away from your primary location.
3. Disaster recovery is automatic. It’s definitely not, and you really wouldn’t want it to be. For an effective disaster recovery plan, you need to come up with a set of criteria that must be met before you trigger the decision to fail over to your disaster recovery site. That way you can also decide to proactively fail over to avoid an unplanned outage in the event of a forecasted natural disaster, for example.
4. If the test goes well, an actual failover will have no issues. Unfortunately, there’s no substitute for the real thing. When you test recovery at your disaster recovery site, the test executes in a “bubble” of a certain set of VMs. But when the real thing happens, everything you have switches to the backup at the disaster recovery site, and suddenly a VM is missing the C:\Windows\Temp folder and its IP reconfigure fails. So even if your tests run smoothly, you still need to be prepared for issues that can arise from a full failover.
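To make the failover criteria from misconception 3 concrete, here is a minimal Python sketch of how a team might write its criteria down as an explicit checklist. It is only an illustration and not part of SRM: the `SiteStatus` fields, the two threshold values, and the function names are all hypothetical, and your own criteria will almost certainly differ.

```python
from dataclasses import dataclass

# Hypothetical thresholds; agree on real values with your stakeholders.
MAX_TOLERABLE_OUTAGE_MINUTES = 240
MAX_ACCEPTABLE_DATA_LOSS_MINUTES = 15


@dataclass
class SiteStatus:
    """Illustrative snapshot of the situation at decision time."""
    outage_minutes: int            # how long the primary site has been down
    estimated_repair_minutes: int  # best estimate until the primary site is back
    replication_lag_minutes: int   # age of the newest replica at the recovery site
    approver_signed_off: bool      # the designated decision-maker has approved


def unmet_failover_criteria(status: SiteStatus) -> list[str]:
    """Return the criteria that are NOT yet met; an empty list means go."""
    unmet = []
    total_outage = status.outage_minutes + status.estimated_repair_minutes
    if total_outage <= MAX_TOLERABLE_OUTAGE_MINUTES:
        unmet.append("primary site is expected back within the tolerable outage window")
    if status.replication_lag_minutes > MAX_ACCEPTABLE_DATA_LOSS_MINUTES:
        unmet.append("newest replica is older than the acceptable data loss window")
    if not status.approver_signed_off:
        unmet.append("no sign-off from the designated approver")
    return unmet


def should_fail_over(status: SiteStatus) -> bool:
    """Fail over only when every criterion is satisfied."""
    return not unmet_failover_criteria(status)


if __name__ == "__main__":
    status = SiteStatus(outage_minutes=60, estimated_repair_minutes=480,
                        replication_lag_minutes=5, approver_signed_off=True)
    print("Fail over:", should_fail_over(status))
    for reason in unmet_failover_criteria(status):
        print("Not met:", reason)
```

The point of returning the list of unmet criteria, rather than a bare yes or no, is that the people making the call can see exactly what is holding the decision back. The same checklist can be used for a proactive failover by filling in the expected outage from a forecast instead of an actual one.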
- It’s all integrated and gives you fast and predictable recovery times.
- SRM can integrate directly with certain storage vendors to leverage storage-level replication.
- Replicate raw device mappings (RDMs) and visualize replicated RDMs as part of the recovery plan.
- SRM can use snapshots at the disaster recovery site to give you simple recovery to a point in time that you define.
- Return to regular operations with ease using the original recovery plan through automated failback.
- Zero-downtime application mobility: SRM can enable live migration of applications at scale between two sites when using a certified stretched solution.
- Self-service provisioning allows application tenants to provision disaster recovery protection using blueprints in VMware vRealize Automation.
How Do I Start Improving My Disaster Recovery Plan?
The first thing you should do is sit down with your stakeholders and ask the question: if all of our systems went offline, what would we do? How long would it truly take to get back up and running, and what would be the financial impact? Connection has a deep understanding of how to deploy and validate disaster recovery solutions across a variety of products, some of which you may already own. As a VMware partner, our data center team can work through all the steps to design, deploy, and document a solution that will give peace of mind to all stakeholders in your organization.