In my last post, I shared some sobering numbers from a recent study by the DRP Council on how well, and how poorly, organizations recover from disruptions. Many of the problems revealed in the study can, I believe, be attributed to four causes:
- Inadequate recovery plans that don’t anticipate the types of events that actually occur
- Insufficient plan documentation and lack of compliance reporting
- Not nearly enough recovery plan testing
- Failed recovery tests when testing does occur
All of this is eminently understandable – it’s hard to focus budget and time on what we prefer to regard as unlikely possibilities.
So here’s my first recovery best practice: think of your recovery plan as the best way to keep those possibilities unlikely, because when they do happen, they cost plenty.
I also encourage our clients to embrace a number of more specific best practices:
- Build and implement more detailed recovery plans
Effective Disaster Recovery (DR) plans identify the critical applications and components that are essential for recovery. Everything you need to recover (data, apps, systems, services, networks, document repositories, and so on) should be part of this plan. It should also carefully specify failover and failback processes. And all of this should be thoroughly documented.
- Stipulate RTO and RPO metrics
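A successful recovery plan needs to define specific and realistic Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) for every mission-critical business service, such as email, customer orders, and accounting functions. The RTO is how quickly a service must be back up; the RPO is how much recent data, measured in time, you can afford to lose.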
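To make that concrete, here is a minimal sketch in Python of how per-service targets might be written down and checked. The class name, the service names, and every number are my own illustrative assumptions, not figures from the study; your real targets should come out of a business impact analysis.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    """Targets for one business service (hypothetical structure)."""
    service: str
    rto: timedelta  # maximum tolerable time to restore the service
    rpo: timedelta  # maximum tolerable window of lost data


# Illustrative values only; these are assumptions, not recommendations.
OBJECTIVES = [
    RecoveryObjective("email", rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    RecoveryObjective("customer-orders", rto=timedelta(hours=1), rpo=timedelta(minutes=15)),
    RecoveryObjective("accounting", rto=timedelta(hours=8), rpo=timedelta(hours=4)),
]


def meets_objective(objective, downtime, data_loss):
    """Return True if a measured recovery stayed within both targets."""
    return downtime <= objective.rto and data_loss <= objective.rpo
```

Writing the targets down in a form like this means a test result can be compared against them mechanically rather than by opinion.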
- Update recovery plans whenever configurations change
New systems, software, changes to data centers: it all means adjustments to your recovery plan, followed by testing of the revised recovery plan.
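One simple way to notice that a plan has fallen behind is to fingerprint your configuration inventory and compare it with the fingerprint recorded at the last successful test. The sketch below assumes the inventory can be exported as a plain dictionary; the function names and the inventory layout are hypothetical.

```python
import hashlib
import json


def fingerprint(inventory):
    """Hash an inventory dict so that any change yields a different digest."""
    # sort_keys makes the result independent of dictionary key order.
    canonical = json.dumps(inventory, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_is_stale(current_inventory, fingerprint_at_last_test):
    """Return True if the configuration changed since the plan was last tested."""
    return fingerprint(current_inventory) != fingerprint_at_last_test
```

A stale result is only a prompt to review the plan and re-test; it does not say which change matters.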
- Automate recovery testing
The only way you can know if your recovery plan will work is to try it out. Often. This means testing RTOs and RPOs with planned failures. It means practicing your plan and then refining and improving it based on your results. It’s also important to document testing processes and results, and to share them, which creates a feedback loop and a motivation to improve. Recovery testing is best achieved with automation. Automation can be relatively inexpensive in virtualized infrastructures and at secondary sites (especially DRaaS-based sites) that already have hardware or software replication or mirroring and established standby failover policies in place. When testing of critical IT applications and components is automated, it’s much easier, and more affordable, to do it more frequently.
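As a rough illustration of what an automated drill could look like, the Python sketch below triggers a planned failover, polls until the service is healthy again, and records whether the measured downtime stayed within the RTO. The `fail_over` and `is_healthy` callables are hypothetical hooks you would supply for your own environment; nothing here is tied to a particular DR product.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class DrillResult:
    service: str
    started_at: datetime
    downtime: timedelta
    rto: timedelta

    @property
    def passed(self):
        """A drill passes when measured downtime is within the RTO."""
        return self.downtime <= self.rto


def run_drill(service, rto, fail_over, is_healthy,
              poll_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
    """Trigger a planned failover and time how long recovery takes.

    fail_over and is_healthy are caller-supplied (hypothetical) hooks.
    clock and sleep are injectable so the drill logic can be unit tested.
    """
    started_at = datetime.now(timezone.utc)
    start = clock()
    fail_over()
    # Stop waiting at twice the RTO; the drill has clearly failed by then.
    deadline = start + 2 * rto.total_seconds()
    while not is_healthy() and clock() < deadline:
        sleep(poll_seconds)
    downtime = timedelta(seconds=clock() - start)
    return DrillResult(service, started_at, downtime, rto)
```

Run on a schedule and appended to a shared log, results like these give you the documented feedback loop described above.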
- Don’t hesitate to get expert help
It takes expertise to ferret out the data, apps, services, and systems that matter to your business, specify their RTOs and RPOs, conduct recovery tests, and adapt to ongoing changes. An independent technology advisor with proven recovery expertise in virtualized and cloud-based environments can help you build, and regularly test, an automated recovery plan that works without keeping you up at night.