Use defined recovery strategies to meet the recovery objectives

Defining a Disaster Recovery (DR) strategy that aligns with your workload’s recovery objectives is essential for ensuring business continuity. This process involves selecting an appropriate recovery strategy, such as backup and restore, standby (active/passive), or active/active, based on specific business requirements.

Best Practices

  • Identify RTO and RPO: Establish a clear Recovery Time Objective (RTO), which defines how quickly systems must be restored, and a clear Recovery Point Objective (RPO), which defines how much data loss is acceptable. Assess business priorities to set both values appropriately.
  • Choose the Right DR Strategy: Select a DR strategy that fits your workload’s needs. For instance, backup and restore costs the least but has the longest recovery times, while active/active offers the highest availability at the price of greater cost and complexity. Standby (active/passive) sits between the two.
  • Test DR Plans Regularly: Test your DR strategy on a regular schedule to confirm that it works when needed. Document the results and adjust the recovery processes based on the findings.
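
As a rough illustration of how RTO and RPO can drive the choice of strategy, the sketch below maps a pair of objectives to one of the three strategies named above. The thresholds (5 minutes and 4 hours) and the names `RecoveryObjectives` and `suggest_strategy` are assumptions made for this example, not prescribed values; real cut-offs depend on your business requirements and budget.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of data loss


def suggest_strategy(objectives: RecoveryObjectives) -> str:
    """Suggest a candidate DR strategy from the tighter of RTO and RPO.

    The thresholds below are hypothetical and should be tuned per workload.
    """
    tightest = min(objectives.rto, objectives.rpo)
    if tightest <= timedelta(minutes=5):
        return "active/active"
    if tightest <= timedelta(hours=4):
        return "standby (active/passive)"
    return "backup and restore"


if __name__ == "__main__":
    reporting = RecoveryObjectives(rto=timedelta(hours=24), rpo=timedelta(hours=12))
    print(suggest_strategy(reporting))
```

The output is only a starting point for discussion: a workload with a tight RPO but a relaxed RTO, for example, may be better served by continuous replication combined with a slower restore path.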

Supporting Questions

  • Have you clearly defined your RTO and RPO metrics for your workloads?
  • What backup and replication methods are currently in place?
  • When was the last time you conducted a DR test?

Roles and Responsibilities

  • Disaster Recovery Manager: Responsible for developing and implementing the DR strategy. Ensures that recovery objectives are met and coordinates testing activities.
  • System Administrator: Oversees backup operations and ensures that recovery processes are in place and functioning as intended.
  • Business Continuity Manager: Prioritizes business needs and helps to align the DR strategy with organizational goals.

Artifacts

  • Disaster Recovery Plan Document: A comprehensive document outlining the DR strategy, including RTOs, RPOs, and operational procedures for restoration.
  • DR Test Report: Documentation of DR testing outcomes that provides insights into the effectiveness of recovery strategies, including areas for improvement.
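
A DR test report is most useful when it compares measured results against the stated objectives. The following minimal sketch, assuming three timestamps are recorded during a test, derives the achieved recovery time and data-loss window and checks them against the targets. The names `DrTestResult` and `evaluate` and the field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DrTestResult:
    outage_start: datetime            # when the simulated failure began
    service_restored: datetime        # when the workload was serving again
    last_recoverable_point: datetime  # timestamp of the newest data recovered


def evaluate(result: DrTestResult, rto: timedelta, rpo: timedelta) -> dict:
    """Compare measured recovery time and data loss against the objectives."""
    achieved_rto = result.service_restored - result.outage_start
    achieved_rpo = result.outage_start - result.last_recoverable_point
    return {
        "achieved_rto": achieved_rto,
        "achieved_rpo": achieved_rpo,
        "rto_met": achieved_rto <= rto,
        "rpo_met": achieved_rpo <= rpo,
    }


if __name__ == "__main__":
    test = DrTestResult(
        outage_start=datetime(2024, 1, 10, 9, 0),
        service_restored=datetime(2024, 1, 10, 10, 30),
        last_recoverable_point=datetime(2024, 1, 10, 8, 45),
    )
    print(evaluate(test, rto=timedelta(hours=1), rpo=timedelta(minutes=30)))
```

In this made-up run the data-loss window (15 minutes) is within the RPO, but the 90-minute restore exceeds the 1-hour RTO, which is the kind of finding the report should flag for improvement.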

Cloud Services

AWS

  • Amazon S3: Provides scalable storage for backup solutions, allowing you to easily store and retrieve data for recovery processes.
  • AWS Backup: Centralizes your backup management across AWS services, making it simpler to automate backup processes and meet your RTO/RPO.
  • Elastic Load Balancing (ELB): Distributes incoming traffic across multiple targets and routes requests only to healthy ones, which supports an active/passive or active/active DR strategy.
  • Amazon Route 53: Offers DNS routing capabilities that can aid in failover strategies, directing traffic to healthy endpoints during outages.
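
To make the Route 53 failover idea more concrete, the sketch below builds a change batch containing a primary and a secondary failover record for the same DNS name. It only constructs the request payload; the function name `failover_change_batch`, the record name, the IP addresses, and the health check ID are placeholders assumed for illustration.

```python
def failover_change_batch(name: str, primary_ip: str, secondary_ip: str,
                          health_check_id: str, ttl: int = 60) -> dict:
    """Build a Route 53 change batch with PRIMARY and SECONDARY failover records."""

    def record(role: str, ip: str, set_id: str) -> dict:
        return {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_id,   # distinguishes records sharing a name
            "Failover": role,          # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }

    primary = record("PRIMARY", primary_ip, "primary")
    # The health check decides when traffic shifts to the secondary record.
    primary["HealthCheckId"] = health_check_id
    secondary = record("SECONDARY", secondary_ip, "secondary")

    return {
        "Changes": [
            {"Action": "UPSERT", "ResourceRecordSet": rrset}
            for rrset in (primary, secondary)
        ]
    }


if __name__ == "__main__":
    batch = failover_change_batch(
        "app.example.com.", "192.0.2.10", "198.51.100.20",
        health_check_id="hypothetical-health-check-id",
    )
    print(batch)
```

With boto3, a payload like this would typically be passed as the `ChangeBatch` argument of the Route 53 client's `change_resource_record_sets` call, together with the `HostedZoneId` of your zone. A short TTL keeps the failover window small, at the cost of more DNS queries.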

Question: How do you plan for disaster recovery (DR)?
Pillar: Reliability (Code: REL)
