Search for Well Architected Advice
< All Topics
Print

Integrate resiliency testing as part of your deployment

Integrating resiliency testing during the deployment process ensures that controlled changes can be made without introducing unforeseen disruptions. This approach allows teams to validate the resilience of their system against potential failures and maintain application availability even during updates or changes.

Best Practices

  • Incorporate Chaos Engineering Practices: Implement chaos engineering principles to simulate failures and assess system behavior. By proactively testing the resilience of an application in pre-production, teams can identify weaknesses and enhance the robustness of their systems before going live.
  • Automate Resiliency Tests: Integrate automated resiliency tests in the CI/CD pipeline to ensure consistency and repeatability. Automation helps catch potential failures early and reduces the manual effort required for testing.

Supporting Questions

  • Are resiliency tests routinely executed as part of the deployment process to validate system robustness?

Roles and Responsibilities

  • DevOps Engineer: Responsible for integrating testing frameworks into the deployment pipeline and ensuring that resiliency tests are executed consistently.
  • Quality Assurance Specialist: Tasked with designing and validating the effectiveness of resiliency tests to ensure coverage of potential failure scenarios.

Artifacts

  • Deployment Pipeline Configuration: Documentation detailing how the deployment pipeline is structured, including stages for resiliency testing and automated failover mechanisms.
  • Resiliency Test Plans: A collection of defined scenarios and scripts that capture the conditions under which resiliency tests are to be executed.

Cloud Services

AWS

  • AWS Fault Injection Simulator: A fully managed service to carry out chaos engineering practices, allowing teams to identify system weaknesses and evaluate the resiliency of applications efficiently.
  • AWS CodePipeline: A continuous integration and delivery service that helps automate the build, test, and deploy phases of applications, making it easier to integrate resiliency tests within deployment workflows.

Question: How do you implement change?
Pillar: Reliability (Code: REL)

Table of Contents