Ensure a consistent review of operational readiness

Ensuring Consistent Review of Operational Readiness
Operational Readiness Reviews (ORRs) validate that a workload can be operated safely in production. An ORR confirms that the processes, people, and systems needed to maintain and support the workload are in place, which reduces the risk of failures and unplanned downtime. By conducting ORRs consistently, teams can certify their workloads for operational stability and identify areas for improvement before deployment.

Conduct Operational Readiness Reviews (ORRs)

Use ORRs to validate that your workload is ready for production. ORRs involve a detailed review and inspection process using a checklist of requirements, covering areas such as monitoring, incident response, backup, and recovery. An ORR helps ensure that all components of the workload are properly configured, maintained, and supported.

Use ORRs for Self-Service Certification

ORRs are designed as a self-service process that teams use to certify their own workloads. Teams work through a standardized checklist to assess their readiness and confirm that best practices are followed. This gives each team ownership of its workload and of meeting the required standards before going live.

Checklist-Based Validation

ORRs use a checklist of requirements so that readiness is evaluated consistently and completely. The checklist includes items such as monitoring capabilities, incident management protocols, security controls, and performance testing. Working through the checklist lets teams systematically review every critical aspect of the workload, identify gaps, and implement the necessary improvements.
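
To make the checklist idea concrete, here is a minimal sketch of how a team might represent checklist items and find gaps. The item IDs, requirement wording, and the `find_gaps` and `is_ready` helpers are hypothetical and for illustration only; they are not part of any AWS tool or official ORR template.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_REVIEWED = "not_reviewed"


@dataclass(frozen=True)
class ChecklistItem:
    item_id: str        # hypothetical identifier, e.g. "MON-1"
    area: str           # checklist area named in the text above
    requirement: str    # illustrative wording, not an official requirement
    status: Status = Status.NOT_REVIEWED


def find_gaps(items):
    """Return every item that has not passed; unreviewed items count as gaps."""
    return [item for item in items if item.status is not Status.PASS]


def is_ready(items):
    """Treat the workload as ready only if the checklist is non-empty and has no gaps."""
    return bool(items) and not find_gaps(items)


# Hypothetical checklist covering the four areas mentioned above.
checklist = [
    ChecklistItem("MON-1", "monitoring", "Key metrics have alarms", Status.PASS),
    ChecklistItem("INC-1", "incident management", "On-call rotation is staffed", Status.FAIL),
    ChecklistItem("SEC-1", "security controls", "Access permissions reviewed"),
    ChecklistItem("PERF-1", "performance testing", "Load test at expected peak completed", Status.PASS),
]

gap_ids = [item.item_id for item in find_gaps(checklist)]
print("Gaps:", gap_ids)
print("Ready:", is_ready(checklist))
```

Counting unreviewed items as gaps is a deliberate choice in this sketch: an item nobody has looked at should block certification just as a failed one does.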

Incorporate Best Practices and Lessons Learned

ORRs incorporate best practices and lessons learned from years of building and operating software at Amazon. These best practices include insights into handling incidents, scaling workloads, securing systems, and maintaining high availability. Using this knowledge, teams can improve their operational processes and be better prepared to handle potential issues.

Validate Safety and Stability

The goal of an ORR is to validate that the workload can be operated safely and reliably. This includes ensuring that all dependencies are functioning correctly, that there are adequate monitoring and alerting mechanisms in place, and that the team is prepared to respond to any incidents. By consistently conducting ORRs, teams can reduce the risk of operational issues and ensure stable workload operation.
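
The three conditions above (healthy dependencies, adequate monitoring and alerting, and a team prepared to respond) could be checked with something like the following sketch. The `ReadinessSnapshot` structure, its field names, and the sample values are assumptions made for illustration; in practice the data would come from whatever monitoring and on-call systems the team uses.

```python
from dataclasses import dataclass


@dataclass
class ReadinessSnapshot:
    dependency_health: dict   # dependency name -> True if functioning correctly
    required_metrics: set     # metrics the team has decided must be alarmed
    alarmed_metrics: set      # metrics that currently have an alarm
    on_call_contacts: list    # people prepared to respond to incidents


def readiness_blockers(snapshot):
    """Return a list of human-readable reasons the workload is not yet ready."""
    blockers = []
    for name, healthy in sorted(snapshot.dependency_health.items()):
        if not healthy:
            blockers.append(f"dependency unhealthy: {name}")
    for metric in sorted(snapshot.required_metrics - snapshot.alarmed_metrics):
        blockers.append(f"no alarm for metric: {metric}")
    if not snapshot.on_call_contacts:
        blockers.append("no on-call contact assigned")
    return blockers


# Hypothetical snapshot for an example workload.
snapshot = ReadinessSnapshot(
    dependency_health={"orders-db": True, "payment-api": False},
    required_metrics={"error_rate", "latency_p99"},
    alarmed_metrics={"error_rate"},
    on_call_contacts=[],
)

blockers = readiness_blockers(snapshot)
for blocker in blockers:
    print(blocker)
```

An empty list means none of these three checks found a problem; it does not replace the rest of the ORR checklist.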

Supporting Questions

  • What is the process for conducting an Operational Readiness Review (ORR)?
  • How does an ORR help validate the safety and stability of your workload?
  • What best practices are included in the ORR checklist to ensure operational readiness?

Roles and Responsibilities

Operations Manager
Responsibilities:

  • Conduct Operational Readiness Reviews (ORRs) to validate the operational readiness of workloads.
  • Ensure that all checklist items are completed and that any identified gaps are addressed before deployment.

DevOps Engineer
Responsibilities:

  • Participate in ORRs to verify that all infrastructure components are configured and monitored correctly.
  • Ensure that any automation scripts, monitoring tools, or configuration management settings meet the requirements of the ORR checklist.

Service Owner
Responsibilities:

  • Take ownership of the workload and ensure that it meets all requirements outlined in the ORR checklist.
  • Implement best practices and lessons learned from previous ORRs to improve workload readiness.

Artifacts

Relevant AWS Tools

Review and Certification Tools

  • AWS Well-Architected Tool: Provides best practices and guidance for reviewing workload readiness and helps teams understand potential gaps in their workload architecture.
  • AWS Trusted Advisor: Checks the readiness of AWS resources and provides recommendations on security, performance, cost optimization, and fault tolerance, which can be used as part of an ORR.

Monitoring and Alerting Tools

  • Amazon CloudWatch: Monitors the health and performance of workloads, providing metrics and alerts to validate operational readiness.
  • AWS Systems Manager: Automates compliance checks and operational reviews to validate readiness as part of the ORR process.

Incident Management Tools

  • AWS Systems Manager Incident Manager: Supports incident response capabilities, helping teams show during an ORR that they are prepared to respond to incidents once the workload is in production.
  • AWS Config: Tracks configuration changes and validates that the current configurations meet readiness requirements, helping identify gaps during ORRs.
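
As one way to feed AWS Config results into an ORR, the sketch below summarizes a response shaped like the output of the AWS Config `DescribeComplianceByConfigRule` API (for example, as returned by boto3's `describe_compliance_by_config_rule`). The rule names and sample response are hypothetical, the code does not call AWS, and treating `INSUFFICIENT_DATA` as blocking is an assumption of this sketch rather than AWS guidance.

```python
# Compliance types this sketch treats as gaps to resolve before certification.
# Including INSUFFICIENT_DATA is an assumption, not an AWS recommendation.
BLOCKING_TYPES = {"NON_COMPLIANT", "INSUFFICIENT_DATA"}


def blocking_rules(response):
    """Return the sorted names of Config rules whose compliance type is blocking."""
    flagged = []
    for entry in response.get("ComplianceByConfigRules", []):
        compliance_type = entry.get("Compliance", {}).get("ComplianceType")
        if compliance_type in BLOCKING_TYPES:
            flagged.append(entry["ConfigRuleName"])
    return sorted(flagged)


# Hypothetical response; real rule names depend on the rules enabled in the account.
sample_response = {
    "ComplianceByConfigRules": [
        {"ConfigRuleName": "example-backup-enabled", "Compliance": {"ComplianceType": "COMPLIANT"}},
        {"ConfigRuleName": "example-encryption-at-rest", "Compliance": {"ComplianceType": "NON_COMPLIANT"}},
        {"ConfigRuleName": "example-alarm-configured", "Compliance": {"ComplianceType": "INSUFFICIENT_DATA"}},
    ]
}

flagged_rules = blocking_rules(sample_response)
print(flagged_rules)
```

A real integration would also need to handle pagination of the API response, which this sketch omits.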