Review metrics at regular intervals

Posted December 20, 2024

Updated March 21, 2025

By Kevin McCaffrey

Reviewing metrics regularly is crucial for maintaining and improving the performance of your cloud workloads. By analyzing metrics, teams can identify performance bottlenecks, understand system behavior, and make informed decisions about how to use resources efficiently.

Best Practices

Regular Metric Review and Optimization

Establish a metric collection strategy: Identify which metrics are critical to your workload’s performance and set clear goals for what you want to measure.
Schedule regular reviews: Set a consistent schedule (e.g., weekly or monthly) for reviewing collected metrics so that performance bottlenecks and other issues are identified promptly.
Use monitoring tools: Leverage AWS services like CloudWatch and X-Ray to collect, visualize, and analyze metrics effectively.
Prioritize metrics: Focus on key performance indicators (KPIs) that directly impact workload efficiency, allowing for more effective resource allocation.
Continuous adjustment: Use the insights gained from metric reviews to make iterative adjustments to your architecture, optimizing for performance as workloads evolve.
Documentation: Maintain documentation of metrics reviewed, changes made based on insights, and the impact of those changes on performance.
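One way to make a scheduled review concrete is to compare the current review window against a baseline window and flag regressions. The following is a minimal sketch, not a prescribed method: the `ReviewFinding` type, the `review_metric` function, the use of the 95th percentile, and the 10% threshold are all illustrative assumptions, and it assumes a metric where higher values are worse (such as latency).

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class ReviewFinding:
    """Result of comparing one metric across two review windows (hypothetical type)."""
    metric: str
    baseline_p95: float
    current_p95: float
    change_pct: float
    regressed: bool


def p95(samples):
    """Return the 95th percentile of a list of at least two numeric samples."""
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20, method="inclusive")[-1]


def review_metric(metric, baseline, current, threshold_pct=10.0):
    """Flag a metric whose p95 grew by more than threshold_pct (assumed: higher is worse)."""
    base = p95(baseline)
    cur = p95(current)
    change = (cur - base) / base * 100.0 if base else 0.0
    return ReviewFinding(metric, base, cur, round(change, 1), change > threshold_pct)
```

The returned finding can be recorded alongside the changes made in response, which supports the documentation practice above.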

Questions to ask your team

What metrics are currently being monitored for performance efficiency?
How often do you review these performance metrics?
Have you identified any metrics that are no longer valuable or relevant?
What processes do you have in place for responding to anomalies in performance metrics?
Are there additional metrics you believe could enhance the understanding of your workload’s performance?
How do you determine which metrics are essential for performance optimization?
Do you have a schedule for regular metric reviews, and who is responsible for this process?

Who should be doing this?

Cloud Architect

Design and implement performance-efficient architectures.
Select appropriate services and instance types based on performance metrics.
Review collected metrics regularly to ensure relevance and efficacy.

DevOps Engineer

Set up and maintain monitoring tools for real-time performance metrics.
Automate the collection and reporting of metrics.
Respond to incidents by analyzing performance metrics and trends.

System Administrator

Manage metrics collection systems and ensure data availability.
Conduct regular audits of metrics being collected to eliminate redundancy.
Provide recommendations based on metric analysis to optimize system performance.
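An audit for redundant metrics can start from a simple inventory. The sketch below assumes a hypothetical inventory format in which each entry records the metric's `name`, the `resource` and `signal` it measures, and the `consumers` (dashboards, alarms, or reports) that read it. These field names are illustrative only and do not come from any particular tool.

```python
from collections import defaultdict


def audit_metrics(inventory):
    """Return (duplicates, unused) for a list of metric inventory entries.

    duplicates maps (resource, signal) to the metric names that measure it
    when more than one does; unused lists metrics that nothing consumes.
    """
    by_signal = defaultdict(list)
    for entry in inventory:
        by_signal[(entry["resource"], entry["signal"])].append(entry["name"])

    # Two or more metrics measuring the same signal on the same resource
    # are candidates for consolidation.
    duplicates = {key: sorted(names) for key, names in by_signal.items() if len(names) > 1}

    # Metrics with no dashboard, alarm, or report reading them are
    # candidates for retirement.
    unused = sorted(entry["name"] for entry in inventory if not entry["consumers"])
    return duplicates, unused
```

The output is a list of candidates for discussion, not an automatic deletion list; a metric with no current consumer may still be worth keeping for incident analysis.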

Data Analyst

Analyze performance data to identify trends and potential improvements.
Create reports that summarize findings from metric reviews for stakeholders.
Collaborate with other roles to address issues and enhance performance.

Product Owner

Prioritize performance efficiency improvements based on metric insights.
Communicate findings and recommendations to stakeholders and teams.
Ensure alignment of performance goals with business objectives.

What evidence shows this is happening in your organization?

Performance Metrics Review Template: A structured template to guide the team in reviewing collected performance metrics regularly. This template helps identify essential metrics for workload performance and documents findings for future reference.
Monthly Performance Review Report: A comprehensive report generated monthly that analyzes performance metrics. This report highlights trends, identifies potential issues, and proposes actions for improvement in performance efficiency.
Incident Response Playbook: A playbook outlining the process for responding to performance-related incidents, including steps to review relevant metrics, assess performance issues, and implement corrective actions.
Performance Metrics Dashboard: An interactive dashboard that visualizes key performance metrics in real-time. This tool enables stakeholders to monitor performance continuously and make informed decisions based on the latest data.
Performance Optimization Checklist: A checklist designed for teams to ensure all necessary performance metrics are monitored and reviewed regularly. It includes key questions and areas to assess to maintain optimal performance efficiency.

Cloud Services

AWS

Amazon CloudWatch: CloudWatch monitors your AWS resources and applications in real-time, allowing you to collect and track metrics, collect log files, and set alarms to gain visibility into your system’s performance.
AWS Auto Scaling: Auto Scaling automatically adjusts the number of Amazon EC2 instances in your application to maintain performance and control costs, based on metrics that you define.
AWS Lambda: Lambda runs your code in response to events and automatically manages the compute resources for you. It can help improve performance efficiency by optimizing resource usage.
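As an illustration of pulling data for a review, the sketch below retrieves a week of hourly CPU utilization for one EC2 instance through the CloudWatch `GetMetricStatistics` API. The function name `fetch_hourly_cpu`, the choice of metric, and the seven-day window are assumptions made for the example. The `cloudwatch` argument is expected to be a client such as the one returned by `boto3.client("cloudwatch")`.

```python
from datetime import datetime, timedelta, timezone


def fetch_hourly_cpu(cloudwatch, instance_id, days=7, now=None):
    """Return hourly CPUUtilization datapoints for an instance, oldest first."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Average", "Maximum"],
    )
    # CloudWatch does not guarantee chronological order, so sort by timestamp.
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

Passing the client in as an argument keeps the function easy to exercise with a stub during testing, without AWS credentials.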

Azure

Azure Monitor: Azure Monitor collects and analyzes performance metrics and log data from applications and resources, helping you to optimize performance and improve resource utilization.
Azure Advisor: Azure Advisor provides personalized best practices recommendations that can help you optimize your Azure resources for performance, cost, and security.
Azure Functions: Azure Functions allows you to run event-driven code without having to explicitly provision or manage infrastructure, which can enhance performance efficiency.

Google Cloud Platform

Google Cloud Monitoring: Cloud Monitoring provides visibility into the performance, uptime, and overall health of your applications in Google Cloud, helping you analyze metrics and logs effectively.
Google Cloud Performance Dashboard: The Performance Dashboard, part of Network Intelligence Center, offers insights into the network performance (packet loss and latency) of your Google Cloud resources, helping you identify network bottlenecks.
Google Cloud Functions: Cloud Functions allows you to run your code in response to events without provisioning servers, making it easier to maintain performance efficiency.

Question: What process do you use to support more performance efficiency for your workload?
Pillar: Performance Efficiency (Code: PERF)
