Obtain resources upon detection that more resources are needed for a workload
Designing for elasticity ensures that your resources scale dynamically based on current demand. This capability not only maintains performance but also minimizes costs by preventing over-provisioning. Understanding how to automatically adjust resources plays a crucial role in enhancing the reliability of your workload.
Best Practices
Implement Auto Scaling Policies
- Set up Auto Scaling groups for your compute resources to automatically adjust capacity based on demand. This is crucial for ensuring that your application can handle sudden traffic spikes without becoming overloaded and compromising availability.
- Use CloudWatch metrics to monitor key application performance indicators, such as CPU utilization, request counts, or latency. Configure alarms that trigger scaling actions when thresholds are breached, ensuring a proactive response to increased load.
- Regularly review and adjust your scaling policies based on historical usage patterns and forecasted demand, allowing for continuous improvement in your scalability strategy.
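As a concrete illustration of the first two points, a target-tracking scaling policy keeps an Auto Scaling group at a chosen CloudWatch metric target without hand-tuned alarms. The sketch below is a minimal example, not a recommended value: the 50% CPU target and the group/policy names are placeholders to adapt to your workload.

```json
{
  "TargetValue": 50.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ASGAverageCPUUtilization"
  }
}
```

Saved as `config.json`, this can be attached with the AWS CLI, for example: `aws autoscaling put-scaling-policy --auto-scaling-group-name my-asg --policy-name cpu50-target-tracking --policy-type TargetTrackingScaling --target-tracking-configuration file://config.json`. With target tracking, CloudWatch alarms are created and managed for you, which reduces the threshold-tuning burden described above.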
Leverage Serverless Architectures
- Consider using serverless services such as AWS Lambda or Amazon DynamoDB (in on-demand capacity mode), which scale automatically with incoming requests. This approach abstracts away infrastructure management, allowing you to focus on application functionality and reliability.
- Take advantage of built-in redundancy and failover capabilities provided by serverless services to enhance your application’s reliability without much added complexity.
- Monitor usage and optimize triggering events to manage costs effectively while maintaining scalability and reliability.
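To make the serverless model concrete, here is a minimal AWS Lambda handler in Python. Lambda runs one handler invocation per event and scales out concurrent executions automatically, so the function itself contains no scaling logic. The event shape is a hypothetical API Gateway-style payload, chosen for illustration:

```python
import json

def handler(event, context):
    """Process a single request. Lambda runs as many concurrent copies
    of this function as incoming traffic requires; no scaling code is
    written here."""
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

Because the handler is a plain function, it can be exercised locally (e.g. `handler({"queryStringParameters": {"name": "ops"}}, None)`) before deployment, which keeps the optimization loop in the last bullet fast.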
Design for Statelessness
- Ensure your application components are stateless to facilitate horizontal scaling. Stateless applications can be scaled out easily by adding more instances, which maintains availability and reliability during demand spikes.
- Utilize external storage solutions such as Amazon S3 or Amazon RDS for state management, so that your application's compute layer can scale independently of where its state is stored.
- Use session management strategies like sticky sessions sparingly: they pin clients to specific instances and can create bottlenecks during peak traffic, so prefer externalized session state where possible.
Questions to ask your team
- How do you monitor your current resource utilization to detect when more resources are needed?
- What automated scaling solutions are you using to respond to changes in demand?
- How quickly can your system respond to increases in demand with added resources?
- Are there any thresholds set for triggering resource scaling, and how are they monitored?
- What strategies do you have in place for scaling down resources when demand decreases?
Who should be doing this?
Cloud Architect
- Design scalable architectures that can handle fluctuations in demand.
- Implement auto-scaling features in workloads to adjust resources based on real-time demand.
- Monitor application performance and resource utilization to identify scaling needs.
- Collaborate with development teams to ensure application readiness for dynamic scaling.
Site Reliability Engineer (SRE)
- Set up monitoring and alerting systems to detect changes in demand promptly.
- Analyze performance metrics to inform scaling decisions.
- Conduct capacity planning and testing to ensure the system can meet peak loads.
- Develop and maintain runbooks for emergency scaling actions.
DevOps Engineer
- Automate deployment processes to enable swift scaling.
- Integrate scaling solutions within CI/CD pipelines.
- Manage cloud resource configurations and policies for automatic scaling.
- Collaborate with Cloud Architects to implement best practices for resource management.
Product Owner
- Define business requirements regarding scaling capabilities.
- Ensure that scaling features align with user experience and service level agreements (SLAs).
- Prioritize enhancements related to reliability and scalability within the product backlog.
What evidence shows this is happening in your organization?
- Auto Scaling Strategy Template: A comprehensive template that outlines strategies for implementing AWS Auto Scaling, detailing the policies and triggers needed to scale resources automatically based on demand.
- Reliability Dashboard: A real-time dashboard that visualizes key performance metrics and resource utilization, enabling teams to monitor workload performance and adapt resource allocation proactively.
- Scalability Playbook: A playbook that provides guidelines and best practices for designing scalable architectures, including step-by-step instructions for configuring auto-scaling groups and load balancers.
- Incident Response Plan: A plan that outlines processes and protocols for responding to sudden demand spikes, ensuring that workloads can scale efficiently and maintain availability during peak times.
- Monitoring Checklist: A checklist that includes essential monitoring metrics and alerts necessary for detecting when additional resources are needed, ensuring proactive scaling to maintain workload performance.
Cloud Services
AWS
- Amazon EC2 Auto Scaling: Automatically adjusts the number of EC2 instances up or down based on demand to ensure sufficient capacity.
- Elastic Load Balancing: Distributes incoming application traffic across multiple targets, such as EC2 instances, and scales its own capacity with current demand.
- AWS Lambda: Runs code in response to events and automatically scales the execution based on the request load.
Azure
- Azure Autoscale: Automatically adjusts resources, such as Virtual Machines and App Services, to match the traffic demand.
- Azure Load Balancer: Distributes incoming network traffic across backend instances so the pool can absorb changes in load.
- Azure Functions: Executes code in response to events and scales out based on demand in a serverless environment.
Google Cloud Platform
- Google Compute Engine Autoscaler: Dynamically adjusts the number of VM instances in a managed instance group in response to load conditions.
- Google Cloud Load Balancing: Distributes traffic across multiple instances and can automatically scale based on demand.
- Google Cloud Functions: Runs code in response to events and automatically scales based on the number of requests.
Question: How do you design your workload to adapt to changes in demand?
Pillar: Reliability (Code: REL)