diff --git a/docs/concepts/scaling.md b/docs/concepts/scaling.md
index 022e3420a..485b9d0b7 100644
--- a/docs/concepts/scaling.md
+++ b/docs/concepts/scaling.md
@@ -9,7 +9,6 @@ Scaling is the process of adjusting the number of instances (or replicas) of a s
 
 Scaling is a core concept in distributed systems and cloud-native applications. It ensures your system can handle varying workloads without degrading user experience or over-provisioning resources.
 
-
 ## Why Scale?
 
 Scaling enables services to respond effectively under different conditions:
@@ -32,7 +31,7 @@ In most modern deployments, horizontal scaling is preferred because it aligns we
 
 **Auto-scaling** refers to automatically adjusting the number of service instances based on defined policies or metrics.
 
-Instead of manually adding more instances when traffic increases, an auto-scaling system watches key indicators (like CPU usage) and takes action in real time.
+Instead of manually adding more instances when traffic increases, an auto-scaling system watches key indicators (like CPU usage) and takes action in real time. Defang autoscaling will scale up to twice (2x) the minimum replica count.
 
 ### Example
 
@@ -63,6 +62,7 @@ Auto-scaling systems typically rely on:
 - **Scaling Policies**: Rules that define when to scale up or down. For example:
   - If average CPU > 85% for 5 minutes → scale up by 2 instances.
 - **Cooldown Periods**: Delays between scaling events to prevent rapid, repeated changes (flapping).
+- **Max Replicas**: The maximum number of replicas is set to twice (2x) the minimum replica count per service.
 
 ### Supported Providers
 