Resource Quotas, Requests, and Limits in Kubernetes (In-Depth Guide)
When operating Kubernetes clusters at scale, resource management becomes a critical concern.
Without proper controls, a single misconfigured application can consume excessive CPU or memory, leading to node pressure, pod evictions, degraded performance, or even cascading failures.
Kubernetes addresses this problem using three closely related concepts:
- Resource Quotas
- Resource Requests
- Resource Limits
Together, they form the foundation of predictable, fair, and stable workload execution.
Why Resource Management Matters
In a shared cluster environment:
- Multiple teams deploy workloads concurrently
- Nodes have finite CPU and memory
- Overcommitment leads to unpredictable behavior
Without proper boundaries:
- Pods may starve each other of CPU
- Memory pressure may trigger OOMKills
- Critical workloads may be evicted
- Node stability can be compromised
Resource quotas, requests, and limits exist to protect both the cluster and the workloads running inside it.
Resource Quotas Explained
ResourceQuotas are policies applied at the namespace level.
They define hard upper bounds on the total amount of resources that all objects in a namespace can consume combined.
What Can Be Limited
A ResourceQuota can restrict:
- CPU requests and limits
- Memory requests and limits
- Number of Pods
- Services
- PersistentVolumeClaims
- ConfigMaps and Secrets
Example: ResourceQuota for a Development Namespace
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "1000m"
    requests.memory: "2Gi"
    limits.cpu: "2000m"
    limits.memory: "4Gi"
    pods: "20"
```
This ensures that the combined consumption of all Pods in the dev namespace stays within these boundaries.
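On a running cluster, you can check how much of the quota is currently consumed (a sketch; assumes the manifest above was saved as dev-quota.yaml and that you have access to a cluster):

```shell
# Apply the quota manifest (assumed saved as dev-quota.yaml)
kubectl apply -f dev-quota.yaml

# Show Hard limits vs. current Used totals for the namespace
kubectl describe resourcequota dev-quota -n dev

# The same data in machine-readable form
kubectl get resourcequota dev-quota -n dev -o yaml
```

The Used column in the describe output is what Kubernetes compares against the Hard column when admitting new objects.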
Requests and Limits in Detail
Requests and limits are defined at the container level inside a Pod.
If a Pod has multiple containers, each container defines its own values; the Pod's effective request is the sum of its containers' requests.
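For example, a Pod with an application container and a logging sidecar declares resources separately for each container (the names and images below are illustrative placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar        # illustrative name
spec:
  containers:
    - name: app
      image: example.com/app:1.0        # placeholder image
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "1000m"
          memory: "1Gi"
    - name: log-shipper
      image: example.com/shipper:1.0    # placeholder image
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "200m"
          memory: "256Mi"
```

The scheduler places this Pod based on its combined request: 600m CPU and 640Mi memory.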
Requests
Requests represent the guaranteed minimum amount of resources required by a container.
- Used by the Kubernetes scheduler
- Determines Pod placement
- Ensures sufficient capacity exists on the node
Limits
Limits represent the maximum resources a container is allowed to consume.
- Enforced at runtime
- Exceeding a CPU limit results in throttling
- Exceeding a memory limit results in container termination
Example: Container Resource Definition
```yaml
resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "1000m"
    memory: "2Gi"
```
CPU Units Explained
Kubernetes measures CPU in cores or millicores.
- 1 CPU = 1 full core
- 500m = 0.5 CPU
- 250m = 0.25 CPU
- 100m = 0.1 CPU
Using millicores improves clarity for workloads consuming less than one full core.
Memory Units Explained (Expanded)
Memory in Kubernetes is always measured in bytes, but is typically expressed using binary units.
Common Memory Units
| Unit | Name | Bytes |
|---|---|---|
| Ki | Kibibyte | 1,024 |
| Mi | Mebibyte | 1,024 Ki |
| Gi | Gibibyte | 1,024 Mi |
Why Memory Sizing Matters
Unlike CPU, memory cannot be throttled.
If a container exceeds its memory limit, it is immediately terminated by the kernel.
Kubernetes reports this as `OOMKilled`.
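On a running cluster you can confirm whether a container was OOM-killed by inspecting its last terminated state (a sketch; `my-pod` is a placeholder name):

```shell
# Reason of the most recent termination of the first container
kubectl get pod my-pod \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

# Restart count often climbs alongside repeated OOMKills
kubectl get pod my-pod \
  -o jsonpath='{.status.containerStatuses[0].restartCount}'
```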
Practical Memory Sizing Examples
Very Small Workloads
```yaml
resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "256Mi"
```
Typical use cases:
- Sidecar containers
- Init containers
- CronJobs
- Lightweight utility processes
Small to Medium Services
```yaml
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"
```
Typical use cases:
- Internal APIs
- Low-traffic microservices
- Admin dashboards
Standard Backend Applications
```yaml
resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "2Gi"
```
Typical use cases:
- REST APIs
- Web backends
- Background workers
Memory-Intensive Workloads
```yaml
resources:
  requests:
    memory: "4Gi"
  limits:
    memory: "6Gi"
```
Typical use cases:
- JVM-based services
- ETL pipelines
- Data processing jobs
- Analytics workloads
Binary vs Decimal Units
Avoid mixing decimal (M, G) and binary (Mi, Gi) units.
- 1Gi = 1024Mi
- 1G = 1000M
For predictable scheduling and capacity planning, always use binary units.
Key Takeaway
Underestimating memory leads to instability.
Overestimating memory leads to wasted capacity and blocked scheduling.
Always:
- Test under realistic load
- Observe peak usage
- Add a controlled safety margin
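One way to observe current usage before settling on values is `kubectl top` (assumes the metrics-server add-on is installed in the cluster):

```shell
# Current CPU/memory usage per Pod in a namespace
kubectl top pod -n dev

# Per-container breakdown
kubectl top pod -n dev --containers
```

For peak usage over time, a metrics stack such as Prometheus with Grafana gives a much clearer picture than point-in-time samples.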
Scheduling with Resource Quotas
When a Pod is created in a namespace with a ResourceQuota, the quota admission controller validates the namespace totals that would result:
- Total CPU requests
- Total memory requests
- Total CPU limits
- Total memory limits
If any quota would be exceeded, the API server rejects the Pod at admission time with a `forbidden: exceeded quota` error; Pods created by a controller such as a Deployment simply fail to appear, and the error is recorded in the owning ReplicaSet's events.
What Happens When Limits Are Exceeded
- CPU limit exceeded → container is throttled
- Memory limit exceeded → container is terminated (`OOMKilled`)
Mandatory Requests and Limits
When a ResourceQuota constrains a resource (for example requests.cpu), every new Pod in the namespace must explicitly set a value for it; otherwise the API server rejects the Pod.
This enforces predictability and prevents accidental overconsumption.
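If you do not want every manifest to repeat these values, a LimitRange in the same namespace can inject defaults for containers that omit them (a sketch; the concrete numbers are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults
  namespace: dev
spec:
  limits:
    - type: Container
      default:            # applied as the limit when none is set
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:     # applied as the request when none is set
        cpu: "250m"
        memory: "256Mi"
```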
Best Practices
- Base requests on measured usage
- Use monitoring tools (`kubectl top`, Prometheus, Grafana)
- Separate dev, staging, and prod namespaces
- Review quotas periodically
- Avoid copy-pasting values blindly
Summary
Resource Quotas, Requests, and Limits work together to:
- Prevent resource starvation
- Ensure fair usage
- Improve scheduling decisions
- Protect node and cluster stability
They are essential for running reliable Kubernetes workloads in shared environments.
Next Steps
In the next hands-on section, we will:
- Inspect quotas using `kubectl`
- Observe scheduling behavior
- Trigger quota violations intentionally