Resource Quotas, Requests, and Limits in Kubernetes (In-Depth Guide)



 

When operating Kubernetes clusters at scale, resource management becomes a critical concern.
Without proper controls, a single misconfigured application can consume excessive CPU or memory, leading to node pressure, pod evictions, degraded performance, or even cascading failures.

Kubernetes addresses this problem using three closely related concepts:

  • Resource Quotas
  • Resource Requests
  • Resource Limits

Together, they form the foundation of predictable, fair, and stable workload execution.


Why Resource Management Matters

In a shared cluster environment:

  • Multiple teams deploy workloads concurrently
  • Nodes have finite CPU and memory
  • Overcommitment leads to unpredictable behavior

Without proper boundaries:

  • Pods may starve each other of CPU
  • Memory pressure may trigger OOMKills
  • Critical workloads may be evicted
  • Node stability can be compromised

Resource quotas, requests, and limits exist to protect both the cluster and the workloads running inside it.


Resource Quotas Explained

ResourceQuotas are policies applied at the namespace level.

They define hard upper bounds on the total amount of each resource that all objects in a namespace can claim combined.

What Can Be Limited

A ResourceQuota can restrict:

  • CPU requests and limits
  • Memory requests and limits
  • Number of Pods
  • Services
  • PersistentVolumeClaims
  • ConfigMaps and Secrets

Example: ResourceQuota for a Development Namespace

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "1000m"
    requests.memory: "2Gi"
    limits.cpu: "2000m"
    limits.memory: "4Gi"
    pods: "20"

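This caps what all Pods in the dev namespace can declare combined: at most 1 CPU and 2Gi of memory in requests, 2 CPUs and 4Gi of memory in limits, spread across no more than 20 Pods. The quota counts declared requests and limits, not live usage.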
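
The object-count items from the list above go into the same hard map. As a minimal sketch, assuming a second quota object with the hypothetical name dev-object-counts and ceilings chosen purely for illustration:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-object-counts        # hypothetical name, not part of the original example
  namespace: dev
spec:
  hard:
    services: "10"               # illustrative ceilings; size them for your own teams
    persistentvolumeclaims: "5"
    configmaps: "20"
    secrets: "20"
```

A namespace can hold more than one ResourceQuota object, and a new object has to satisfy all of them.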


Requests and Limits in Detail

Requests and limits are defined at the container level inside a Pod.

If a Pod has multiple containers, each container defines its own values, and the scheduler works with the combined requests of the Pod's containers.

Requests

Requests represent the guaranteed minimum amount of resources required by a container.

  • Used by the Kubernetes scheduler
  • Determines Pod placement
  • Ensures sufficient capacity exists on the node

Limits

Limits represent the maximum resources a container is allowed to consume.

  • Enforced at runtime
  • CPU limits result in throttling
  • Memory limits result in container termination

Example: Container Resource Definition

resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "1000m"
    memory: "2Gi"
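
To show where this fragment sits in a full manifest, here is a sketch of a two-container Pod. The Pod name, container names, and image references are placeholders invented for this example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-with-sidecar                     # hypothetical name
  namespace: dev
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.0    # placeholder image
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
        limits:
          cpu: "1000m"
          memory: "2Gi"
    - name: log-shipper                      # sidecar with its own, smaller values
      image: registry.example.com/log-shipper:1.0
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
        limits:
          cpu: "200m"
          memory: "256Mi"
```

For placement, the container requests add up: this Pod needs a node with at least 600m of CPU and 1152Mi of memory still unreserved.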

CPU Units Explained

Kubernetes measures CPU in cores or millicores.

  • 1 CPU = 1 full core
  • 500m = 0.5 CPU
  • 250m = 0.25 CPU
  • 100m = 0.1 CPU

Using millicores improves clarity for workloads consuming less than one full core.
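
Both notations are accepted in a manifest. A small sketch with two hypothetical containers that request the same amount of CPU:

```yaml
containers:
  - name: worker-a                             # hypothetical container
    image: registry.example.com/worker:1.0     # placeholder image
    resources:
      requests:
        cpu: "250m"     # millicore notation
  - name: worker-b
    image: registry.example.com/worker:1.0
    resources:
      requests:
        cpu: "0.25"     # decimal notation, equivalent to 250m
```

The smallest step Kubernetes accepts is 1m (0.001 CPU).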


Memory Units Explained (Expanded)

Memory in Kubernetes is always measured in bytes, but is typically expressed using binary units.

Common Memory Units

Unit   Name       Size
Ki     Kibibyte   1,024 bytes
Mi     Mebibyte   1,024 Ki
Gi     Gibibyte   1,024 Mi

Why Memory Sizing Matters

Unlike CPU, memory cannot be throttled.
If a container tries to use more memory than its limit allows, the kernel's out-of-memory (OOM) killer terminates it.

Kubernetes reports this as:

OOMKilled
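
The reason is recorded in the Pod's status. Below is an abridged, illustrative excerpt of what kubectl get pod <name> -o yaml might show after such a kill. The container name and restart count are made up for this sketch:

```yaml
status:
  containerStatuses:
    - name: api                  # hypothetical container name
      restartCount: 3            # illustrative value
      lastState:
        terminated:
          reason: OOMKilled
          exitCode: 137          # 128 + 9, i.e. the process received SIGKILL
```

If the container has not been restarted, the same terminated block appears under state instead of lastState.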

Practical Memory Sizing Examples

Very Small Workloads

resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "256Mi"

Typical use cases:

  • Sidecar containers
  • Init containers
  • CronJobs
  • Lightweight utility processes

Small to Medium Services

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

Typical use cases:

  • Internal APIs
  • Low-traffic microservices
  • Admin dashboards

Standard Backend Applications

resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "2Gi"

Typical use cases:

  • REST APIs
  • Web backends
  • Background workers

Memory-Intensive Workloads

resources:
  requests:
    memory: "4Gi"
  limits:
    memory: "6Gi"

Typical use cases:

  • JVM-based services
  • ETL pipelines
  • Data processing jobs
  • Analytics workloads

Binary vs Decimal Units

Avoid mixing decimal (M, G) and binary (Mi, Gi) units.

  • 1Gi = 1024Mi
  • 1G = 1000M

For predictable scheduling and capacity planning, always use binary units.
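
To make the gap concrete, here is a deliberately mixed fragment, shown as an example of what to avoid:

```yaml
resources:
  requests:
    memory: "2G"     # decimal: 2,000,000,000 bytes
  limits:
    memory: "2Gi"    # binary:  2,147,483,648 bytes, about 7% more than 2G
```

Swapping the two suffixes would put the request above the limit, which Kubernetes rejects as invalid. Suffix case matters as well: a lowercase m means "milli", so a memory value of "400m" asks for 0.4 bytes, not 400 mebibytes.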

Key Takeaway

Underestimating memory leads to instability.
Overestimating memory leads to wasted capacity and blocked scheduling.

Always:

  • Test under realistic load
  • Observe peak usage
  • Add a controlled safety margin

Scheduling with Resource Quotas

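When a Pod is created in a namespace with a ResourceQuota, the API server checks at admission time, before the scheduler is involved, whether the namespace totals would stay within the quota:

  1. Total CPU requests
  2. Total memory requests
  3. Total CPU limits
  4. Total memory limits

If any quota would be exceeded, the request is rejected with a 403 Forbidden error and the Pod is never created. It does not wait in Pending state. When the Pod comes from a Deployment, the Deployment itself is accepted, but its ReplicaSet cannot create the missing Pods and records FailedCreate events.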
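
As a worked example against the dev-quota shown earlier, where requests.cpu is capped at 1000m, consider a hypothetical Pod whose single container requests 1500m of CPU. The name, image, and numbers are chosen for illustration only:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: too-big                              # hypothetical name
  namespace: dev
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0    # placeholder image
      resources:
        requests:
          cpu: "1500m"       # above the quota's requests.cpu ceiling of 1000m
          memory: "512Mi"
        limits:
          cpu: "2000m"
          memory: "1Gi"
```

Even in an otherwise empty namespace, 1500m is above the ceiling for total CPU requests, so the quota check fails for this Pod. Lowering the request to 500m would pass, provided the other Pods in dev have not already claimed more than 500m.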


What Happens When Limits Are Exceeded

  • CPU limit exceeded → container is throttled
  • Memory limit exceeded → container is terminated (OOMKilled)

Mandatory Requests and Limits

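When a ResourceQuota constrains a compute resource such as requests.cpu or limits.memory, every new Pod in that namespace must specify that value for each of its containers. Otherwise the API server rejects the Pod.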

This enforces predictability and prevents accidental overconsumption.
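
One way to keep Pods that omit these fields from being turned away is a LimitRange, a separate namespace-level object that fills in defaults for containers that do not declare their own. A minimal sketch, with a hypothetical name and illustrative values:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults           # hypothetical name
  namespace: dev
spec:
  limits:
    - type: Container
      defaultRequest:          # applied when a container sets no request
        cpu: "100m"
        memory: "128Mi"
      default:                 # applied when a container sets no limit
        cpu: "500m"
        memory: "256Mi"
```

Values a container sets explicitly take precedence, so the defaults act only as a safety net.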


Best Practices

  • Base requests on measured usage
  • Use monitoring tools (kubectl top, Prometheus, Grafana)
  • Separate dev, staging, and prod namespaces
  • Review quotas periodically
  • Avoid copy-pasting values blindly

Summary

Resource Quotas, Requests, and Limits work together to:

  • Prevent resource starvation
  • Ensure fair usage
  • Improve scheduling decisions
  • Protect node and cluster stability

They are essential for running reliable Kubernetes workloads in shared environments.


Next Steps

In the next hands-on section, we will:

  • Inspect quotas using kubectl
  • Observe scheduling behavior
  • Trigger quota violations intentionally
