Resource Quotas, Requests, and Limits in Kubernetes (In-Depth Guide)

January 15, 2026

When operating Kubernetes clusters at scale, resource management becomes a critical concern.
Without proper controls, a single misconfigured application can consume excessive CPU or memory, leading to node pressure, pod evictions, degraded performance, or even cascading failures.

Kubernetes addresses this problem using three closely related concepts:

Resource Quotas
Resource Requests
Resource Limits

Together, they form the foundation of predictable, fair, and stable workload execution.

Why Resource Management Matters

In a shared cluster environment:

Multiple teams deploy workloads concurrently
Nodes have finite CPU and memory
Overcommitment leads to unpredictable behavior

Without proper boundaries:

Pods may starve each other of CPU
Memory pressure may trigger OOMKills
Critical workloads may be evicted
Node stability can be compromised

Resource quotas, requests, and limits exist to protect both the cluster and the workloads running inside it.

Resource Quotas Explained

ResourceQuotas are policies applied at the namespace level.

They define hard upper bounds on the total resources that all objects in a namespace can request or create.

What Can Be Limited

A ResourceQuota can restrict:

CPU requests and limits
Memory requests and limits
Number of Pods
Services
PersistentVolumeClaims
ConfigMaps and Secrets

Example: ResourceQuota for a Development Namespace

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "1000m"
    requests.memory: "2Gi"
    limits.cpu: "2000m"
    limits.memory: "4Gi"
    pods: "20"

With this quota in place, the sum of the requests and limits declared by all Pods in the dev namespace cannot exceed these values. The quota counts declared values, not actual usage.
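
The example above covers only compute resources and the Pod count. The other object types listed earlier can be capped in the same way. The following is a minimal sketch; the quota name and every number are illustrative assumptions, not recommendations.

```yaml
# Hypothetical companion quota for the dev namespace; all values are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-object-counts   # assumed name, not part of the example above
  namespace: dev
spec:
  hard:
    services: "10"                # maximum number of Services
    persistentvolumeclaims: "5"   # maximum number of PersistentVolumeClaims
    configmaps: "20"              # maximum number of ConfigMaps
    secrets: "20"                 # maximum number of Secrets
```

Several ResourceQuota objects can exist in one namespace, so keeping compute limits and object counts in separate objects is a matter of preference.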

Requests and Limits in Detail

Requests and limits are defined at the container level inside a Pod.

If a Pod has multiple containers, each container declares its own values, and the scheduler works with the combined requests of the Pod's containers.

Requests

Requests represent the guaranteed minimum amount of resources required by a container.

Used by the Kubernetes scheduler
Determines Pod placement
Ensures sufficient capacity exists on the node

Limits

Limits represent the maximum resources a container is allowed to consume.

Enforced at runtime
CPU limits result in throttling
Memory limits result in container termination

Example: Container Resource Definition

resources:
  requests:
    cpu: "500m"
    memory: "1Gi"
  limits:
    cpu: "1000m"
    memory: "2Gi"
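
To show where this fragment lives in a full manifest, here is a sketch of a Pod with two containers. The Pod name, container names, and image references are placeholders invented for illustration.

```yaml
# Illustrative Pod; names and images are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: web
  namespace: dev
spec:
  containers:
  - name: app
    image: example.com/app:1.0
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "1000m"
        memory: "2Gi"
  - name: log-shipper
    image: example.com/log-shipper:1.0
    resources:
      requests:
        cpu: "100m"
        memory: "128Mi"
      limits:
        cpu: "200m"
        memory: "256Mi"
# Combined requests of both containers: 600m CPU and 1152Mi memory.
# A node needs at least that much unreserved capacity to host this Pod.
```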

CPU Units Explained

Kubernetes measures CPU in cores or millicores.

1 CPU = 1 full core
500m = 0.5 CPU
250m = 0.25 CPU
100m = 0.1 CPU

Using millicores improves clarity for workloads consuming less than one full core.
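
As a small illustration, the same quantity can be written either as a decimal number of cores or in millicores. The values below are arbitrary.

```yaml
resources:
  requests:
    cpu: "250m"   # a quarter of a core; equivalent to "0.25"
  limits:
    cpu: "1"      # one full core; equivalent to "1000m"
```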

Memory Units Explained (Expanded)

Memory in Kubernetes is always measured in bytes, but is typically expressed using binary units.

Common Memory Units

Unit	Name	Bytes
`Ki`	Kibibyte	1,024
`Mi`	Mebibyte	1,048,576 (1,024 Ki)
`Gi`	Gibibyte	1,073,741,824 (1,024 Mi)

Why Memory Sizing Matters

Unlike CPU, memory cannot be throttled.
If a container tries to use more memory than its limit allows, the kernel's out-of-memory (OOM) killer terminates a process in the container.

Kubernetes reports this as:

OOMKilled
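
For orientation, this is roughly where the reason appears in the Pod status returned by `kubectl get pod <name> -o yaml`. The excerpt is a hand-written sketch: the container name and restart count are made up, and exit code 137 is the usual value for a process ended by SIGKILL.

```yaml
# Illustrative excerpt of a Pod's status; not captured from a real cluster.
status:
  containerStatuses:
  - name: app               # hypothetical container name
    restartCount: 1         # made-up value
    lastState:
      terminated:
        reason: OOMKilled
        exitCode: 137       # 128 + 9 (SIGKILL)
```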

Practical Memory Sizing Examples

Very Small Workloads

resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "256Mi"

Typical use cases:

Sidecar containers
Init containers
CronJobs
Lightweight utility processes

Small to Medium Services

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

Typical use cases:

Internal APIs
Low-traffic microservices
Admin dashboards

Standard Backend Applications

resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "2Gi"

Typical use cases:

REST APIs
Web backends
Background workers

Memory-Intensive Workloads

resources:
  requests:
    memory: "4Gi"
  limits:
    memory: "6Gi"

Typical use cases:

JVM-based services
ETL pipelines
Data processing jobs
Analytics workloads

Binary vs Decimal Units

Avoid mixing decimal (M, G) and binary (Mi, Gi) units.

1Gi = 1024Mi
1G = 1000M

For predictable scheduling and capacity planning, always use binary units.
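
The difference is easy to underestimate, so here is a short sketch with the byte values written out. The sizes are arbitrary examples.

```yaml
# Binary suffixes: 512Mi = 536,870,912 bytes, 1Gi = 1,073,741,824 bytes
resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"
# The same numbers with decimal suffixes would be smaller:
#   512M = 512,000,000 bytes
#   1G   = 1,000,000,000 bytes
# Case matters: a lowercase "m" means milli, so "512m" is read as 0.512 bytes.
```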

Key Takeaway

Underestimating memory leads to instability.
Overestimating memory leads to wasted capacity and blocked scheduling.

Always:

Test under realistic load
Observe peak usage
Add a controlled safety margin
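
As a worked illustration of these three steps, suppose load testing shows a service typically using about 600Mi and peaking at 800Mi. These numbers and the 25% margin are assumptions chosen only to make the arithmetic concrete.

```yaml
# Hypothetical measurements: typical usage ~600Mi, observed peak 800Mi.
# Limit = peak plus a 25% margin: 800Mi * 1.25 = 1000Mi.
resources:
  requests:
    memory: "600Mi"    # close to typical usage, so scheduling reflects reality
  limits:
    memory: "1000Mi"   # observed peak plus the chosen safety margin
```

The right margin depends on how spiky the workload is, so treat the percentage as a starting point to revisit after observing production behavior.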

Scheduling with Resource Quotas

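When a Pod is created in a namespace with a ResourceQuota, Kubernetes checks it at admission time, before scheduling, against the namespace totals:

Total CPU requests
Total memory requests
Total CPU limits
Total memory limits

If any quota would be exceeded, the API server rejects the Pod with a 403 Forbidden error and the Pod is never created. For Pods managed by a Deployment, the failure shows up as a FailedCreate event on the ReplicaSet. A Pod stays in Pending state for a different reason: it was admitted, but no node has enough unreserved capacity for its requests.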
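
To make the arithmetic concrete, the sketch below combines the dev-quota and the container resource definition shown earlier in a Deployment with three replicas. The Deployment name, labels, and image are placeholders, and the calculation assumes the dev namespace contains no other Pods.

```yaml
# Illustrative Deployment; name, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: dev
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: example.com/api:1.0
        resources:
          requests:
            cpu: "500m"      # quota requests.cpu 1000m    -> 2 Pods fit
            memory: "1Gi"    # quota requests.memory 2Gi   -> 2 Pods fit
          limits:
            cpu: "1000m"     # quota limits.cpu 2000m      -> 2 Pods fit
            memory: "2Gi"    # quota limits.memory 4Gi     -> 2 Pods fit
```

Two replicas use the quota exactly, so the third replica does not fit within it on any of the four dimensions.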

What Happens When Limits Are Exceeded

CPU limit exceeded → container is throttled
Memory limit exceeded → container is terminated (OOMKilled)

Mandatory Requests and Limits

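When a ResourceQuota constrains a compute resource such as requests.cpu or limits.memory, every new Pod in that namespace must declare a request or limit for that resource, or its creation is rejected.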

This enforces predictability and prevents accidental overconsumption.
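
One common way to meet this requirement without editing every manifest is a LimitRange, which fills in defaults for containers that omit requests or limits. The following is a minimal sketch; the object name and all values are assumptions for illustration.

```yaml
# Hypothetical defaults for the dev namespace; values are illustrative.
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults
  namespace: dev
spec:
  limits:
  - type: Container
    defaultRequest:       # applied when a container declares no request
      cpu: "100m"
      memory: "128Mi"
    default:              # applied when a container declares no limit
      cpu: "200m"
      memory: "256Mi"
```

Defaults are a safety net rather than a substitute for sizing: workloads with known requirements are better served by explicit values.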

Best Practices

Base requests on measured usage
Use monitoring tools (kubectl top, Prometheus, Grafana)
Separate dev, staging, and prod namespaces
Review quotas periodically
Avoid copy-pasting values blindly

Summary

Resource Quotas, Requests, and Limits work together to:

Prevent resource starvation
Ensure fair usage
Improve scheduling decisions
Protect node and cluster stability

They are essential for running reliable Kubernetes workloads in shared environments.

Next Steps

In the next hands-on section, we will:

Inspect quotas using kubectl
Observe scheduling behavior
Trigger quota violations intentionally
