Prevent Pods from Scheduling on the Same Node

⚠️

SECTION 01

The Problem — Why This Matters

When you scale a Deployment to multiple replicas, Kubernetes decides where each pod lands. Without any constraints, the scheduler may place all replicas on the same node — repeatedly.

⚖️ Poor Resource Distribution

One node gets overloaded while others sit idle. CPU and memory are wasted across the cluster.

❌ Lower Fault Tolerance

If all pods are on one node, a node failure means zero replicas survive. Your app goes down entirely.

💥 Node Failure = Full Outage

The whole point of multiple replicas is HA. Co-locating them completely defeats that purpose.

🔴

Real scenario You have a 3-replica deployment. Kubernetes schedules all 3 on Node1. Node1 goes down for maintenance. Your service is now fully unavailable, even though Node2 and Node3 are perfectly healthy.

💡

SECTION 02

Core Concepts

Understanding these four building blocks makes the YAML configuration obvious rather than mysterious.

🔑 topologyKey

The node label used to define "groups." Use kubernetes.io/hostname to spread per node, or topology.kubernetes.io/zone to spread across availability zones.

📐 maxSkew

Maximum allowed difference in pod count between the most-loaded and least-loaded group. maxSkew: 1 enforces near-perfect balance.

🏷️ labelSelector

Tells Kubernetes which pods to count when calculating distribution. Only pods matching these labels are considered.

🛡️ whenUnsatisfiable

What to do when the constraint can't be satisfied. Either ScheduleAnyway (soft) or DoNotSchedule (strict/hard).

🎯

Plain English Think of maxSkew as the "allowed imbalance budget." If maxSkew is 1, no node can have more than 1 extra pod compared to the least loaded node. If it's 3, a difference of up to 3 is tolerated before Kubernetes starts routing around it.

Supported topologyKey Values

topologyKey	Spreads Across	Use When
kubernetes.io/hostname	Individual nodes	✓ Most Common
topology.kubernetes.io/zone	AZ / datacenter zones	Cloud HA
topology.kubernetes.io/region	Cloud regions	Multi-region
Custom label	Any node group you define	Advanced

📋

SECTION 03

Prerequisites

Kubernetes 1.19+ — topologySpreadConstraints became GA (stable) in v1.19
Multiple nodes — spreading has no effect on a single-node cluster
Pod labels set correctly — the labelSelector must match your pod labels exactly
kubectl access — or Helm if using the chart-based approach

⚠️

Single-node clusters On a single-node cluster (like Minikube or kind with one node), there is only one topology domain for kubernetes.io/hostname, so the constraint is trivially satisfied and has no visible effect, even with DoNotSchedule. Always test on a multi-node setup.

📄

PHASE 01

Direct YAML Deployment

Add topologySpreadConstraints directly inside your Deployment manifest under spec → template → spec.

📍

Exact placement in the manifest The constraint lives inside spec.template.spec, at the same level as containers. It is not in the top-level spec of the Deployment — it's the pod spec inside the template.

Kubernetes YAML — Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        spread-group: "app"    # ← required for labelSelector
    spec:
      # ──────────────────────────────────────────
      # ADD THIS BLOCK under spec (pod spec level)
      # ──────────────────────────────────────────
      topologySpreadConstraints:
        - maxSkew: 3
          topologyKey: "kubernetes.io/hostname"
          whenUnsatisfiable: "ScheduleAnyway"
          labelSelector:
            matchLabels:
              spread-group: "app"
      containers:
        - name: my-app
          image: my-app:latest

Multi-Constraint Example (Node + Zone)

You can stack multiple constraints to spread across both zones and individual nodes simultaneously:

YAML — Stacked Constraints

topologySpreadConstraints:
  # Constraint 1: spread across availability zones
  - maxSkew: 1
    topologyKey: "topology.kubernetes.io/zone"
    whenUnsatisfiable: "DoNotSchedule"
    labelSelector:
      matchLabels:
        spread-group: "app"
  # Constraint 2: also spread across individual nodes
  - maxSkew: 2
    topologyKey: "kubernetes.io/hostname"
    whenUnsatisfiable: "ScheduleAnyway"
    labelSelector:
      matchLabels:
        spread-group: "app"
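
To see how the two stacked constraints interact, here is a minimal Python sketch, assuming a hypothetical four-node, two-zone cluster. The node names, zone names, pod counts and the helpers `skew_if_placed` and `evaluate` are illustrative and not part of any Kubernetes API. It computes the skew each constraint would see if the next pod landed on a given node; the zone constraint (DoNotSchedule) filters nodes, while the hostname constraint (ScheduleAnyway) only expresses a preference.

```python
from collections import Counter

# Hypothetical cluster: node name -> zone label value.
NODE_ZONE = {"node-a1": "zone-a", "node-a2": "zone-a",
             "node-b1": "zone-b", "node-b2": "zone-b"}

# Matching pods currently running on each node (illustrative numbers).
PODS_ON_NODE = {"node-a1": 2, "node-a2": 1, "node-b1": 1, "node-b2": 0}


def skew_if_placed(domain_counts: dict[str, int], domain: str) -> int:
    """Skew seen by one constraint if the new pod lands in `domain`."""
    return domain_counts[domain] + 1 - min(domain_counts.values())


def evaluate(node: str) -> dict[str, int]:
    """Return the zone-level and node-level skew for a candidate node."""
    zone_counts = Counter()
    for name, count in PODS_ON_NODE.items():
        zone_counts[NODE_ZONE[name]] += count
    return {
        "zone": skew_if_placed(dict(zone_counts), NODE_ZONE[node]),
        "hostname": skew_if_placed(PODS_ON_NODE, node),
    }


# The zone constraint is hard (maxSkew 1): only nodes passing it stay eligible.
eligible = [n for n in NODE_ZONE if evaluate(n)["zone"] <= 1]
print(eligible)
```

In this made-up state zone-a already holds three matching pods and zone-b one, so only the zone-b nodes remain eligible; the soft hostname constraint then favours the emptier of the two.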

⛵

PHASE 02

Helm Chart Setup

For Helm-managed deployments, parameterize the constraints via values.yaml so they can be tuned per environment without touching the template.

01

Update `templates/deployment.yaml`

Replace hardcoded values with Helm template variables referencing .Values.podSpread.

templates/deployment.yaml

spec:
  template:
    metadata:
      labels:
        spread-group: "{{ .Values.podSpread.label }}"
    spec:
      topologySpreadConstraints:
        - maxSkew: {{ .Values.podSpread.maxSkew }}
          topologyKey: {{ .Values.podSpread.topologyKey | quote }}
          whenUnsatisfiable: {{ .Values.podSpread.whenUnsatisfiable | quote }}
          labelSelector:
            matchLabels:
              spread-group: "{{ .Values.podSpread.label }}"

02

Define Defaults in `values.yaml`

These are the default values. Override them per environment using -f values-prod.yaml or --set flags.

values.yaml

podSpread:
  maxSkew: 3
  topologyKey: "kubernetes.io/hostname"
  whenUnsatisfiable: "ScheduleAnyway"
  label: "app"

03

Environment-Specific Overrides

Use a values-prod.yaml to enforce stricter spreading in production without changing the base chart.

values-prod.yaml

# Production: stricter HA requirements
podSpread:
  maxSkew: 1
  topologyKey: "kubernetes.io/hostname"
  whenUnsatisfiable: "DoNotSchedule"
  label: "app"

Helm Deploy Command

# Deploy with production values:
helm upgrade --install my-app ./chart \
  -f values.yaml \
  -f values-prod.yaml
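
Helm merges values files left to right, with later files overriding earlier ones key by key. The sketch below approximates that layering in Python to show why an override file only needs the keys that differ. `merge_values` is a hypothetical helper written for this illustration, not Helm's own implementation, and the trimmed-down `values_prod` dictionary is an assumption.

```python
import copy


def merge_values(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`; override wins."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Mirrors values.yaml from this guide.
values = {"podSpread": {"maxSkew": 3,
                        "topologyKey": "kubernetes.io/hostname",
                        "whenUnsatisfiable": "ScheduleAnyway",
                        "label": "app"}}

# A trimmed-down production override: only the keys that change.
values_prod = {"podSpread": {"maxSkew": 1,
                             "whenUnsatisfiable": "DoNotSchedule"}}

effective = merge_values(values, values_prod)
print(effective["podSpread"])
```

The untouched keys (topologyKey and label) carry over from the defaults, which is why repeating them in values-prod.yaml is optional.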

🏷️

PHASE 03

Required: Add Labels to Your Pods

This is mandatory and the most commonly missed step. The labelSelector in your constraint only works if your pods actually have the matching label. Without it, Kubernetes has no idea which pods to count when calculating spread.

Pod Metadata — Required Label

# This label MUST exist on every pod you want to spread
metadata:
  labels:
    app: my-app
    spread-group: "app"    # ← this is what labelSelector matches

⚠️

Label mismatch = silent failure If the label on the pod doesn't exactly match what's in labelSelector.matchLabels, the constraint is effectively ignored. Kubernetes won't error — it will just not spread. Always double-check the label key and value are identical.
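
The matching rule behind matchLabels can be sketched in a few lines of Python: a pod is counted only when every key/value pair in the selector appears on the pod with exactly the same spelling. The function `matches` and the two broken example pods are hypothetical, chosen to show typical near-misses.

```python
def matches(match_labels: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """True when every key/value pair in the selector appears on the pod."""
    return all(pod_labels.get(key) == value
               for key, value in match_labels.items())


selector = {"spread-group": "app"}

good_pod = {"app": "my-app", "spread-group": "app"}
typo_pod = {"app": "my-app", "spread_group": "app"}   # underscore, not hyphen
case_pod = {"app": "my-app", "spread-group": "App"}   # value differs in case

for name, labels in [("good", good_pod), ("typo", typo_pod), ("case", case_pod)]:
    print(name, matches(selector, labels))
```

Only the first pod would be counted toward the spread; the other two are skipped without any error, which is the silent failure described above.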

💡

Why a separate spread-group label? Using a dedicated label like spread-group: app instead of just app: my-app gives you flexibility. If you later want to spread multiple different deployments as a single group (or exclude some pods), you can do so by controlling which pods get this label.

🧮

DEEP DIVE 01

Understanding maxSkew

maxSkew is the core numeric parameter that controls how uneven the distribution can be. It is the maximum allowed difference in matching-pod count between any topology domain and the least-loaded one (domains are individual nodes when topologyKey is kubernetes.io/hostname).

Example with 3 Nodes and maxSkew: 3

✓ Allowed Distribution

Node 1: 4 pods · Node 2: 2 pods · Node 3: 1 pod

Max − Min = 4 − 1 = 3 ✔ within maxSkew: 3

Example with 3 Nodes — Exceeds maxSkew: 3

✗ Not Allowed (Strict Mode)

Node 1: 6 pods (overloaded) · Node 2: 1 pod · Node 3: 1 pod

Max − Min = 6 − 1 = 5 ✗ exceeds maxSkew: 3 → blocked
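
The arithmetic in the two examples can be checked with a short Python sketch. The helpers `skew` and `allowed` are hypothetical names for this illustration; the pod counts are the ones from the examples above.

```python
def skew(pods_per_node: dict[str, int]) -> int:
    """Difference between the most-loaded and least-loaded node."""
    return max(pods_per_node.values()) - min(pods_per_node.values())


def allowed(pods_per_node: dict[str, int], max_skew: int) -> bool:
    """True when the distribution stays within the maxSkew budget."""
    return skew(pods_per_node) <= max_skew


first = {"node1": 4, "node2": 2, "node3": 1}
second = {"node1": 6, "node2": 1, "node3": 1}

for name, dist in [("first", first), ("second", second)]:
    print(name, "skew =", skew(dist), "allowed =", allowed(dist, max_skew=3))
```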

maxSkew Quick Reference

maxSkew Value	Behaviour	Best For
1	Near-perfect balance. Strict.	Production HA
2	Moderate imbalance allowed.	Staging
3	Flexible. Default recommendation.	Most Deployments
5+	Very relaxed. Near no-op on small clusters.	Dev / Low Priority

🛡️

DEEP DIVE 02

whenUnsatisfiable — Soft vs Strict

This field controls what Kubernetes does when it cannot satisfy the spread constraint, that is, when every node that could otherwise run the pod would push the skew above maxSkew.

ScheduleAnyway (soft)

Scheduler tries its best to balance
If impossible, pod is scheduled anyway
The constraint itself never leaves a pod Pending
Imbalance may occur under pressure
Ideal for non-critical workloads

DoNotSchedule (strict)

Constraint is a hard requirement
Pod stays Pending if constraint breaks
Keeps distribution within maxSkew at scheduling time (running pods are not rebalanced)
Risk: pods stuck if cluster is unbalanced
Required for strict HA production apps

⚠️

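DoNotSchedule can cause Pending pods If you use DoNotSchedule and the only nodes that can still accept the pod would push the skew above maxSkew (for example because the least-loaded nodes are full or tainted), the pod stays Pending until the situation changes. Having fewer nodes than replicas is not a problem by itself. Always ensure the least-loaded nodes can actually accept pods when using strict mode.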
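
A minimal Python sketch of how a pod can end up Pending under DoNotSchedule, under a simplifying assumption: a node that is full still counts toward the minimum but cannot take the new pod. The function `pick_node`, the node names and the counts are hypothetical; the real scheduler weighs many more factors.

```python
from typing import Optional


def pick_node(counts: dict[str, int], has_capacity: set[str],
              max_skew: int, when_unsatisfiable: str) -> Optional[str]:
    """Return the node chosen for the next pod, or None if it stays Pending.

    counts: matching pods per node (every node counts toward the minimum).
    has_capacity: nodes that could actually run the pod.
    """
    lowest = min(counts.values())
    # Least-loaded nodes first; node name breaks ties for a stable order.
    candidates = sorted(has_capacity, key=lambda n: (counts[n], n))
    if when_unsatisfiable == "DoNotSchedule":
        # Hard mode: drop nodes whose resulting skew would exceed maxSkew.
        candidates = [n for n in candidates
                      if counts[n] + 1 - lowest <= max_skew]
    return candidates[0] if candidates else None


counts = {"node1": 1, "node2": 1, "node3": 0}
free = {"node1", "node2"}  # node3 is full in this scenario

print(pick_node(counts, free, 1, "DoNotSchedule"))   # strict mode
print(pick_node(counts, free, 1, "ScheduleAnyway"))  # soft mode
```

With node3 empty but unable to accept pods, placing the pod on node1 or node2 would give a skew of 2, so strict mode returns None (Pending) while soft mode still picks a node.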

YAML — Strict Production Config

topologySpreadConstraints:
  - maxSkew: 1                         # strict balance
    topologyKey: "kubernetes.io/hostname"
    whenUnsatisfiable: "DoNotSchedule"   # hard constraint
    labelSelector:
      matchLabels:
        spread-group: "app"

🌍

DEEP DIVE 03

Cross-Namespace Behavior

A common point of confusion: what happens when the same label exists in multiple namespaces?

ℹ️

Default behavior: namespace-scoped By default, topologySpreadConstraints only counts pods within the same namespace as the pod being scheduled. Pods in other namespaces are invisible to the calculation, even if they share the same labels.

Namespace A

Pods with spread-group: app are spread across nodes within Namespace A only.

Namespace B

Pods with the same label spread independently within Namespace B only. Unaware of Namespace A.

Cross-NS

Kubernetes does not balance pods across namespaces. This is not supported natively.

🔬

No namespaceSelector on spread constraints A topologySpreadConstraints entry has no namespaceSelector field, so it cannot be told to count pods in other namespaces. The namespaceSelector field belongs to pod affinity and anti-affinity terms (podAffinityTerm) instead, so cross-namespace separation has to be expressed with podAntiAffinity rather than with spread constraints.

Practical Implication

# If you deploy the app with the same labels in 2 namespaces (maxSkew: 3):
Namespace A: pods on Node1(3), Node2(1), Node3(2) → skew 2, within maxSkew ✓
Namespace B: pods on Node1(3), Node2(1), Node3(0) → skew 3, within maxSkew ✓ (within B)

# Node1 ends up with 3+3 = 6 pods total
# Neither namespace's constraint prevents this
# ❌ topologySpreadConstraints does NOT account for this
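
The per-namespace view can be sketched in Python with illustrative counts: each namespace stays within maxSkew: 3 on its own, while the combined load on one node is much higher. The namespace names, node names, pod counts and the helpers `counts_for` and `skew` are assumptions made for this example.

```python
from collections import Counter

# (namespace, node) for each running pod that carries spread-group=app.
pods = ([("ns-a", "node1")] * 3 + [("ns-a", "node2")] * 1
        + [("ns-a", "node3")] * 2
        + [("ns-b", "node1")] * 3 + [("ns-b", "node2")] * 1)

NODES = ["node1", "node2", "node3"]


def counts_for(namespace: str) -> dict[str, int]:
    """Matching pods per node as seen by a pod scheduled in `namespace`."""
    seen = Counter(node for ns, node in pods if ns == namespace)
    return {node: seen[node] for node in NODES}


def skew(counts: dict[str, int]) -> int:
    """Difference between the most-loaded and least-loaded node."""
    return max(counts.values()) - min(counts.values())


# What actually runs on each node, regardless of namespace.
per_node_total = Counter(node for _, node in pods)

for ns in ("ns-a", "ns-b"):
    print(ns, counts_for(ns), "skew =", skew(counts_for(ns)))
print("combined:", dict(per_node_total))
```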

⚙️

DEEP DIVE 04

How the Scheduler Works Internally

When a new pod needs to be scheduled, the Kubernetes scheduler runs through this logic for each topologySpreadConstraint:

1

Find Matching Pods

The scheduler searches for all existing pods that match the labelSelector in the same namespace. These are the pods it will count for distribution.

2

Group by topologyKey

Pods are grouped by the value of the node label specified in topologyKey. Each unique value (hostname, zone, etc.) forms one "topology domain."

3

Calculate Skew Per Domain

For each candidate node, the scheduler calculates what the skew would be if the new pod were placed there: current count + 1 − min count.

Skew Formula

# After placing pod on candidate node:
skew = (count_on_candidate + 1) - min_count_across_all_nodes

# If skew > maxSkew → node is ineligible

4

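Prefer the Least Loaded Eligible Node

Nodes that would violate maxSkew are filtered out (with DoNotSchedule). Among the remaining nodes, those in domains with fewer matching pods score higher for spreading, and the final choice combines this with the scheduler's other scoring criteria. This drives distribution toward balance without strictly guaranteeing that the emptiest node wins.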
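
The four steps above can be sketched as one Python function. This is a simplified model for a single hard constraint, assuming every node carries the topologyKey label and ignoring resources, taints and the scheduler's other scoring; `choose_node` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Optional


def choose_node(pods: list[dict], nodes: dict[str, dict],
                match_labels: dict[str, str], topology_key: str,
                max_skew: int, namespace: str) -> Optional[str]:
    """Walk through the four steps; return a node name or None (Pending)."""
    # Step 1: find matching pods in the same namespace.
    matching = [p for p in pods
                if p["namespace"] == namespace
                and all(p["labels"].get(k) == v
                        for k, v in match_labels.items())]

    # Step 2: group by the value of the node's topologyKey label.
    domain_of = {name: labels[topology_key] for name, labels in nodes.items()}
    counts = Counter({domain: 0 for domain in domain_of.values()})
    for pod in matching:
        counts[domain_of[pod["node"]]] += 1

    # Step 3: skew if the new pod were placed on each candidate node.
    lowest = min(counts.values())
    eligible = [n for n in nodes
                if counts[domain_of[n]] + 1 - lowest <= max_skew]

    # Step 4 (simplified): prefer the least-loaded eligible domain.
    eligible.sort(key=lambda n: (counts[domain_of[n]], n))
    return eligible[0] if eligible else None


KEY = "kubernetes.io/hostname"
nodes = {name: {KEY: name} for name in ("node1", "node2", "node3")}
labels = {"spread-group": "app"}
pods = [
    {"namespace": "default", "labels": labels, "node": "node1"},
    {"namespace": "default", "labels": labels, "node": "node1"},
    {"namespace": "default", "labels": labels, "node": "node2"},
    {"namespace": "other", "labels": labels, "node": "node3"},  # not counted
]

print(choose_node(pods, nodes, labels, KEY, max_skew=1, namespace="default"))
```

The pod in the other namespace is ignored in step 1, so node3 counts as empty and is the only node that keeps the skew within 1.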

⚡

SECTION 07

Pro Tips & Combinations

Use maxSkew: 1 for strict HA setups where replicas must be spread as evenly as possible (per-node counts differ by at most 1)
Use DoNotSchedule for production-critical apps — never let the scheduler ignore the constraint
Combine with podAntiAffinity for even stronger node separation (belt + suspenders approach)
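Ensure the least-loaded eligible nodes have spare capacity when using DoNotSchedule; you do not need one node per replica, but a full or tainted node with few matching pods can block scheduling elsewhere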
Test your spread with kubectl get pods -o wide to verify node distribution after deploy
For multi-AZ clusters, stack two constraints: one for hostname, one for zone

Combining with podAntiAffinity

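For the strongest possible node separation, combine topologySpreadConstraints with a podAntiAffinity rule. The spread constraint handles even distribution; the preferred anti-affinity rule shown here adds a soft preference against co-location. For a hard barrier, use requiredDuringSchedulingIgnoredDuringExecution instead, which does require at least as many eligible nodes as replicas.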

YAML — Belt + Suspenders HA Config

spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: "kubernetes.io/hostname"
      whenUnsatisfiable: "DoNotSchedule"
      labelSelector:
        matchLabels:
          spread-group: "app"

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: my-app
            topologyKey: "kubernetes.io/hostname"

Quick Verification Command

kubectl — Check Pod Distribution

# See which node each pod landed on:
kubectl get pods -l app=my-app -o wide

# Count pods per node:
kubectl get pods -l spread-group=app \
  -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' \
  | sort | uniq -c | sort -rn
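
If you prefer to post-process the output in a script, the same tally as the `sort | uniq -c | sort -rn` pipeline can be sketched in Python. The function `pods_per_node` is a hypothetical helper, and the sample text stands in for the newline-separated node names the jsonpath query prints.

```python
from collections import Counter


def pods_per_node(node_names_text: str) -> list[tuple[str, int]]:
    """Tally newline-separated node names, most-loaded node first."""
    names = [line.strip() for line in node_names_text.splitlines()
             if line.strip()]
    return Counter(names).most_common()


# Made-up sample of what the jsonpath query might print for six pods.
sample = "node1\nnode2\nnode1\nnode3\nnode2\nnode1\n"

for node, count in pods_per_node(sample):
    print(count, node)
```

Comparing the largest and smallest counts in the result against your maxSkew is a quick way to confirm the constraint is doing its job.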

✅

SECTION 08

Final Outcome

With topologySpreadConstraints properly configured, pod placement is more predictable and your workload is more resilient to single-node failures.

🚀 What You've Gained

✅ Pods distributed across nodes

✅ Better high availability

✅ Reduced single-node risk

✅ More predictable scaling

✅ Configurable per-environment

⚙️ What Was Configured

📄 topologySpreadConstraints YAML

⛵ Helm chart parameterization

🏷️ Pod labels for labelSelector

🛡️ whenUnsatisfiable policy set

🧮 maxSkew tuned per env

Key Takeaways

Label your pods: a label matching the labelSelector (spread-group in this guide) is mandatory for the constraint to function
Start with ScheduleAnyway — then tighten to DoNotSchedule once cluster is stable
maxSkew: 1 for production, maxSkew: 3 for flexible dev/staging environments
Namespace-scoped by default — cross-namespace spreading is not supported natively
Stack constraints — combine hostname + zone constraints for multi-AZ resilience
Verify with kubectl — always confirm actual distribution after deploying

🔜

Next Step: Zone-Aware Spreading Once you've mastered node-level spreading, add a second constraint with topologyKey: topology.kubernetes.io/zone to spread across availability zones. This is the foundation of truly resilient multi-AZ Kubernetes deployments.
