☸️ Kubernetes Scheduling Series

Prevent Pods from Scheduling on the Same Node with topologySpreadConstraints

Multiple replicas piling onto one node kills high availability. This guide shows you exactly how to spread pods evenly using topologySpreadConstraints — in both raw YAML and Helm.

topologySpreadConstraints, maxSkew, Pod Scheduling, Helm Chart, High Availability, labelSelector

The Problem → The Solution

By default, Kubernetes doesn't guarantee your pods land on different nodes. All replicas can end up on Node 1, meaning a single node failure wipes out your entire app. topologySpreadConstraints fixes this by enforcing even distribution across nodes or availability zones.

😱 Default: all pods → same node = outage risk
⚙️ topologySpread: constraints applied at scheduler level
Distributed: pods balanced across nodes/zones
⚠️
SECTION 01

The Problem — Why This Matters

When you scale a Deployment to multiple replicas, Kubernetes decides where each pod lands. Without any constraints, the scheduler may place all replicas on the same node — repeatedly.

⚖️ Poor Resource Distribution

One node gets overloaded while others sit idle. CPU and memory are wasted across the cluster.

❌ Lower Fault Tolerance

If all pods are on one node, a node failure means zero replicas survive. Your app goes down entirely.

💥 Node Failure = Full Outage

The whole point of multiple replicas is HA. Co-locating them completely defeats that purpose.

🔴
Real scenario: You have a 3-replica deployment. Kubernetes schedules all 3 on Node1. Node1 goes down for maintenance. Your service is now fully unavailable, even though Node2 and Node3 are perfectly healthy.
💡
SECTION 02

Core Concepts

Understanding these four building blocks makes the YAML configuration obvious rather than mysterious.

🔑 topologyKey

The node label used to define "groups." Use kubernetes.io/hostname to spread per node, or topology.kubernetes.io/zone to spread across availability zones.

📐 maxSkew

Maximum allowed difference in pod count between the most-loaded and least-loaded group. maxSkew: 1 enforces near-perfect balance.

🏷️ labelSelector

Tells Kubernetes which pods to count when calculating distribution. Only pods matching these labels are considered.

🛡️ whenUnsatisfiable

What to do when the constraint can't be satisfied. Either ScheduleAnyway (soft) or DoNotSchedule (strict/hard).

🎯
Plain English: Think of maxSkew as the "allowed imbalance budget." If maxSkew is 1, no node can have more than 1 extra pod compared to the least loaded node. If it's 3, a difference of up to 3 is tolerated before Kubernetes starts routing around it.

Supported topologyKey Values

topologyKey | Spreads Across | Use When
kubernetes.io/hostname | Individual nodes | ✓ Most Common
topology.kubernetes.io/zone | AZ / datacenter zones | Cloud HA
topology.kubernetes.io/region | Cloud regions | Multi-region
Custom label | Any node group you define | Advanced
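
To make the idea of a "topology domain" concrete, here is a minimal Python sketch (not scheduler code). The node names and zone values are hypothetical; the sketch only shows how a given topologyKey groups nodes into the domains that pods are spread across.

```python
from collections import defaultdict

# Hypothetical nodes and their labels (illustrative values only).
NODES = {
    "node-1": {"kubernetes.io/hostname": "node-1", "topology.kubernetes.io/zone": "zone-a"},
    "node-2": {"kubernetes.io/hostname": "node-2", "topology.kubernetes.io/zone": "zone-a"},
    "node-3": {"kubernetes.io/hostname": "node-3", "topology.kubernetes.io/zone": "zone-b"},
}

def domains(nodes, topology_key):
    """Group node names by the value of the given node label."""
    groups = defaultdict(list)
    for name, labels in nodes.items():
        if topology_key in labels:  # a node without the label belongs to no domain
            groups[labels[topology_key]].append(name)
    return dict(groups)

by_host = domains(NODES, "kubernetes.io/hostname")
by_zone = domains(NODES, "topology.kubernetes.io/zone")
print(by_host)
print(by_zone)
```

With the hostname key every node is its own domain; with the zone key, node-1 and node-2 share one domain, so pods on either of them count toward the same total.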
📋
SECTION 03

Prerequisites

  • Kubernetes 1.19+: topologySpreadConstraints became GA (stable) in v1.19
  • Multiple nodes — spreading has no effect on a single-node cluster
  • Pod labels set correctly — the labelSelector must match your pod labels exactly
  • kubectl access — or Helm if using the chart-based approach
⚠️
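Single-node clusters: On a single-node cluster (like Minikube or kind with one node), there is only one topology domain, so the constraint has nothing to spread across. With kubernetes.io/hostname it is always satisfied and every pod lands on the one node. With DoNotSchedule, pods can stay Pending if the node lacks the topologyKey label (for example, no zone label). Always test on a multi-node setup.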

📄
PHASE 01

Direct YAML Deployment

Add topologySpreadConstraints directly inside your Deployment manifest under spec → template → spec.

📍
Exact placement in the manifest: The constraint lives inside spec.template.spec, at the same level as containers. It is not in the top-level spec of the Deployment; it belongs to the pod spec inside the template.
Kubernetes YAML — Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        spread-group: "app"    # ← required for labelSelector
    spec:
      # ──────────────────────────────────────────
      # ADD THIS BLOCK under spec (pod spec level)
      # ──────────────────────────────────────────
      topologySpreadConstraints:
        - maxSkew: 3
          topologyKey: "kubernetes.io/hostname"
          whenUnsatisfiable: "ScheduleAnyway"
          labelSelector:
            matchLabels:
              spread-group: "app"
      containers:
        - name: my-app
          image: my-app:latest

Multi-Constraint Example (Node + Zone)

You can stack multiple constraints to spread across both zones and individual nodes simultaneously:

YAML — Stacked Constraints
topologySpreadConstraints:
  # Constraint 1: spread across availability zones
  - maxSkew: 1
    topologyKey: "topology.kubernetes.io/zone"
    whenUnsatisfiable: "DoNotSchedule"
    labelSelector:
      matchLabels:
        spread-group: "app"
  # Constraint 2: also spread across individual nodes
  - maxSkew: 2
    topologyKey: "kubernetes.io/hostname"
    whenUnsatisfiable: "ScheduleAnyway"
    labelSelector:
      matchLabels:
        spread-group: "app"
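
One way to reason about stacked constraints is to check each one separately against the same placement. The Python sketch below does that for a hypothetical four-node, two-zone layout (node names, zones, and pod counts are made up for illustration); it is a back-of-the-envelope check, not how the scheduler is implemented.

```python
def skew(counts):
    """Skew of a distribution: highest count minus lowest count across domains."""
    return max(counts.values()) - min(counts.values())

# Hypothetical layout: which zone each node is in, and matching pods per node.
NODE_ZONE = {"node-1": "zone-a", "node-2": "zone-a", "node-3": "zone-b", "node-4": "zone-b"}
pods_per_node = {"node-1": 3, "node-2": 1, "node-3": 2, "node-4": 2}

# Roll the per-node counts up into per-zone counts.
pods_per_zone = {}
for node, count in pods_per_node.items():
    zone = NODE_ZONE[node]
    pods_per_zone[zone] = pods_per_zone.get(zone, 0) + count

zone_ok = skew(pods_per_zone) <= 1   # constraint 1: maxSkew 1 across zones
node_ok = skew(pods_per_node) <= 2   # constraint 2: maxSkew 2 across nodes
print(pods_per_zone, zone_ok, node_ok)
```

Here the zones are perfectly balanced (4 and 4) even though the nodes inside zone-a are not, which is why the looser per-node maxSkew of 2 is still needed to describe what is acceptable at node level.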

PHASE 02

Helm Chart Setup

For Helm-managed deployments, parameterize the constraints via values.yaml so they can be tuned per environment without touching the template.

01

Update templates/deployment.yaml

Replace hardcoded values with Helm template variables referencing .Values.podSpread.

templates/deployment.yaml
spec:
  template:
    metadata:
      labels:
        spread-group: "{{ .Values.podSpread.label }}"
    spec:
      topologySpreadConstraints:
        - maxSkew: {{ .Values.podSpread.maxSkew }}
          topologyKey: {{ .Values.podSpread.topologyKey | quote }}
          whenUnsatisfiable: {{ .Values.podSpread.whenUnsatisfiable | quote }}
          labelSelector:
            matchLabels:
              spread-group: "{{ .Values.podSpread.label }}"
02

Define Defaults in values.yaml

These are the default values. Override them per environment using -f values-prod.yaml or --set flags.

values.yaml
podSpread:
  maxSkew: 3
  topologyKey: "kubernetes.io/hostname"
  whenUnsatisfiable: "ScheduleAnyway"
  label: "app"
03

Environment-Specific Overrides

Use a values-prod.yaml to enforce stricter spreading in production without changing the base chart.

values-prod.yaml
# Production: stricter HA requirements
podSpread:
  maxSkew: 1
  topologyKey: "kubernetes.io/hostname"
  whenUnsatisfiable: "DoNotSchedule"
  label: "app"
Helm Deploy Command
# Deploy with production values.
# The chart's own values.yaml is loaded automatically, so only the override file is passed:
helm upgrade --install my-app ./chart \
  -f values-prod.yaml
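
Helm layers value files on top of the chart defaults, with later files winning and nested maps merged key by key. The Python sketch below approximates that layering for the podSpread block. The deep_merge helper is hypothetical and simplified; Helm's real merge has additional rules that are not modelled here.

```python
def deep_merge(base, override):
    """Return base with override applied; nested dicts are merged key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Chart defaults, mirroring values.yaml.
defaults = {"podSpread": {"maxSkew": 3, "topologyKey": "kubernetes.io/hostname",
                          "whenUnsatisfiable": "ScheduleAnyway", "label": "app"}}
# A production override that lists only the keys that differ.
prod = {"podSpread": {"maxSkew": 1, "whenUnsatisfiable": "DoNotSchedule"}}

effective = deep_merge(defaults, prod)
print(effective["podSpread"])
```

Because the merge is per key, an override file only needs the values that change; topologyKey and label fall through from the defaults.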

🏷️
PHASE 03

Required: Add Labels to Your Pods

This is mandatory and the most commonly missed step. The labelSelector in your constraint only works if your pods actually have the matching label. Without it, Kubernetes has no idea which pods to count when calculating spread.

Pod Metadata — Required Label
# This label MUST exist on every pod you want to spread
metadata:
  labels:
    app: my-app
    spread-group: "app"    # ← this is what labelSelector matches
⚠️
Label mismatch = silent failure: If the label on the pod doesn't exactly match what's in labelSelector.matchLabels, the pod is not counted and the constraint is effectively ignored. Kubernetes won't error; it will just not spread. Always double-check that the label key and value are identical.
💡
Why a separate spread-group label? Using a dedicated label like spread-group: app instead of just app: my-app gives you flexibility. If you later want to spread multiple different deployments as a single group (or exclude some pods), you can do so by controlling who gets this label.
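
The matching rule described in this section is simple enough to sketch in a few lines of Python. The matches function is a hypothetical stand-in for matchLabels semantics (every key in the selector must be present on the pod with an identical value); the pod label sets are made up for illustration.

```python
def matches(selector, pod_labels):
    """matchLabels semantics: every selector key must be present with the same value."""
    return all(pod_labels.get(key) == value for key, value in selector.items())

selector = {"spread-group": "app"}

good_pod = {"app": "my-app", "spread-group": "app"}
typo_pod = {"app": "my-app", "spread-group": "apps"}   # value differs by one character
bare_pod = {"app": "my-app"}                           # label missing entirely

print(matches(selector, good_pod), matches(selector, typo_pod), matches(selector, bare_pod))
```

Only the first pod would be counted toward the spread. The other two are invisible to the constraint, which is why a mismatch produces no error and no spreading.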

🧮
DEEP DIVE 01

Understanding maxSkew

maxSkew is the core numeric parameter that controls how uneven the distribution can be. It represents the maximum allowed difference in pod count between any two topology domains (nodes).

Example with 3 Nodes and maxSkew: 3

✓ Allowed Distribution

Node 1: 4 pods
Node 2: 2 pods
Node 3: 1 pod
Max − Min = 4 − 1 = 3 ✔ within maxSkew: 3

Example with 3 Nodes — Exceeds maxSkew: 3

✗ Not Allowed (Strict Mode)

Node 1: 6 pods (overloaded)
Node 2: 1 pod
Node 3: 1 pod
Max − Min = 6 − 1 = 5 ✗ exceeds maxSkew: 3 → blocked

maxSkew Quick Reference

maxSkew Value | Behaviour | Best For
1 | Near-perfect balance. Strict. | Production HA
2 | Moderate imbalance allowed. | Staging
3 | Flexible. Default recommendation. | Most Deployments
5+ | Very relaxed. Near no-op on small clusters. | Dev / Low Priority
🛡️
DEEP DIVE 02

whenUnsatisfiable — Soft vs Strict

This field controls what Kubernetes does when it cannot satisfy the spread constraint — either because there aren't enough nodes, or because all valid nodes would violate maxSkew.

ScheduleAnyway (soft)
  • Scheduler tries its best to balance
  • If impossible, pod is scheduled anyway
  • No pods will stay Pending because of this constraint
  • Imbalance may occur under pressure
  • Ideal for non-critical workloads
DoNotSchedule (strict)
  • Constraint is a hard requirement
  • Pod stays Pending if constraint breaks
  • Keeps skew within maxSkew at scheduling time (running pods are not rebalanced)
  • Risk: pods stuck if cluster is unbalanced
  • Required for strict HA production apps
⚠️
DoNotSchedule can cause Pending pods: If you use DoNotSchedule and the only nodes that would keep the skew within maxSkew are full, tainted, or otherwise unschedulable, your pods stay in Pending state until capacity frees up. Always ensure you have enough eligible nodes when using strict mode.
YAML — Strict Production Config
topologySpreadConstraints:
  - maxSkew: 1                         # strict balance
    topologyKey: "kubernetes.io/hostname"
    whenUnsatisfiable: "DoNotSchedule"   # hard constraint
    labelSelector:
      matchLabels:
        spread-group: "app"
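
To see how the two policies diverge under pressure, the toy Python simulation below places 6 replicas on a hypothetical 3-node cluster where one node only has room for a single pod. The schedule function and the capacity numbers are assumptions made for illustration; the real scheduler weighs many more factors.

```python
def schedule(replicas, capacity, max_skew, when_unsatisfiable):
    """Toy placement loop; returns (pods per node, number of Pending pods)."""
    counts = {node: 0 for node in capacity}
    pending = 0
    for _ in range(replicas):
        lowest = min(counts.values())
        # Nodes that still have room for one more pod.
        free = [n for n in counts if counts[n] < capacity[n]]
        # Nodes where placing the pod keeps the skew within max_skew.
        within = [n for n in free if counts[n] + 1 - lowest <= max_skew]
        if when_unsatisfiable == "DoNotSchedule":
            candidates = within
        else:  # ScheduleAnyway: prefer balance, but fall back to any free node
            candidates = within or free
        if not candidates:
            pending += 1
            continue
        target = min(candidates, key=lambda n: counts[n])
        counts[target] += 1
    return counts, pending

# Hypothetical cluster: node-3 only has room for one pod.
capacity = {"node-1": 10, "node-2": 10, "node-3": 1}

strict = schedule(6, capacity, 1, "DoNotSchedule")
soft = schedule(6, capacity, 1, "ScheduleAnyway")
print(strict)
print(soft)
```

In this run the strict policy leaves one pod Pending, because the full node-3 pins the minimum at 1 and any further placement would push the skew to 2. The soft policy places all six pods and accepts the extra imbalance.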
🌍
DEEP DIVE 03

Cross-Namespace Behavior

A common point of confusion: what happens when the same label exists in multiple namespaces?

ℹ️
Default behavior: namespace-scoped. By default, topologySpreadConstraints only counts pods within the same namespace as the pod being scheduled. Pods in other namespaces are invisible to the calculation, even if they share the same labels.

Namespace A

Pods with spread-group: app are spread across nodes within Namespace A only.

Namespace B

Pods with the same label spread independently within Namespace B only. Unaware of Namespace A.

Cross-NS

Kubernetes does not balance pods across namespaces. This is not supported natively.

🔬
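namespaceSelector (Kubernetes 1.24+) applies to pod affinity only: The namespaceSelector field, stable since v1.24, belongs to podAffinity and podAntiAffinity terms. At the time of writing, the topologySpreadConstraints API has no namespaceSelector field and no feature gate that adds one, so spread constraints cannot count pods in other namespaces. If you need cross-namespace separation, use podAntiAffinity with namespaceSelector instead.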
Practical Implication
# If you deploy the app with the same labels in 2 namespaces (maxSkew: 3):
Namespace A: pods on Node1(3), Node2(1), Node3(2) → skew 2, allowed ✓
Namespace B: pods on Node1(3), Node2(0), Node3(1) → skew 3, allowed ✓ (within B)

# Node1 ends up with 3+3 = 6 pods total, Node2 with only 1
# Neither namespace's constraint prevents this
# ❌ topologySpreadConstraints does NOT account for this
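
The same effect can be reproduced with a small Python sketch. The namespaces, node names, and pod counts are hypothetical; the point is only that each namespace's view stays within a maxSkew of 3 while the combined load on the nodes does not.

```python
from collections import Counter

# Hypothetical pods as (namespace, node) pairs; all carry spread-group=app.
PODS = (
    [("ns-a", "node-1")] * 3 + [("ns-a", "node-2")] * 1 + [("ns-a", "node-3")] * 2 +
    [("ns-b", "node-1")] * 3 + [("ns-b", "node-3")] * 1
)
NODES = ["node-1", "node-2", "node-3"]

def per_node(pods, namespace=None):
    """Count pods per node, optionally restricted to one namespace."""
    counts = Counter(node for ns, node in pods if namespace is None or ns == namespace)
    return {node: counts.get(node, 0) for node in NODES}

def skew(counts):
    """Highest count minus lowest count across nodes."""
    return max(counts.values()) - min(counts.values())

seen_by_a = per_node(PODS, "ns-a")   # what the constraint counts for a pod in ns-a
seen_by_b = per_node(PODS, "ns-b")   # what the constraint counts for a pod in ns-b
actual = per_node(PODS)              # what the nodes really carry
print(skew(seen_by_a), skew(seen_by_b), skew(actual), actual)
```

Each namespace sees a skew of 2 or 3, yet node-1 carries 6 of the 10 pods and the real skew is 5.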
⚙️
DEEP DIVE 04

How the Scheduler Works Internally

When a new pod needs to be scheduled, the Kubernetes scheduler runs through this logic for each topologySpreadConstraint:

1

Find Matching Pods

The scheduler searches for all existing pods that match the labelSelector in the same namespace. These are the pods it will count for distribution.

2

Group by topologyKey

Pods are grouped by the value of the node label specified in topologyKey. Each unique value (hostname, zone, etc.) forms one "topology domain."

3

Calculate Skew Per Domain

For each candidate node, the scheduler calculates what the skew would be if the new pod were placed there: matching pods in the candidate's domain + 1 − the lowest count across all domains.

Skew Formula
# After placing the pod in the candidate's topology domain:
skew = (count_in_candidate_domain + 1) - min_count_across_all_domains

# If skew > maxSkew → node is filtered out (DoNotSchedule)
#                   → node only scores lower (ScheduleAnyway)
4

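Prefer the Least Loaded Eligible Node

Among the nodes that pass the filter, the spread logic gives higher scores to nodes whose domain holds fewer matching pods. The final choice combines this score with the scheduler's other scoring criteria (such as resources and affinity), so placement is steered toward balance rather than strictly forced onto the emptiest node.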
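
The skew check from step 3 can be sketched in Python as follows. The eligible_nodes function and the pod counts are hypothetical; the sketch covers only the filtering arithmetic for a hostname-level constraint, where each node is its own domain.

```python
def eligible_nodes(counts, max_skew):
    """Nodes where adding one matching pod keeps the skew within max_skew."""
    lowest = min(counts.values())
    return [node for node, count in counts.items() if count + 1 - lowest <= max_skew]

# Hypothetical current distribution of matching pods.
counts = {"node-1": 3, "node-2": 1, "node-3": 0}

print(eligible_nodes(counts, 1))
print(eligible_nodes(counts, 3))
```

With maxSkew 1 only the empty node-3 qualifies. With maxSkew 3, node-2 qualifies as well, while node-1 is still excluded because a fourth pod there would make the skew 4.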


SECTION 07

Pro Tips & Combinations

  • Use maxSkew: 1 for strict HA setups; per-node pod counts then differ by at most one (it does not by itself force one pod per node)
  • Use DoNotSchedule for production-critical apps — never let the scheduler ignore the constraint
  • Combine with podAntiAffinity for even stronger node separation (belt + suspenders approach)
  • With DoNotSchedule, you do not need one node per replica, but the least-loaded nodes must be able to accept pods; otherwise new pods stay Pending
  • Test your spread with kubectl get pods -o wide to verify node distribution after deploy
  • For multi-AZ clusters, stack two constraints: one for hostname, one for zone

Combining with podAntiAffinity

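For the strongest possible node separation, combine topologySpreadConstraints with a podAntiAffinity rule. The spread constraint handles even distribution; the anti-affinity rule in this example is a soft preference (preferredDuringSchedulingIgnoredDuringExecution) that further discourages co-location. Use requiredDuringSchedulingIgnoredDuringExecution only if you want a hard one-pod-per-node barrier and have at least as many eligible nodes as replicas.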

YAML — Belt + Suspenders HA Config
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: "kubernetes.io/hostname"
      whenUnsatisfiable: "DoNotSchedule"
      labelSelector:
        matchLabels:
          spread-group: "app"

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: my-app
            topologyKey: "kubernetes.io/hostname"

Quick Verification Command

kubectl — Check Pod Distribution
# See which node each pod landed on:
kubectl get pods -l app=my-app -o wide

# Count pods per node:
kubectl get pods -l spread-group=app \
  -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' \
  | sort | uniq -c | sort -rn
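
If you prefer to post-process the output in a script, the per-node count can also be derived from the JSON that kubectl get pods -o json prints. The Python sketch below is one possible approach; the sample data is a trimmed, hypothetical pod list that keeps only the spec.nodeName field.

```python
import json
from collections import Counter

def pods_per_node(pod_list_json):
    """Count pods per node from the JSON printed by `kubectl get pods -o json`."""
    items = json.loads(pod_list_json)["items"]
    # Pods that are not scheduled yet have no spec.nodeName; group them separately.
    return Counter(item["spec"].get("nodeName", "<pending>") for item in items)

# Trimmed, hypothetical sample of the pod list structure.
sample = json.dumps({"items": [
    {"spec": {"nodeName": "node-1"}},
    {"spec": {"nodeName": "node-2"}},
    {"spec": {"nodeName": "node-1"}},
    {"spec": {}},
]})

counts = pods_per_node(sample)
print(dict(counts))
```

In practice you would feed the function the real command output, for example by piping kubectl get pods -l spread-group=app -o json into a script that reads standard input.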
SECTION 08

Final Outcome

With topologySpreadConstraints properly configured, pod placement across your cluster is more predictable and resilient.

🚀 What You've Gained

Pods distributed across nodes
Better high availability
Reduced single-node risk
More predictable scaling
Configurable per-environment

⚙️ What Was Configured

📄 topologySpreadConstraints YAML
Helm chart parameterization
🏷️ Pod labels for labelSelector
🛡️ whenUnsatisfiable policy set
🧮 maxSkew tuned per env

Key Takeaways

  • Label your pods: the label referenced by labelSelector (spread-group here) must be present on the pods for the constraint to function
  • Start with ScheduleAnyway — then tighten to DoNotSchedule once cluster is stable
  • maxSkew: 1 for production, maxSkew: 3 for flexible dev/staging environments
  • Namespace-scoped by default — cross-namespace spreading is not supported natively
  • Stack constraints — combine hostname + zone constraints for multi-AZ resilience
  • Verify with kubectl — always confirm actual distribution after deploying
🔜
Next Step: Zone-Aware Spreading. Once you've mastered node-level spreading, add a second constraint with topologyKey: topology.kubernetes.io/zone to spread across availability zones. This is the foundation of truly resilient multi-AZ Kubernetes deployments.