Liveness, Readiness, and Startup Probes: The Three Health Checks and How They Differ

Readiness gates traffic, liveness restarts containers, startup protects slow boots — and wiring them up the same way is how a slow dependency turns into a restart loop. The differences that matter.

June 14, 2026 · 9 min read · Kubernetes

Kubernetes gives a container three health checks, and the most common production incidents involving them come from treating all three as "is the app up?". They answer three different questions, and pointing them at the same endpoint is how a slow database pushes healthy pods into CrashLoopBackOff. Here is what each one actually controls.

Readiness: should traffic come to this pod?

A failing readiness probe does not restart the container. It removes the pod from the Service's Endpoints, so no new traffic is routed to it; existing connections are untouched. When it passes again, the pod is added back. This is the probe for transient "I'm temporarily busy" states — warming a cache, waiting on a dependency, draining before shutdown.

readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
  failureThreshold: 3

Liveness: should this container be restarted?

A failing liveness probe makes the kubelet kill the container and restart it, subject to the pod's restartPolicy and the restart backoff. It exists for one situation only: the process is running but wedged (a deadlock, a stuck event loop, any state that a restart fixes). If a restart would not help, it should not be a liveness failure.

livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  failureThreshold: 3

The mistake that causes restart storms

The anti-pattern is making /livez check downstream dependencies — the database, a cache, another service. Now picture the database getting slow. Every pod's liveness probe fails, so Kubernetes restarts all of them at once. Restarting an app never fixes a slow database, so they fail again, and you have converted a degraded dependency into a full outage plus a thundering herd of reconnects.

The rule: liveness checks only "is my own process wedged?" (usually a trivial in-process handler). Readiness checks "can I serve a real request right now?" including dependencies. A slow DB should make pods unready (stop sending them traffic) — never make them get killed.
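The split described above might look like this on a single container. This is a sketch, not a drop-in manifest: the container name and image are placeholders, and it assumes the app serves a dependency-free /livez and a dependency-aware /readyz on port 8080.

```yaml
# Fragment of a pod spec; name and image are hypothetical.
containers:
  - name: app
    image: example.com/app:1.0
    ports:
      - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /livez        # in-process check only, no calls to the database or cache
        port: 8080
      periodSeconds: 10
      failureThreshold: 3   # repeated failures here restart the container
    readinessProbe:
      httpGet:
        path: /readyz       # may check the database, cache, or downstream services
        port: 8080
      periodSeconds: 5
      failureThreshold: 3   # repeated failures here only stop new traffic
```

With this layout a slow database can fail /readyz on every pod without any of them being restarted, because /livez never touches the database.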

Startup: protect slow boots from liveness

Apps with a long cold start (JVM warmup, large model load, migrations) hit a chicken-and-egg problem: a liveness probe aggressive enough for steady state will kill the container before it finishes booting. The naive fix is a big initialDelaySeconds on liveness, but that is a single fixed guess: set it too low and slow boots still get killed, set it too high and a container that wedges shortly after starting goes undetected until the delay has elapsed, on every restart.
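For contrast, the naive version is a single fixed delay on the liveness probe. The 150-second value is an assumed worst-case boot time, chosen only for illustration.

```yaml
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  initialDelaySeconds: 150  # assumed worst-case boot; no liveness checks run before this
  periodSeconds: 10
  failureThreshold: 3
```

With this configuration, a container that boots in 20 seconds and wedges at second 30 is not probed, let alone restarted, until the 150-second delay has passed.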

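The startup probe solves it cleanly: until it succeeds, liveness and readiness probes are not run. After its first success it never runs again for that container, and the other two take over. If it uses up its failureThreshold without succeeding, the container is killed and restarted, just as with a liveness failure. Size it for the worst-case boot: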

startupProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 5
  failureThreshold: 30      # allows up to 5 * 30 = 150s to boot

This gives the app a 150-second budget to start while keeping a tight 30-second liveness window (periodSeconds 10 × failureThreshold 3) once it is up.
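Placed side by side, the two budgets can be sketched as follows. The field values repeat the ones used in this article; the timings in the comments are approximate, since they ignore probe timeouts and scheduling jitter.

```yaml
startupProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 5
  failureThreshold: 30    # boot budget: 5s * 30 = ~150s, after which the container is killed
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10       # only starts running after the startup probe has succeeded
  failureThreshold: 3     # steady-state budget: 10s * 3 = ~30s before a restart
```

An app that boots in 40 seconds is handed over to the liveness probe at roughly the 40-second mark; the unused part of the startup budget is not waited out.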

Rules of thumb

Readiness = traffic gate (no restart). Liveness = restart trigger. Startup = a one-time grace period that gates the other two.
Never check external dependencies in liveness — that is what turns a slow dependency into a cluster-wide restart loop. Put dependency checks in readiness.
If a restart would not fix the failure, it must not be a liveness failure.
Slow boot? Add a startup probe with a generous failureThreshold instead of inflating liveness initialDelaySeconds.
Keep probe handlers cheap and dependency-light; an expensive probe under load becomes its own failure source.
