Liveness, Readiness, and Startup Probes: The Three Health Checks and How They Differ
Readiness gates traffic, liveness restarts containers, startup protects slow boots — and wiring them up the same way is how a slow dependency turns into a restart loop. The differences that matter.
Kubernetes gives a container three health checks, and a common cause of production incidents is treating all three as the same question: "is the app up?". They answer three different questions, and pointing them at the same endpoint is how a slow database puts a healthy pod into CrashLoopBackOff. Here is what each one actually controls.
Readiness: should traffic come to this pod?
A failing readiness probe does not restart the container. It removes the pod from the Service's Endpoints, so no new traffic is routed to it; existing connections are untouched. When it passes again, the pod is added back. This is the probe for transient "I'm temporarily busy" states — warming a cache, waiting on a dependency, draining before shutdown.
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
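On the application side, the transient states listed above can be tracked as flags that the /readyz handler consults. The following is a minimal sketch, not a definitive implementation: the class `ReadinessState` and its method names are hypothetical, and it assumes the app calls `mark_cache_warm()` once warmup finishes and `start_draining()` from its shutdown path. An HTTP probe counts any status from 200 to 399 as success, so returning 503 is enough to take the pod out of rotation.

```python
import threading


class ReadinessState:
    """Tracks transient conditions that should take a pod out of rotation.

    Hypothetical helper for illustration; not part of any Kubernetes library.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cache_warm = False
        self._draining = False

    def mark_cache_warm(self) -> None:
        # Call once the warmup work has finished.
        with self._lock:
            self._cache_warm = True

    def start_draining(self) -> None:
        # Call from the shutdown path (for example a SIGTERM handler)
        # so the pod stops receiving new traffic before it exits.
        with self._lock:
            self._draining = True

    def status_code(self) -> int:
        """HTTP status for /readyz: 200 when ready, 503 otherwise."""
        with self._lock:
            ready = self._cache_warm and not self._draining
        return 200 if ready else 503
```

None of these states leads to a restart: the pod simply reports 503 until the condition clears, then rejoins the Service.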
Liveness: should this container be restarted?
A failing liveness probe makes the kubelet kill the container and restart it according to the pod's restartPolicy, with an increasing backoff between repeated restarts. It exists for one situation only: the process is running but wedged (a deadlock, a stuck event loop, any state that a restart fixes). If a restart would not help, it should not be a liveness failure.
livenessProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
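One way to detect a wedged process from inside it is a heartbeat: the main work loop records a timestamp on every iteration, and /livez fails if that timestamp is too old. The sketch below assumes this approach; the class `Heartbeat`, the `max_silence_seconds` parameter, and the injectable `clock` are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


class Heartbeat:
    """Reports the process as wedged if the work loop stops checking in.

    Hypothetical helper for illustration only.
    """

    def __init__(
        self,
        max_silence_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_silence = max_silence_seconds
        self._clock = clock
        self._last_beat = clock()

    def beat(self) -> None:
        # Call from the main work loop itself, once per iteration.
        self._last_beat = self._clock()

    def livez_status(self) -> int:
        """HTTP status for /livez: 200 if the loop beat recently, else 503."""
        silent_for = self._clock() - self._last_beat
        return 200 if silent_for <= self._max_silence else 503
```

The heartbeat has to come from the loop being watched. A separate timer thread calling `beat()` would keep reporting healthy while the real work is deadlocked. Note that the check reads only in-process state and never touches a dependency.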
The mistake that causes restart storms
The anti-pattern is making /livez check downstream dependencies — the database, a cache, another service. Now picture the database getting slow. Every pod's liveness probe fails, so Kubernetes restarts all of them at once. Restarting an app never fixes a slow database, so they fail again, and you have converted a degraded dependency into a full outage plus a thundering herd of reconnects.
The rule: liveness checks only "is my own process wedged?" (usually a trivial in-process handler). Readiness checks "can I serve a real request right now?" including dependencies. A slow DB should make pods unready (stop sending them traffic) — never make them get killed.
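The split in this rule can be sketched with Python's standard `http.server`. This is an illustration under assumptions, not a reference implementation: `probe_status`, `tcp_dependency_ok`, `make_handler`, and `make_server` are hypothetical names, and a plain TCP connect with a short timeout stands in for whatever "can I reach the database?" means in a real app.

```python
import socket
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def tcp_dependency_ok(host: str, port: int, timeout_seconds: float = 0.5) -> bool:
    """Cheap dependency check: can a TCP connection be opened in time?"""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:  # covers refused connections, timeouts, DNS failures
        return False


def probe_status(path: str, dependency_ok: Callable[[], bool]) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/livez":
        # Liveness: answering at all proves the process is not wedged.
        # Dependencies are deliberately never consulted here.
        return 200
    if path == "/readyz":
        # Readiness: only accept traffic if dependencies are reachable.
        return 200 if dependency_ok() else 503
    return 404


def make_handler(dependency_ok: Callable[[], bool]) -> type:
    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            self.send_response(probe_status(self.path, dependency_ok))
            self.end_headers()

        def log_message(self, format: str, *args: object) -> None:
            pass  # keep probe traffic out of the logs

    return ProbeHandler


def make_server(port: int, dependency_ok: Callable[[], bool]) -> ThreadingHTTPServer:
    """Build (but do not start) a server; call serve_forever() to run it."""
    return ThreadingHTTPServer(("", port), make_handler(dependency_ok))
```

With this wiring, a slow or unreachable database turns /readyz into 503 while /livez keeps returning 200, so pods leave the Service but are not restarted. A possible invocation, with a placeholder host name, would be `make_server(8080, lambda: tcp_dependency_ok("db.internal", 5432)).serve_forever()`.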
Startup: protect slow boots from liveness
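Apps with a long cold start (JVM warmup, large model load, migrations) hit a chicken-and-egg problem: a liveness probe aggressive enough for steady state will kill the container before it finishes booting. The naive fix is a big initialDelaySeconds on liveness, but that is a fixed guess: too short and a slow boot still gets killed, too long and a container that wedges right after a fast start goes unnoticed until the delay expires, on every restart. Raising the liveness failureThreshold instead is worse, because that slows detection of real hangs for the life of the container.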
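The startup probe solves it cleanly: until it succeeds, liveness and readiness probes are not run. Once it has succeeded once, it never runs again and the other two take over. If it never succeeds within its budget, the kubelet kills the container just as it would for a failed liveness probe. Size it for the worst-case boot: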
startupProbe:
  httpGet:
    path: /livez
    port: 8080
  periodSeconds: 5
  failureThreshold: 30  # allows up to 5 * 30 = 150s to boot
This gives the app a 150-second budget to start while keeping a tight liveness window once it is up: with periodSeconds: 10 and failureThreshold: 3, a wedged container is restarted after roughly 30 seconds.
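The two figures come from the same arithmetic, periodSeconds multiplied by failureThreshold. The helper below is a hypothetical sketch for sizing probes; it is an approximation that ignores timeoutSeconds and initialDelaySeconds, both of which can lengthen the real window.

```python
def failure_window_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Approximate time a probe may keep failing before the kubelet acts.

    Assumption: ignores timeoutSeconds and initialDelaySeconds.
    """
    return period_seconds * failure_threshold


# Values taken from the probe definitions in this article.
startup_budget = failure_window_seconds(period_seconds=5, failure_threshold=30)
liveness_window = failure_window_seconds(period_seconds=10, failure_threshold=3)

print(f"startup budget: {startup_budget}s")    # startup budget: 150s
print(f"liveness window: {liveness_window}s")  # liveness window: 30s
```

To size a startup probe for a different app, pick the worst-case boot time you have observed, add headroom, and raise failureThreshold until the product covers it.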
Rules of thumb
- Readiness = traffic gate (no restart). Liveness = restart trigger. Startup = a one-time grace period that gates the other two.
- Never check external dependencies in liveness — that is what turns a slow dependency into a cluster-wide restart loop. Put dependency checks in readiness.
- If a restart would not fix the failure, it must not be a liveness failure.
- Slow boot? Add a startup probe with a generous failureThreshold instead of inflating liveness initialDelaySeconds.
- Keep probe handlers cheap and dependency-light; an expensive probe under load becomes its own failure source.