The 502s on Every Deploy: SIGTERM, preStop, and Graceful Shutdown

Every rollout sprayed a few hundred 502s for about ten seconds, then went quiet. The app was healthy, the new pods were fine, and nobody could agree on what was broken. The bug was in the gap between Kubernetes deleting a pod and the load balancer noticing.

June 23, 2026 · 9 min read · Kubernetes, Reliability

For months we had a deploy ritual that everyone quietly accepted: ship the new version, watch the error dashboard spike to a few hundred 502s for about ten seconds, watch it settle, then go to lunch. Nobody owned it because nobody could reproduce it on demand outside of a real rollout. The new pods were healthy. The old pods exited cleanly. And yet every single deploy leaked a burst of errors straight to users. The problem wasn't the app. It was the handful of seconds between Kubernetes deciding to kill a pod and the rest of the system agreeing that the pod was gone.

Pod termination is not one event, it's a race

When you delete a pod (which a rolling update does for you), the pod is marked Terminating and Kubernetes starts two independent processes in parallel, not in sequence: the kubelet sends your container SIGTERM, and the endpoints controller removes the pod from the Service's endpoint list. People assume the endpoint removal happens first. It does not. It is asynchronous, and it has to propagate to kube-proxy on every node, to your ingress controller, and to any external load balancer. That propagation takes time, often a second or two.

So here is the actual sequence that bit us: the pod gets SIGTERM, the app dutifully shuts down its HTTP server immediately, and the pod starts refusing connections. Meanwhile the load balancer still has that pod in its rotation for another second or two and keeps sending it fresh requests. Those requests hit a socket that is already closed, the proxy in front gets a connection error, and the user gets a 502. The app did everything right and that was exactly the problem: it shut down too fast, before traffic stopped being routed to it.
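To make the race concrete, here is a minimal sketch that treats the two parallel processes as two timers. The function name and the millisecond figures are illustrative assumptions, not measurements from our cluster.

```javascript
// Hypothetical model of the termination race.
//   propagationMs: time until every load balancer has dropped the pod
//   stopAfterMs:   time between the pod being marked Terminating and
//                  the server closing its listening socket
function errorWindowMs(propagationMs, stopAfterMs) {
  // Requests fail only while the pod is still routable but no longer listening.
  return Math.max(0, propagationMs - stopAfterMs);
}

// The app stops instantly while endpoint removal takes about 2 seconds:
console.log(errorWindowMs(2000, 0)); // 2000

// The server keeps listening longer than propagation takes:
console.log(errorWindowMs(2000, 10000)); // 0
```

The window closes whenever the server outlives the propagation delay, which is why the speed of the shutdown matters more than its correctness.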

The fix is to shut down slower, on purpose

The counterintuitive fix is to make the pod wait after it is marked Terminating, so that endpoint removal has time to propagate before the server actually stops accepting connections. The cleanest lever is a preStop hook with a sleep. Kubernetes runs preStop before sending SIGTERM, and it holds off the SIGTERM until preStop finishes.

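spec:
  terminationGracePeriodSeconds: 30   # pod-level field
  containers:
    - name: app                        # placeholder container name
      lifecycle:                       # container-level field
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]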

That ten-second sleep is dead time during which the pod is still fully able to serve requests, but is already being pulled out of every load balancer's rotation. By the time SIGTERM finally arrives, no new traffic is being routed to the pod, so a clean server shutdown leaks nothing. The grace period clock starts when the pod is marked Terminating, so it includes the preStop sleep. It has to be comfortably larger than the sleep plus however long in-flight requests need to drain, or the kubelet escalates to SIGKILL and you are back to dropping connections.
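The budget is simple arithmetic, and it is easy to get wrong when the three numbers live in different files. A minimal sketch of the check follows; the helper name and its field names are hypothetical. The 10 and 30 come from the manifest above, and the 15 is the drain timeout used in the shutdown handler later in this post.

```javascript
// Hypothetical helper: checks that the grace period covers the preStop
// sleep plus the drain timeout, and reports the headroom that is left.
function graceBudget({ graceSeconds, preStopSleepSeconds, drainTimeoutSeconds }) {
  const needed = preStopSleepSeconds + drainTimeoutSeconds;
  const headroomSeconds = graceSeconds - needed;
  return { needed, headroomSeconds, ok: headroomSeconds > 0 };
}

const budget = graceBudget({
  graceSeconds: 30,        // terminationGracePeriodSeconds
  preStopSleepSeconds: 10, // sleep in the preStop hook
  drainTimeoutSeconds: 15, // hard timeout in the SIGTERM handler
});
console.log(budget.needed, budget.headroomSeconds, budget.ok); // 25 5 true
```

If `ok` comes back false, either raise the grace period or shorten the drain timeout. Shortening the sleep brings the original race back.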

Then actually drain in-flight requests

The sleep handles new traffic. You still have to handle the requests that were already in flight when SIGTERM arrived. The pattern is: catch SIGTERM, stop accepting new connections, and let the existing ones finish before exiting.

const server = app.listen(8080);

process.on("SIGTERM", () => {
  // stop accepting new connections; finish the in-flight ones
  server.close(() => process.exit(0));
  // safety net: don't hang forever if a request is stuck
  setTimeout(() => process.exit(1), 15_000).unref();
});

The order matters. The preStop sleep keeps the server alive and routable while the LB drains it. SIGTERM then triggers a graceful server.close() that refuses new sockets but lets active handlers complete. The timeout is there because there is always one request stuck on a slow query, and you do not want it to hold the whole pod hostage past the grace period.
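The same pattern can be wrapped in a small helper so that a second SIGTERM does not start a second shutdown, and so that the logic can be exercised without killing a process. This is a sketch under assumptions: `makeShutdown` and its options are hypothetical names, and it expects only an object with a `close(callback)` method, which an `http.Server` provides.

```javascript
// Hypothetical helper: builds a drain-then-exit shutdown function for any
// object with a close(callback) method, such as an http.Server.
function makeShutdown(server, { timeoutMs = 15000, exit = process.exit } = {}) {
  let shuttingDown = false;

  return function shutdown() {
    if (shuttingDown) return false; // ignore repeated signals
    shuttingDown = true;

    // Safety net: give up if in-flight requests never finish.
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref();

    // Stop accepting new connections; exit once the existing ones are done.
    server.close(() => {
      clearTimeout(timer);
      exit(0);
    });
    return true;
  };
}

// Usage with the server from the example above:
//   process.on("SIGTERM", makeShutdown(server));
```

Passing `exit` as an option is a testing convenience. In production the default `process.exit` applies.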

Why readiness probes don't save you here

The instinct is "just fail the readiness probe on shutdown and Kubernetes will route around it." It helps, but it does not fully fix the race, because flipping readiness to false also triggers an asynchronous endpoint update with the same propagation delay. You are relying on the exact mechanism that is slow. The preStop sleep is more robust precisely because it does not need that update to win a race: it just holds the pod open long enough that whatever propagation needs to happen has happened. Use both if you like, but the sleep is the part that does the work.
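For the "use both" option, a minimal sketch of the readiness side follows, assuming the app exposes a readiness endpoint of its own. `createReadiness` and its method names are hypothetical. Because preStop finishes before SIGTERM is sent, a flag flipped in the SIGTERM handler only changes the probe result after the sleep, which is one more reason it is the smaller part of the fix.

```javascript
// Hypothetical readiness state: reports ready until shutdown begins.
function createReadiness() {
  let shuttingDown = false;
  return {
    // Call this from the SIGTERM handler before closing the server.
    beginShutdown() {
      shuttingDown = true;
    },
    // HTTP status a readiness endpoint would answer with.
    status() {
      return shuttingDown ? 503 : 200;
    },
  };
}

const readiness = createReadiness();
console.log(readiness.status()); // 200
readiness.beginShutdown();
console.log(readiness.status()); // 503
```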

Rules of thumb

  • If every deploy leaks a short burst of 5xx and then recovers, suspect termination races, not the new version. Healthy pods plus deploy-only errors is the signature.
  • SIGTERM and endpoint removal happen in parallel, not in order. The pod can get SIGTERM while the load balancer is still sending it traffic.
  • Add a preStop sleep (5 to 15s) so endpoint removal propagates before your server stops. Shutting down instantly is the bug, not the fix.
  • Set terminationGracePeriodSeconds larger than the sleep plus your worst-case request drain, or the kubelet SIGKILLs you mid-request.
  • On SIGTERM, stop accepting new connections and drain in-flight ones, with a hard timeout so a stuck request can't block the whole shutdown.
  • Don't rely on the readiness probe alone to stop traffic on shutdown; it uses the same slow endpoint propagation you're trying to outrun.

Reader Discussion

6 replies

  1. Highlighted by author
    Anders Lindqvist · Staff SRE · Story

    the preStop sleep trick. THE preStop sleep trick. we spent 3 days debugging mystery 5xx during deploys and the answer was a 10-second sleep. there should be a billboard.

    Jun 24, 2026·1 day later
  2. Tiến Hồ 🇻🇳 Hà Nội · DevOps Engineer · Agrees

    PDB = the thing 90% of our teams skip until a GKE node upgrade takes 3 pods down at once. minAvailable: 2 plus replicas: 3 is the default I deploy with now, no thinking required.

    Jun 26, 2026·3 days later
  3. Priscilla Owens · Backend Lead · Pushback

    small pushback — "never set CPU limits" is too strong imo. on cgroups v2 with steady traffic profiles, soft caps prevent one noisy neighbour from starving the whole node. it's a per-workload call. great post otherwise.

    Jun 30, 2026·1 week later·edited
  4. Jiwoo Park · Junior Engineer · Kind words

    the "control loop, not deploy tool" framing finally made k8s click for me. been fighting with it for 4 months. wish onboarding docs led with this paragraph instead of YAML.

    Jun 25, 2026·2 days later
  5. Vasili Kurov · Platform Engineer · From experience

    HPA on CPU only is the silent killer. moved ours to requests-per-pod via prometheus-adapter and our scaling went from "vaguely correct" to actually correlated with load. 2 hours of work, immediate ROI.

    Jun 27, 2026·4 days later
  6. Isabella Costa · Junior Engineer · Kind words

    saved this. sharing at standup tomorrow — we've had exactly this problem for 2 sprints and nobody on the team had framed it this way 🙏

    Jun 25, 2026·2 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.
