The 502s on Every Deploy: SIGTERM, preStop, and Graceful Shutdown
Every rollout sprayed a few hundred 502s for about ten seconds, then went quiet. The app was healthy, the new pods were fine, and nobody could agree on what was broken. The bug was in the gap between Kubernetes deleting a pod and the load balancer noticing.
For months we had a deploy ritual that everyone quietly accepted: ship the new version, watch the error dashboard spike to a few hundred 502s for about ten seconds, watch it settle, then go to lunch. Nobody owned it because nobody could reproduce it on demand outside of a real rollout. The new pods were healthy. The old pods exited cleanly. And yet every single deploy leaked a burst of errors straight to users. The problem wasn't the app. It was the handful of seconds between Kubernetes deciding to kill a pod and the rest of the system agreeing that the pod was gone.
Pod termination is not one event, it's a race
When you delete a pod (which a rolling update does for you), Kubernetes marks it Terminating and kicks off two independent processes in parallel, not in sequence: the kubelet sends your container SIGTERM, and the endpoints controller removes the pod from the Service's endpoint list. People assume the endpoint removal happens first. It does not. It is asynchronous, and it has to propagate to kube-proxy on every node, to your ingress controller, and to any external load balancer. That propagation takes time, often a second or two.
So here is the actual sequence that bit us: the pod gets SIGTERM, the app dutifully shuts down its HTTP server immediately, and new connections start getting refused. Meanwhile the load balancer still has that pod in its rotation for another second or two and keeps sending it fresh requests. Those requests hit a socket that is already closed, and the load balancer reports each failed connection as a 502. The app did everything right and that was exactly the problem: it shut down too fast, before traffic stopped being routed to it.
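A toy model makes the race concrete. This is a sketch under stated assumptions, not a measurement: requests arrive every 10 ms, the server stops listening at `serverStopMs`, and the load balancer keeps routing to the pod until `lbRemovedMs` (2 s here, matching the "second or two" of propagation). Both helper names are hypothetical.

```javascript
// Toy model of the termination race. All times are milliseconds after the
// pod is deleted. Assumption: the server stops accepting connections at
// serverStopMs, and the load balancer routes to the pod until lbRemovedMs.
function countRefusedRequests({ requestTimesMs, serverStopMs, lbRemovedMs }) {
  // A request fails (surfacing as a 502) if the LB still sends it to the pod
  // after the pod has already stopped listening.
  return requestTimesMs.filter((t) => t >= serverStopMs && t < lbRemovedMs).length;
}

// Hypothetical traffic pattern: one request every intervalMs.
function evenlySpacedRequests(durationMs, intervalMs) {
  const times = [];
  for (let t = 0; t < durationMs; t += intervalMs) times.push(t);
  return times;
}

const requests = evenlySpacedRequests(3000, 10);

// Instant shutdown: the server stops at once, the LB catches up after 2 s.
console.log(countRefusedRequests({ requestTimesMs: requests, serverStopMs: 0, lbRemovedMs: 2000 })); // → 200

// Delaying the stop by 10 s: the LB has moved on long before the socket closes.
console.log(countRefusedRequests({ requestTimesMs: requests, serverStopMs: 10000, lbRemovedMs: 2000 })); // → 0
```

The model ignores requests already in flight at shutdown; those are a separate problem with a separate fix.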
The fix is to shut down slower, on purpose
The counterintuitive fix is to make the pod wait after it receives SIGTERM, so that endpoint removal has time to propagate before the server actually stops accepting connections. The cleanest lever is a preStop hook with a sleep. Kubernetes runs preStop before sending SIGTERM, and it holds off the SIGTERM until preStop finishes.
spec:
  terminationGracePeriodSeconds: 30   # pod-level field, not per container
  containers:
    - name: app                       # placeholder container name
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]
That ten-second sleep is dead time during which the pod is still fully able to serve requests, but is already being pulled out of every load balancer's rotation. By the time SIGTERM finally arrives, no new traffic is being routed to the pod, so a clean server shutdown leaks nothing. The grace period has to be comfortably larger than the sleep plus however long in-flight requests need to drain, or the kubelet escalates to SIGKILL and you are back to dropping connections.
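The budgeting rule in the paragraph above is simple arithmetic, and it is easy to get wrong when one of the three numbers changes. A minimal sketch, using a hypothetical helper name, that checks it. It relies on one documented Kubernetes behavior: the grace-period countdown starts before the preStop hook runs, so the sleep and the drain both count against the same window.

```javascript
// Hypothetical helper: does terminationGracePeriodSeconds cover the preStop
// sleep plus the worst-case in-flight drain? The preStop hook runs inside the
// grace period, so both durations are spent from the same budget.
function shutdownBudget({ preStopSleepS, maxDrainS, terminationGracePeriodS }) {
  const neededS = preStopSleepS + maxDrainS;
  return {
    neededS,
    // Negative headroom means the kubelet may SIGKILL mid-request.
    headroomS: terminationGracePeriodS - neededS,
    ok: terminationGracePeriodS > neededS,
  };
}

// The numbers used in this post: 10 s sleep, 15 s in-app drain timeout, 30 s grace.
console.log(shutdownBudget({ preStopSleepS: 10, maxDrainS: 15, terminationGracePeriodS: 30 }));
// → { neededS: 25, headroomS: 5, ok: true }

// Raising the drain timeout to 25 s without touching the grace period breaks it.
console.log(shutdownBudget({ preStopSleepS: 10, maxDrainS: 25, terminationGracePeriodS: 30 }));
// → { neededS: 35, headroomS: -5, ok: false }
```

Here `maxDrainS` is whatever hard timeout the app enforces on its own shutdown, which is why that timeout belongs in the app and not only in the pod spec.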
Then actually drain in-flight requests
The sleep handles new traffic. You still have to handle the requests that were already in flight when SIGTERM arrived. The pattern is: catch SIGTERM, stop accepting new connections, and let the existing ones finish before exiting.
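const express = require("express"); // assumes an Express app
const app = express();
const server = app.listen(8080);

process.on("SIGTERM", () => {
  // stop accepting new connections; let in-flight requests finish
  server.close(() => process.exit(0));
  // idle keep-alive sockets would otherwise keep close() waiting
  // (Node 18.2+; Node 19+ already does this inside close())
  server.closeIdleConnections?.();
  // safety net: don't hang forever if a request is stuck
  setTimeout(() => process.exit(1), 15_000).unref();
});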
The order matters. The preStop sleep keeps the server alive and routable while the LB drains it. SIGTERM then triggers a graceful server.close() that refuses new sockets but lets active handlers complete. The timeout is there because there is always one request stuck on a slow query, and you do not want it to hold the whole pod hostage past the grace period.
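`server.close()` does the draining at the socket level. If you want the same behavior at the request level (to expose an in-flight count, say), the pattern reduces to a small state machine. The sketch below is a hypothetical helper, not a Node or Express API: once draining starts it refuses new work, and it fires a callback when the last accepted request finishes.

```javascript
// Hypothetical helper (not part of Node or Express) modelling the drain
// pattern: refuse new work once draining starts, and call back when the
// last in-flight request completes.
class DrainTracker {
  constructor() {
    this.inFlight = 0;
    this.draining = false;
    this.waiters = [];
  }

  // Call when a request arrives. Returns false once draining has begun,
  // meaning the caller should reject the request instead of handling it.
  tryStart() {
    if (this.draining) return false;
    this.inFlight += 1;
    return true;
  }

  // Call exactly once when an accepted request has completed.
  finish() {
    if (this.inFlight === 0) throw new Error("finish() without matching tryStart()");
    this.inFlight -= 1;
    if (this.draining && this.inFlight === 0) this._flush();
  }

  // Begin draining; onDrained runs once nothing is in flight
  // (immediately, if nothing is in flight right now).
  drain(onDrained) {
    this.draining = true;
    this.waiters.push(onDrained);
    if (this.inFlight === 0) this._flush();
  }

  // Invoke and clear every pending drain callback.
  _flush() {
    const waiters = this.waiters.splice(0);
    for (const cb of waiters) cb();
  }
}

// Usage sketch: two requests are in flight when SIGTERM arrives.
const tracker = new DrainTracker();
tracker.tryStart();
tracker.tryStart();
tracker.drain(() => console.log("drained, safe to exit"));
console.log(tracker.tryStart()); // false: new work is refused
tracker.finish();
tracker.finish(); // prints "drained, safe to exit"
```

The hard timeout from the SIGTERM handler still applies on top of this: `drain()` waits as long as the slowest request does, and the slow query does not care that you are shutting down.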
Why readiness probes don't save you here
The instinct is "just fail the readiness probe on shutdown and Kubernetes will route around it." It helps, but it does not fully fix the race, because flipping readiness to false also triggers an asynchronous endpoint update with the same propagation delay. You are relying on the exact mechanism that is slow. The preStop sleep is more robust precisely because it does not depend on anything propagating in time: it just holds the pod open long enough that whatever propagation needs to happen, has. Use both if you like, but the sleep is the part that does the work.
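To put rough numbers on the readiness path, consider that it is slower than propagation alone. The kubelet marks a container NotReady only after `failureThreshold` consecutive failed probes spaced `periodSeconds` apart (Kubernetes defaults: 10 s and 3), and only then does the endpoint update begin. The sketch below is a back-of-the-envelope worst case. The helper name is hypothetical and the 2 s propagation figure is an assumption.

```javascript
// Rough worst-case estimate for "fail readiness on shutdown" to stop traffic.
// Assumptions: the kubelet needs failureThreshold consecutive failed probes,
// spaced periodSeconds apart, before marking the pod NotReady; endpoint
// propagation (propagationS) only starts after that.
function readinessDrainEstimateS({ periodSeconds, failureThreshold, propagationS }) {
  const detectionS = periodSeconds * failureThreshold;
  return detectionS + propagationS;
}

// Default probe settings plus an assumed 2 s of propagation.
console.log(readinessDrainEstimateS({ periodSeconds: 10, failureThreshold: 3, propagationS: 2 })); // → 32

// Aggressive probe settings shrink detection, but propagation remains.
console.log(readinessDrainEstimateS({ periodSeconds: 1, failureThreshold: 1, propagationS: 2 })); // → 3
```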
Rules of thumb
- If every deploy leaks a short burst of 5xx and then recovers, suspect termination races, not the new version. Healthy pods plus deploy-only errors is the signature.
- SIGTERM and endpoint removal happen in parallel, not in order. The pod can get SIGTERM while the load balancer is still sending it traffic.
- Add a preStop sleep (5 to 15s) so endpoint removal propagates before your server stops. Shutting down instantly is the bug, not the fix.
- Set terminationGracePeriodSeconds larger than the sleep plus your worst-case request drain, or the kubelet SIGKILLs you mid-request.
- On SIGTERM, stop accepting new connections and drain in-flight ones, with a hard timeout so a stuck request can't block the whole shutdown.
- Don't rely on the readiness probe alone to stop traffic on shutdown; it uses the same slow endpoint propagation you're trying to outrun.