TIME_WAIT and the TCP Handshake: Why Your Server Runs Out of Ports
Connection exhaustion under load almost never means you ran out of memory. It usually means your ephemeral ports are tied up in TIME_WAIT sockets. Here is the TCP lifecycle that explains it and the fixes that actually work.
A service that's healthy at moderate traffic starts throwing "cannot assign requested address" or "connection refused" under load, with plenty of CPU and memory to spare. The cause is almost always the same, and it isn't a resource you'd think to watch: you ran out of ephemeral ports because thousands of sockets are sitting in TIME_WAIT. Understanding why means understanding how a TCP connection is born and how it dies.
A connection is identified by four numbers
TCP doesn't identify a connection by a port — it's identified by the 4-tuple: source IP, source port, destination IP, destination port. Two connections can share three of the four; what must be unique is the whole tuple. For an outbound client connection, the destination IP and port are fixed (the server you're calling), and your source IP is fixed, so the only thing that varies is your source port — drawn from the ephemeral port range, typically about 28,000 ports (32768–60999 on Linux). That number is the real ceiling on concurrent-plus-recently-closed connections to a single destination.
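To make the 4-tuple concrete, here is a minimal Python sketch. It assumes a loopback listener can be opened on the machine running it; the function names are illustrative, not from any library. It counts the ports in a "low high" range string (the format Linux uses in /proc/sys/net/ipv4/ip_local_port_range) and opens two connections to the same destination to show that only the source port differs.

```python
import socket


def ephemeral_port_count(range_text: str) -> int:
    """Count the ports in a 'low high' range string (both ends inclusive)."""
    low, high = (int(part) for part in range_text.split())
    return high - low + 1


def two_connections_to_same_destination():
    """Open two loopback connections to one listener and return their 4-tuples
    as (src_ip, src_port, dst_ip, dst_port)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(8)
    destination = listener.getsockname()
    clients = [socket.create_connection(destination) for _ in range(2)]
    try:
        return [c.getsockname() + c.getpeername() for c in clients]
    finally:
        for c in clients:
            c.close()
        listener.close()


if __name__ == "__main__":
    print(ephemeral_port_count("32768 60999"))  # 28232
    for four_tuple in two_connections_to_same_destination():
        print(four_tuple)  # same IPs and destination port, different source port
```

Both connections are open at the same time, so the kernel has to give each one a distinct source port; that is the only free variable in the tuple.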
The handshake: three packets to start
Opening a connection is the famous three-way handshake:
    client              server
      | --- SYN -------> |    "let's talk, my seq = x"
      | <-- SYN-ACK ---- |    "ok, my seq = y, ack x+1"
      | --- ACK -------> |    "ack y+1" -> ESTABLISHED
One round trip before any data flows — which is exactly why connection reuse and keep-alive matter so much for latency. But the interesting part for port exhaustion is how connections close.
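One way to see that the handshake belongs to the kernel rather than the application: a blocking connect() returns as soon as the three packets have been exchanged, even if the server process never calls accept(). The sketch below assumes a loopback listener with a small backlog; the function name is made up for illustration.

```python
import socket


def connect_without_accept() -> bool:
    """Return True if connect() succeeds against a listener that never accepts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)  # the kernel queues completed handshakes in the backlog
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    try:
        # Blocks for SYN, SYN-ACK, ACK; note that accept() is never called.
        client.connect(listener.getsockname())
        return client.getpeername() == listener.getsockname()
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print(connect_without_accept())
```

On a real network, that blocking connect() is where the extra round trip is paid, once per new connection.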
The four-way close and where TIME_WAIT comes from
Tearing down is four packets, because each side closes its half independently:
    client              server
      | --- FIN -------> |
      | <-- ACK -------- |
      | <-- FIN -------- |
      | --- ACK -------> |    the FIN-sending side now
      |                  |    enters TIME_WAIT
The side that closes first (sends the first FIN, usually the client) ends up in TIME_WAIT after sending that final ACK — and stays there for 2×MSL (maximum segment lifetime), which on Linux is a fixed 60 seconds. The socket is closed to your application but its 4-tuple is reserved, and crucially its source port cannot be reused for a new connection to the same destination during that window.
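The close sequence can be written down as a small state table. This is a simplified sketch of the standard TCP close states (it leaves out simultaneous close and the CLOSING state), and the event labels are informal names chosen for this example. It shows that only the side that sends the first FIN passes through TIME_WAIT.

```python
# (current state, event) -> next state, for the two sides of a normal close.
CLOSE_TRANSITIONS = {
    # Active closer: the side whose application calls close() first.
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2*MSL timer expires"): "CLOSED",
    # Passive closer: the side that receives the first FIN.
    ("ESTABLISHED", "recv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("LAST_ACK", "recv ACK"): "CLOSED",
}


def walk(events, state="ESTABLISHED"):
    """Apply events in order and return the list of states visited."""
    path = [state]
    for event in events:
        state = CLOSE_TRANSITIONS[(state, event)]
        path.append(state)
    return path


ACTIVE_CLOSE = ["send FIN", "recv ACK", "recv FIN, send ACK", "2*MSL timer expires"]
PASSIVE_CLOSE = ["recv FIN, send ACK", "send FIN", "recv ACK"]

if __name__ == "__main__":
    print(" -> ".join(walk(ACTIVE_CLOSE)))
    print(" -> ".join(walk(PASSIVE_CLOSE)))
```

The passive side goes straight from LAST_ACK to CLOSED, which is why the cost of TIME_WAIT depends entirely on who closes first.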
Why TIME_WAIT exists — don't just kill it
It looks like pure waste, so people reach for "disable TIME_WAIT". It's there for two real reasons. First, a delayed duplicate packet from this connection could arrive late and be mis-delivered into a new connection reusing the same tuple — TIME_WAIT lets stragglers die out. Second, if the peer never got the final ACK and retransmits its FIN, the lingering socket is there to ACK it again instead of replying with a RST. Disable it carelessly and you trade a capacity problem for rare, maddening data-corruption-shaped bugs.
Doing the math on exhaustion
Now the failure is obvious. Say one app server opens connections to a single backend (a database, an upstream API) and closes each after use. With ~28,000 ephemeral ports and a 60-second TIME_WAIT, your ceiling is roughly 28,000 / 60 ≈ 470 new connections per second to that one destination. Push past that and every port is tied up in TIME_WAIT, new connect() calls fail with "cannot assign requested address", and the service falls over while looking idle.
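The same arithmetic as a short Python sketch, assuming every connection is closed by the client and then holds its port for the full TIME_WAIT period. The function names and the 1,000 connections-per-second figure are illustrative.

```python
def max_new_connections_per_second(ephemeral_ports: int, time_wait_seconds: float = 60) -> float:
    """Sustainable rate of new connections to one destination before ports run out."""
    return ephemeral_ports / time_wait_seconds


def ports_held_in_time_wait(connections_per_second: float, time_wait_seconds: float = 60) -> float:
    """Steady-state number of ports parked in TIME_WAIT at a given connection rate."""
    return connections_per_second * time_wait_seconds


if __name__ == "__main__":
    ports = 60999 - 32768 + 1  # default Linux ephemeral range, 28232 ports
    print(max_new_connections_per_second(ports))  # about 470 per second
    print(ports_held_in_time_wait(1000))          # 60000, more than twice the range
```

Read the second function as the demand side: at 1,000 short-lived connections per second you would need about 60,000 ports, and the default range offers fewer than half of that.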
The fixes, best first
- Reuse connections. The root fix isn't tuning TIME_WAIT; it's not creating a connection per request. Connection pools (HTTP keep-alive, a DB pool) keep a handful of long-lived connections busy, so you open thousands fewer sockets and TIME_WAIT never piles up. This alone solves the overwhelming majority of cases.
- tcp_tw_reuse. Lets the kernel safely reuse a port still in TIME_WAIT for a new outbound connection, using TCP timestamps to distinguish old packets. Safe for the client/outbound side and the right knob when you genuinely make many short connections.
- Widen the ephemeral range (ip_local_port_range) and spread across more destination IPs: more 4-tuples means more headroom. A band-aid, not a cure.
- Do not blindly enable the old tcp_tw_recycle. It was removed from Linux (in kernel 4.12) for breaking clients behind NAT, and "just lower MSL" reintroduces the duplicate-packet risk TIME_WAIT exists to prevent.
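To illustrate the first fix, here is a small self-contained Python sketch that compares one connection per request with a single reused keep-alive connection. It spins up a throwaway HTTP server on loopback and counts how many distinct client sockets the server sees. The helper name count_client_sockets is made up for this example, and a production service would normally use a pooled client rather than a hand-rolled loop.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def count_client_sockets(requests: int, reuse: bool) -> int:
    """Send `requests` GETs to a local test server and return how many
    distinct client (ip, port) pairs the server observed."""
    seen = set()

    class Handler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # lets the server keep connections alive

        def do_GET(self):
            seen.add(self.client_address)
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep the demo quiet
            pass

    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    shared = http.client.HTTPConnection(host, port)
    try:
        for _ in range(requests):
            conn = shared if reuse else http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
            if not reuse:
                conn.close()  # client closes first, so its port enters TIME_WAIT
    finally:
        shared.close()
        server.shutdown()
        server.server_close()
    return len(seen)


if __name__ == "__main__":
    print("per-request:", count_client_sockets(5, reuse=False))
    print("keep-alive: ", count_client_sockets(5, reuse=True))
```

With reuse the server sees a single client socket no matter how many requests are sent; without it, each request uses up a fresh source port.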
Rules of thumb
- Connections are keyed by the 4-tuple; for outbound traffic to one destination, the only variable is your source port, and that pool is finite (~28k).
- TIME_WAIT lands on whichever side closes first and lasts 2×MSL (60s on Linux), holding that source port out of use the whole time.
- "Cannot assign requested address" under load = ephemeral-port exhaustion from TIME_WAIT, not memory. Check ss -s for the TIME_WAIT count.
- The fix is connection reuse (pools, keep-alive), not killing TIME_WAIT. Per-request connections are the actual bug.
- If you must tune the kernel, prefer tcp_tw_reuse for outbound; never resurrect tcp_tw_recycle.
- Move the FIN-first burden onto the side that can afford TIME_WAIT. Often you want the server to keep connections alive so clients, not your shared backend, absorb the cost.
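If you want the TIME_WAIT count broken down by destination rather than the single total from ss -s, a rough sketch is to parse the text of /proc/net/tcp, where state code 06 means TIME_WAIT. This assumes Linux, IPv4 only, and a little-endian host for the address decoding; the function name and the sample rows are illustrative.

```python
import socket
import struct
from collections import Counter

TIME_WAIT = "06"  # state code for TIME_WAIT in /proc/net/tcp


def time_wait_by_destination(proc_net_tcp_text: str) -> Counter:
    """Count TIME_WAIT sockets per remote (ip, port) from /proc/net/tcp text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4 or fields[3] != TIME_WAIT:
            continue
        ip_hex, port_hex = fields[2].split(":")  # fields[2] is the remote address
        # Assumption: little-endian host, so the hex address is byte-swapped.
        ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
        counts[(ip, int(port_hex, 16))] += 1
    return counts


# Hand-written sample rows: two TIME_WAIT sockets to 127.0.0.1:3306, one listener.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:8B2E 0100007F:0CEA 06 00000000:00000000
   1: 0100007F:8B30 0100007F:0CEA 06 00000000:00000000
   2: 00000000:0CEA 00000000:0000 0A 00000000:00000000
"""

if __name__ == "__main__":
    print(time_wait_by_destination(SAMPLE))
```

On a live Linux host you would pass in the contents of /proc/net/tcp instead of SAMPLE. A destination whose count approaches the size of the ephemeral range is the one that needs a connection pool.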