TIME_WAIT and the TCP Handshake: Why Your Server Runs Out of Ports

Connection exhaustion under load almost never means you ran out of memory. It means you lost your ephemeral ports to TIME_WAIT sockets. Here is the TCP lifecycle that explains it and the fixes that actually work.

June 16, 2026 · 9 min read · System Design

A service that's healthy at moderate traffic starts throwing "cannot assign requested address" (EADDRNOTAVAIL) on outbound connections under load, with plenty of CPU and memory to spare. The cause is almost always the same, and it isn't a resource you'd think to watch: you ran out of ephemeral ports because thousands of sockets are sitting in TIME_WAIT. Understanding why means understanding how a TCP connection is born and how it dies.

A connection is identified by four numbers

TCP doesn't identify a connection by a port; it identifies it by the 4-tuple: source IP, source port, destination IP, destination port. Two connections can share three of the four; what must be unique is the whole tuple. For an outbound client connection, the destination IP and port are fixed (the server you're calling), and your source IP is fixed, so the only thing that varies is your source port, drawn from the ephemeral port range: 32768–60999 by default on Linux, about 28,000 ports. That number is the real ceiling on concurrent-plus-recently-closed connections from one source IP to a single destination IP and port.
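To check the size of that pool on a given machine, here is a minimal Python sketch. It assumes a Linux host that exposes the range in /proc/sys/net/ipv4/ip_local_port_range and falls back to the default quoted above when the file is missing. The function names are illustrative, not part of any library.

```python
from pathlib import Path

# Linux exposes the ephemeral range here (assumption: a Linux host).
RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")
# Fallback: the Linux default quoted in the text.
DEFAULT_RANGE = "32768\t60999"


def parse_port_range(text: str) -> tuple[int, int]:
    """Parse the kernel's 'low high' pair into two integers."""
    low, high = (int(part) for part in text.split())
    return low, high


def ephemeral_port_count(text: str) -> int:
    """Number of usable source ports; both ends of the range are inclusive."""
    low, high = parse_port_range(text)
    return high - low + 1


def read_local_range() -> str:
    """Read the live setting, or fall back to the default if unavailable."""
    try:
        return RANGE_FILE.read_text()
    except OSError:
        return DEFAULT_RANGE


if __name__ == "__main__":
    print(ephemeral_port_count(read_local_range()))
```

With the default range this comes to 28,232 ports, which is the "about 28,000" used throughout.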

The handshake: three packets to start

Opening a connection is the famous three-way handshake:

client            server
  | --- SYN -------> |   "let's talk, my seq = x"
  | <-- SYN-ACK ---- |   "ok, my seq = y, ack x+1"
  | --- ACK -------> |   "ack y+1"  -> ESTABLISHED

One round trip before any data flows — which is exactly why connection reuse and keep-alive matter so much for latency. But the interesting part for port exhaustion is how connections close.

The four-way close and where TIME_WAIT comes from

Tearing down is four packets, because each side closes its half independently:

client                server
  | --- FIN -------> |
  | <-- ACK -------- |
  | <-- FIN -------- |
  | --- ACK -------> |   the side that sent the first
  |                  |   FIN now enters TIME_WAIT

The side that closes first (the one that sends the first FIN, the client in the diagram above) ends up in TIME_WAIT after sending that final ACK and stays there for 2×MSL (maximum segment lifetime); Linux hardcodes the TIME_WAIT duration at 60 seconds. The socket is closed to your application but its 4-tuple is reserved, and crucially, by default its source port cannot be reused for a new connection to the same destination during that window.
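To see which destinations those lingering sockets point at, one option is to parse the kernel's socket table. The sketch below is a rough illustration, assuming the Linux /proc/net/tcp layout (IPv4 only, state code 06 for TIME_WAIT) and a little-endian host for the address decoding. SAMPLE is hand-written data for the example, not real output.

```python
import socket
from collections import Counter

TIME_WAIT = "06"  # hex state code for TIME_WAIT in /proc/net/tcp

# Hand-written rows in the /proc/net/tcp column layout (illustrative only).
SAMPLE = """\
  sl  local_address rem_address   st
   0: 0100007F:C350 0100007F:1F90 06
   1: 0100007F:C351 0100007F:1F90 06
   2: 0100007F:1F90 00000000:0000 0A
"""


def decode_endpoint(field: str) -> tuple[str, int]:
    """Turn 'HEXIP:HEXPORT' into (dotted_ip, port).

    Assumes a little-endian host, where the address bytes appear reversed.
    """
    hex_ip, hex_port = field.split(":")
    ip = socket.inet_ntoa(bytes.fromhex(hex_ip)[::-1])
    return ip, int(hex_port, 16)


def time_wait_by_destination(table: str) -> Counter:
    """Count TIME_WAIT sockets per remote (ip, port)."""
    counts: Counter = Counter()
    for line in table.splitlines():
        fields = line.split()
        # The header row and short lines never match the state code.
        if len(fields) < 4 or fields[3] != TIME_WAIT:
            continue
        counts[decode_endpoint(fields[2])] += 1
    return counts


if __name__ == "__main__":
    print(time_wait_by_destination(SAMPLE).most_common(5))
```

On a live host you would pass in the contents of /proc/net/tcp instead of SAMPLE. A single destination dominating the counts is the signature of the problem described here.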

Why TIME_WAIT exists — don't just kill it

It looks like pure waste, so people reach for "disable TIME_WAIT". It's there for two real reasons. First, a delayed duplicate packet from this connection could arrive late and be mis-delivered into a new connection reusing the same tuple — TIME_WAIT lets stragglers die out. Second, if the peer never got the final ACK and retransmits its FIN, the lingering socket is there to ACK it again instead of replying with a RST. Disable it carelessly and you trade a capacity problem for rare, maddening data-corruption-shaped bugs.

Doing the math on exhaustion

Now the failure is obvious. Say one app server opens connections to a single backend (a database, an upstream API) and closes each after use. With ~28,000 ephemeral ports and a 60-second TIME_WAIT, your ceiling is roughly 28,000 / 60 ≈ 470 new connections per second to that one destination. Push past that and every port is tied up in TIME_WAIT, new connect() calls fail with "cannot assign requested address", and the service falls over while looking idle.
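The same arithmetic as a small Python sketch, with one extra question answered: how long a burst above the ceiling can last before the pool is empty. The function names are illustrative. It assumes every connection is closed locally right after use and ignores ports held by connections that are still open.

```python
from typing import Optional


def steady_state_ceiling(port_count: int, time_wait_s: float) -> float:
    """Highest sustained rate of new connections per second to one destination."""
    return port_count / time_wait_s


def seconds_until_exhaustion(
    port_count: int, rate_per_s: float, time_wait_s: float
) -> Optional[float]:
    """Seconds of sustained load before every port sits in TIME_WAIT.

    Returns None when the rate is low enough that ports come back
    before the pool runs dry.
    """
    if rate_per_s * time_wait_s <= port_count:
        return None
    # Above the ceiling, no port returns before the pool is drained.
    return port_count / rate_per_s


PORTS = 60999 - 32768 + 1  # default Linux ephemeral range, inclusive

if __name__ == "__main__":
    print(steady_state_ceiling(PORTS, 60))
    print(seconds_until_exhaustion(PORTS, 1000, 60))
```

At 1,000 new connections per second the pool lasts under half a minute, which is why the failure tends to show up partway into a load spike rather than at the start.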

The fixes, best first

  • Reuse connections. The root fix isn't tuning TIME_WAIT — it's not creating a connection per request. Connection pools (HTTP keep-alive, a DB pool) keep a handful of long-lived connections busy, so you open thousands fewer sockets and TIME_WAIT never piles up. This alone solves the overwhelming majority of cases.
  • tcp_tw_reuse (net.ipv4.tcp_tw_reuse). Lets the kernel reuse a port still in TIME_WAIT for a new outbound connection, using TCP timestamps to distinguish old packets, so timestamps must be enabled on both ends. It only affects outbound connections, and it is the right knob when you genuinely make many short connections.
  • Widen the ephemeral range (ip_local_port_range) and spread across more destination IPs — more 4-tuples means more headroom. A band-aid, not a cure.
  • Do not reach for the old tcp_tw_recycle: it broke clients behind NAT and was removed in Linux 4.12. And "just lower MSL" reintroduces the duplicate-packet risk TIME_WAIT exists to prevent.
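To make the first fix concrete, here is a self-contained Python sketch that compares connection-per-request against a single kept-alive connection, using a throwaway loopback HTTP server that records each caller's source port. PortRecordingHandler and count_source_ports are hypothetical names for this demo, and the server stands in for whatever backend you actually call.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PortRecordingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    seen_ports: list = []

    def do_GET(self):
        # client_address is (ip, source_port) of the caller.
        self.seen_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_source_ports(reuse: bool, requests: int = 20) -> int:
    """Return how many distinct client source ports the server saw."""
    PortRecordingHandler.seen_ports = []
    server = ThreadingHTTPServer(("127.0.0.1", 0), PortRecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        if reuse:
            # One long-lived connection carries every request.
            conn = http.client.HTTPConnection(host, port)
            for _ in range(requests):
                conn.request("GET", "/")
                conn.getresponse().read()
            conn.close()
        else:
            # A fresh connection per request: each close leaves a TIME_WAIT.
            for _ in range(requests):
                conn = http.client.HTTPConnection(host, port)
                conn.request("GET", "/")
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(set(PortRecordingHandler.seen_ports))


if __name__ == "__main__":
    print("per-request:", count_source_ports(reuse=False))
    print("kept alive: ", count_source_ports(reuse=True))
```

The kept-alive run uses a single source port for all requests, while the per-request run typically burns one port per request, each of which then sits in TIME_WAIT on the client side. A real connection pool generalizes the kept-alive case to a handful of connections.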

Rules of thumb

  • Connections are keyed by the 4-tuple; for outbound traffic to one destination, the only variable is your source port, and that pool is finite (~28k).
  • TIME_WAIT lands on whichever side closes first and lasts 2×MSL (60s on Linux), holding that source port out of use the whole time.
  • "Cannot assign requested address" under load = ephemeral-port exhaustion from TIME_WAIT, not memory. Check ss -s for the TIME_WAIT count.
  • The fix is connection reuse (pools, keep-alive), not killing TIME_WAIT. Per-request connections are the actual bug.
  • If you must tune the kernel, prefer tcp_tw_reuse for outbound; never resurrect tcp_tw_recycle.
  • Move the FIN-first burden onto the side that can afford TIME_WAIT — often you want the server to keep connections alive so clients, not your shared backend, absorb the cost.

Reader Discussion

  1. Isabella Costa · Junior Engineer · Kind words

    saved this. sharing at standup tomorrow — we've had exactly this problem for 2 sprints and nobody on the team had framed it this way 🙏

    Jun 18, 2026·2 days later
  2. Kenta Yamada · Tech Lead · Asks

    would love a war-story follow-up. principles are clear; the actual debugging session is where the interesting stuff lives. there's a real shortage of "here's the dashboard, here's the thread we pulled, here's where we got stuck for 90 mins" content.

    Jun 20, 2026·4 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.
