Head-of-Line Blocking: Why HTTP/2 Didn't Fix What You Think, and HTTP/3 Did

HTTP/2 multiplexes streams over one connection, so people assume one slow response can't block the others. It can, just one layer down, in TCP. Here is the head-of-line blocking that survived HTTP/2 and why HTTP/3 had to abandon TCP entirely to kill it.

June 17, 2026 · 9 min read · System Design

The pitch for HTTP/2 was that it killed head-of-line blocking. One connection, many multiplexed streams, no more lining requests up single file. And it's true at the HTTP layer. But I once chased a tail-latency problem on an HTTP/2 service where a single dropped packet would stall every in-flight request at once, and no amount of HTTP/2 tuning touched it. The blocking hadn't gone away. It had moved down a layer, into TCP, where HTTP/2 couldn't see it. Understanding why is the whole reason HTTP/3 exists.

HTTP/1.1: blocking at the request layer

In HTTP/1.1 a connection handles one request at a time: request two can't start until request one's response comes back. Browsers worked around it by opening up to six connections per host, and pipelining was a mess nobody enabled. The blocking was obvious and lived right at the application layer: a slow response held up the next request on that connection.

HTTP/2: multiplexing, and the blocking you can't see

HTTP/2 puts many streams on one TCP connection. Each request/response is a stream, chopped into frames, and frames from different streams interleave on the wire. A slow response no longer blocks others at the HTTP layer: stream 5 being slow doesn't stop stream 7's frames from arriving. Real win, real latency improvement.
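To make the interleaving concrete, here is a minimal sketch. It assumes plain round-robin scheduling, which is a simplification: real HTTP/2 implementations also apply prioritization and flow control. The function name, stream IDs, and bodies are made up for illustration.

```python
from collections import deque

def interleave_frames(responses, frame_size):
    """Round-robin frames from several response bodies onto one connection.

    responses maps a stream id to its full body (bytes). Returns the wire
    order as a list of (stream_id, chunk) tuples.
    """
    # Chop every body into frame_size chunks, one queue per stream.
    queues = {
        sid: deque(body[i:i + frame_size] for i in range(0, len(body), frame_size))
        for sid, body in responses.items()
    }
    wire = []
    # Keep cycling over the streams until every queue is drained.
    while any(queues.values()):
        for sid, q in queues.items():
            if q:
                wire.append((sid, q.popleft()))
    return wire

# Stream 5 is a big, slow response; 7 and 9 are small.
wire = interleave_frames({5: b"a" * 10, 7: b"bb", 9: b"ccc"}, frame_size=2)
order = [sid for sid, _ in wire]
```

Stream 7 is fully on the wire after the second frame and stream 9 after the fifth, even though stream 5 still has frames queued behind them.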

But all those streams ride one TCP connection, and TCP has a guarantee HTTP/2 can't opt out of: in-order, reliable delivery of a single byte stream. TCP doesn't know about your streams. It sees one ordered sequence of bytes. If one packet is lost, TCP will not hand any later bytes to the application until that lost packet is retransmitted and arrives, even if those later bytes belong to a completely different, perfectly healthy HTTP/2 stream.

One TCP connection, bytes in order:
[ stream7 ][ stream5 ][ stream7 ][ stream9 ] ...
                ^ this packet dropped
TCP holds the later stream7 and stream9 bytes in its buffer and
delivers NOTHING past the gap to the app until the drop is retransmitted.
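The diagram above can be reproduced with a toy receiver. This is a sketch, not a real TCP stack: it numbers whole segments instead of bytes, and the class and method names are hypothetical. The only thing it models is the rule that matters here, which is that the application sees a contiguous prefix and nothing else.

```python
class TcpReceiver:
    """Toy in-order receiver: one sequence space for the whole connection."""

    def __init__(self):
        self.next_seq = 0      # next sequence number the app may read
        self.buffered = {}     # out-of-order segments held back: seq -> (stream, data)
        self.delivered = []    # what the application has actually seen

    def on_segment(self, seq, stream, data):
        self.buffered[seq] = (stream, data)
        # Release only the contiguous prefix; a gap stops everything behind it.
        while self.next_seq in self.buffered:
            self.delivered.append(self.buffered.pop(self.next_seq))
            self.next_seq += 1

rx = TcpReceiver()
# Wire order from the diagram; segment 1 (stream5) is lost in transit.
rx.on_segment(0, "stream7", "a")
rx.on_segment(2, "stream7", "b")
rx.on_segment(3, "stream9", "c")
stalled = list(rx.delivered)       # only segment 0 reached the app
held = len(rx.buffered)            # stream7 and stream9 data stuck in the buffer
rx.on_segment(1, "stream5", "x")   # the retransmission finally arrives
```

Before the retransmission, `stalled` holds a single segment and two healthy segments sit in the buffer. Once segment 1 lands, all three are released at once, which is exactly the latency cliff you see in the tail.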

That's TCP head-of-line blocking, and it's what I was hitting. On a clean network you never notice. Add 1–2% packet loss (mobile, congested links, lossy wifi) and one drop freezes all your multiplexed streams at once. HTTP/2 made one slow response cheap, then quietly made one lost packet expensive for everyone on the connection.
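A rough back-of-the-envelope calculation shows why small loss rates hurt so much. It assumes each packet is lost independently, which real networks don't honor (loss tends to come in bursts), and the figure of 100 packets in flight is an arbitrary illustration, not a measurement.

```python
def p_stall_tcp(loss, packets_in_flight):
    """Chance that at least one in-flight packet is lost, stalling every stream."""
    return 1 - (1 - loss) ** packets_in_flight

in_flight = 100  # assumed number of packets in flight on the shared connection
for loss in (0.0001, 0.01, 0.02):
    print(f"loss={loss:.2%}  P(all streams stall)={p_stall_tcp(loss, in_flight):.1%}")
```

Under these assumptions, 0.01% loss stalls the connection about 1% of the time, while 1% loss stalls it roughly 63% of the time. The per-packet loss rate looks small; the per-connection stall rate does not.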

Why HTTP/2 can't fix it: it's not in charge of transport

The frustrating part is there's no HTTP/2 setting for this. The blocking happens in the kernel's TCP stack, below HTTP/2, and TCP's whole contract is "one ordered reliable stream". You can't tell TCP "these bytes are independent, deliver them out of order", because then it wouldn't be TCP. To fix HOL blocking properly you have to change the transport itself. Which is exactly what HTTP/3 did.

HTTP/3: streams that TCP can't see, because there is no TCP

HTTP/3 runs over QUIC, a transport built on UDP. UDP gives you packets with no ordering guarantee, and QUIC builds reliability and ordering back on top, but per stream, not per connection. QUIC understands streams natively. A lost packet only blocks the streams whose data was in that packet. Every other stream keeps being delivered to the application.

QUIC over UDP, ordering is PER STREAM:
stream7: [..][dropped][..]   <- only stream7 waits
stream5: [..][..][..]        <- delivered, not blocked
stream9: [..][..]            <- delivered, not blocked

Same packet loss as before, but now only the one affected stream stalls. That's the actual elimination of head-of-line blocking, and it's why HTTP/3 had to abandon TCP: the problem was structurally unfixable on top of it.
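Here is the same kind of toy receiver with QUIC-style ordering. It is a sketch under simplifying assumptions: offsets count frames rather than bytes, each packet carries one stream's data, and the class and method names are hypothetical rather than taken from any QUIC library.

```python
from collections import defaultdict

class QuicReceiver:
    """Toy receiver with an independent ordered offset space per stream."""

    def __init__(self):
        self.next_offset = defaultdict(int)   # stream -> next offset the app may read
        self.buffered = defaultdict(dict)     # stream -> {offset: data} held back
        self.delivered = defaultdict(list)    # stream -> data the app has seen

    def on_stream_frame(self, stream, offset, data):
        self.buffered[stream][offset] = data
        # A gap only holds back later data on the same stream.
        while self.next_offset[stream] in self.buffered[stream]:
            off = self.next_offset[stream]
            self.delivered[stream].append(self.buffered[stream].pop(off))
            self.next_offset[stream] += 1

rx = QuicReceiver()
rx.on_stream_frame("stream7", 0, "a")
# stream7 offset 1 is lost in transit
rx.on_stream_frame("stream7", 2, "c")
rx.on_stream_frame("stream5", 0, "x")
rx.on_stream_frame("stream5", 1, "y")
rx.on_stream_frame("stream9", 0, "p")
before = {s: list(d) for s, d in rx.delivered.items()}
rx.on_stream_frame("stream7", 1, "b")   # the retransmission arrives
```

In `before`, stream5 and stream9 are fully delivered while stream7 is stuck after its first frame. The retransmission then unblocks stream7 alone.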

The bonuses QUIC gets for free

  • Faster connection setup. TCP does its handshake, then TLS does another on top, so a fresh connection costs at least two round trips before the first request. QUIC folds the transport and crypto handshakes together, so a new connection is typically 1-RTT, and a resumed one can send data at 0-RTT. Pure latency win on connection-heavy workloads.
  • Connection migration. A TCP connection is bound to the 4-tuple, so switching from wifi to cellular kills it and you reconnect. QUIC identifies a connection by a connection ID, not the IP/port, so it survives a network change. Your phone walking out the door doesn't drop the connection.
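Both bonuses are easy to put numbers and data structures on. The sketch below assumes TLS 1.3 on the TCP side and an 80 ms round trip picked purely for illustration. The addresses come from documentation and shared ranges, and the connection ID `c1d2e3` and session label are invented.

```python
def rtts_before_first_request(transport, resumed=False):
    """Round trips spent on handshakes before the first request can be sent."""
    if transport == "tcp+tls1.3":
        # TCP handshake always costs one; TLS 1.3 early data saves the second.
        return 1 + (0 if resumed else 1)
    if transport == "quic":
        # Transport and crypto handshakes are combined; 0-RTT when resuming.
        return 0 if resumed else 1
    raise ValueError(f"unknown transport: {transport}")

rtt_ms = 80  # assumed mobile round-trip time
setup_ms = {
    "tcp+tls1.3": rtts_before_first_request("tcp+tls1.3") * rtt_ms,
    "quic": rtts_before_first_request("quic") * rtt_ms,
    "quic resumed": rtts_before_first_request("quic", resumed=True) * rtt_ms,
}

# Connection lookup: TCP keys on the 4-tuple, QUIC on a connection ID.
tcp_conns = {("10.0.0.5", 50000, "203.0.113.7", 443): "session-A"}
quic_conns = {"c1d2e3": "session-A"}

# The phone moves from wifi to cellular: new source IP and port.
new_tuple = ("100.64.1.9", 41000, "203.0.113.7", 443)
tcp_hit = tcp_conns.get(new_tuple)     # None: the server sees a stranger
quic_hit = quic_conns.get("c1d2e3")    # still session-A: the ID came along
```

With these numbers the setup cost is 160 ms, 80 ms, and 0 ms respectively. The lookup half shows why migration works: the key QUIC uses to find the session doesn't change when the address does.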

So should you just turn on HTTP/3?

It's not free. QUIC typically runs in userspace, not the kernel, so it tends to spend noticeably more CPU per byte than TCP, and that matters at high throughput. UDP is more likely to be throttled or blocked by middleboxes, so you keep HTTP/2 as a fallback rather than replacing it. And on a low-loss network (a clean datacenter LAN between your own services) TCP HOL blocking essentially never fires, so HTTP/3 buys you little there. The payoff is real on the lossy, high-latency, network-switching last mile, which is to say real users on real phones, not service-to-service calls in your VPC.
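The fallback is usually built into how HTTP/3 gets discovered: the server keeps answering over HTTP/2 and advertises HTTP/3 in an `Alt-Svc` response header, and the client upgrades only if UDP actually gets through. Here is a deliberately naive sketch of reading that header. The function name is hypothetical, and the comma split would break on unusual quoted values, so treat it as an illustration rather than a parser.

```python
def offers_h3(alt_svc):
    """Return the advertised HTTP/3 authority (e.g. ':443'), or None."""
    for entry in alt_svc.split(","):
        proto, _, rest = entry.strip().partition("=")
        if proto == "h3":
            # Drop parameters such as ma=86400 and the surrounding quotes.
            return rest.split(";")[0].strip().strip('"')
    return None

h3_authority = offers_h3('h3=":443"; ma=86400, h2=":443"')
no_h3 = offers_h3('h2=":443"')
```

A client that gets `None` back, or whose QUIC attempt times out, just stays on the HTTP/2 connection it already has.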

Rules of thumb

  • HTTP/2 removed head-of-line blocking at the HTTP layer, not the transport layer. One lost packet still stalls every multiplexed stream because TCP delivers one ordered byte stream.
  • TCP HOL blocking is invisible on clean networks and brutal under packet loss. If your tail latency spikes on lossy/mobile links but not in the datacenter, suspect it.
  • You can't fix it in HTTP/2 config; it's below HTTP, in TCP's ordering contract. That's why HTTP/3 changed the transport.
  • HTTP/3 over QUIC orders bytes per stream, so a drop only blocks its own stream. That's the genuine fix.
  • QUIC also gets faster handshakes (1-RTT/0-RTT) and connection migration across network changes for free.
  • HTTP/3 costs more CPU (userspace transport) and can be blocked by middleboxes, so keep HTTP/2 as fallback, and don't expect much benefit on a clean internal LAN.

Reader Discussion

  1. Léa Dubois · SRE · Asks

    any chance you'd publish these as a PDF collection? would love to print and read offline on flights. screen-fatigue is real.

    Jun 23, 2026 · 6 days later
  2. Ahmed Rahman · Full Stack · Kind words

    concise + opinionated = my favourite kind of engineering post. so many blogs hedge every claim into mush. give me the spicy take with the receipts. more please.

    Jun 18, 2026 · 1 day later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.
