Timeout Budgets: Why One Slow Dependency Blows Your Whole Request

The checkout endpoint had a 30-second timeout. So did the payment service it called, the fraud check that service called, and the database underneath that. When one of them got slow, the timeouts stacked instead of capping, and a single sluggish dependency held thousands of connections open until the whole chain fell over.

June 23, 2026 · 8 min read · System Design · Reliability

Our checkout request fanned out through four services before it returned: gateway to order service to payment service to a fraud-scoring call, with a database at the bottom. Each one had a sensible-looking 30-second timeout, set independently by whichever team owned it. Then the fraud service started taking 25 seconds under load. You would expect one slow call. What we got instead was the entire checkout path melting down: connection pools exhausted at every layer, requests piling up, and a 30-second-deep wall of latency that users experienced as the site simply hanging. The timeouts were all "correct" and together they were a disaster.

Independent timeouts don't compose, they stack

The mistake is treating a timeout as a local property of one call. When service A calls B, which calls C, and each has its own 30s timeout, no layer cuts the slow call short: A waits the full 30s for B, which spent that same time waiting on C. The timeouts don't protect each other; they all expire at roughly the same wall-clock moment, after the damage is done. Worse, while A sits there waiting, it is holding a connection, a thread or goroutine, and a slot in its own caller's pool. One slow leaf service applies backpressure all the way up the tree, and every layer runs out of capacity at once.
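To put rough numbers on that pile-up, here is a back-of-the-envelope sketch using Little's law (requests in flight = arrival rate × time each request is held). The traffic figure of 200 requests per second and the 100ms healthy latency are assumptions picked for illustration, not measurements from the incident.

```javascript
// Little's law: requests in flight = arrival rate x time each request is held.
function inFlight(requestsPerSecond, heldMs) {
  return (requestsPerSecond * heldMs) / 1000;
}

const rps = 200;          // assumed traffic, for illustration only
const healthyMs = 100;    // assumed end-to-end latency when fraud scoring is healthy
const slowMs = 25000;     // fraud scoring taking 25 seconds under load

const layers = ["gateway", "order service", "payment service", "fraud scoring"];

for (const layer of layers) {
  // Every layer above the slow leaf is held for as long as the leaf takes.
  console.log(
    `${layer}: ${inFlight(rps, healthyMs)} in flight when healthy, ` +
    `${inFlight(rps, slowMs)} when fraud takes 25s`
  );
}
```

Under these assumptions each layer goes from about 20 held connections to about 5,000, which is far beyond what a typical connection pool is sized for.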

The number that actually matters to a user is the time budget for the whole request. If checkout should respond in 3 seconds or give up, then no individual hop is allowed to spend 30. The budget belongs to the request, not to any single call, and it has to shrink as it travels down the chain.
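As a worked example of a budget shrinking on the way down, the sketch below subtracts the time each hop spends on its own work before calling onward. The function name `budgetsDownChain` and the per-hop timings are hypothetical, chosen only to make the arithmetic visible.

```javascript
// Given a total budget and the time each hop spends before calling the next one,
// return the budget each hop is allowed to hand downward.
function budgetsDownChain(totalMs, spentBeforeCallMs) {
  const budgets = [];
  let left = totalMs;
  for (const spent of spentBeforeCallMs) {
    left = Math.max(0, left - spent); // a budget never goes negative
    budgets.push(left);
  }
  return budgets;
}

// Hypothetical timings: gateway spends 20ms, order service 150ms,
// payment service 80ms before each calls the next hop.
console.log(budgetsDownChain(3000, [20, 150, 80])); // [ 2980, 2830, 2750 ]
```

No hop ever sees more than the 3 seconds the request started with, and each one sees a little less than its caller did.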

Propagate a deadline, not a duration

The fix is to compute a deadline at the edge (an absolute point in time, "respond by 12:00:03.000") and pass it down with every call. Each service, before making a downstream request, sets that call's timeout to the time remaining until the deadline, never more. gRPC does this for you with deadlines; over HTTP you pass a header and honor it.

// edge: turn the request budget into an absolute deadline
const deadline = Date.now() + 3000;   // 3s for the whole request

const paymentUrl = "http://payment.internal/charge";   // hypothetical URL
class DeadlineExceeded extends Error {}   // thrown when the budget is spent

// before each downstream call: only spend what's left
function remaining() {
  return deadline - Date.now();
}

async function callPayment(req) {
  const budget = remaining();
  if (budget <= 50) throw new DeadlineExceeded();   // no point starting
  return fetch(paymentUrl, {
    signal: AbortSignal.timeout(budget),   // abort the call once the budget runs out
    headers: { "x-deadline-ms": String(deadline) },   // pass the deadline downstream
  });
}

Now the deadline is shared. If the fraud call has already eaten 2.8 of the 3 seconds, the payment service sees only 200ms left and fails fast instead of starting a fresh 30-second wait. The slow dependency still fails, but it fails in 200ms at the right layer, and it stops holding the entire chain hostage.
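In a real server the deadline has to belong to each request rather than to the module, so that concurrent requests never share one clock. A minimal sketch of that, assuming a hypothetical `createDeadline` helper and an injectable `now` function (added here so the example can be exercised without waiting on real time):

```javascript
class DeadlineExceeded extends Error {}

// One deadline object per incoming request.
// `now` is injectable for testing; it defaults to the real clock.
function createDeadline(budgetMs, now = Date.now) {
  const at = now() + budgetMs; // absolute deadline in epoch milliseconds
  return {
    at,
    remaining: () => at - now(),
    // Build fetch() options that abort at the deadline and forward it downstream.
    fetchOptions(minMs = 50) {
      const left = at - now();
      if (left <= minMs) throw new DeadlineExceeded(`only ${left}ms left`);
      return {
        signal: AbortSignal.timeout(left),
        headers: { "x-deadline-ms": String(at) },
      };
    },
  };
}

// Usage with a fake clock: 2.8s of a 3s budget has already been spent.
let fakeNow = 1000;
const deadlineForRequest = createDeadline(3000, () => fakeNow);
fakeNow += 2800;
console.log(deadlineForRequest.remaining()); // 200
```

Note that forwarding an absolute timestamp assumes the hosts' clocks are reasonably synchronized; if that is in doubt, sending the remaining duration and rebuilding the deadline on arrival is the usual alternative.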

Fail fast on a dead budget

The cheapest win in the snippet above is the early check: if there isn't enough time left to plausibly succeed, don't even start the call. Sending a request you know cannot beat the deadline is pure waste; it loads the downstream, consumes a connection, and produces a result nobody is waiting for anymore. Checking remaining() before each hop turns "everyone waits 30s and then errors" into "we stop the moment success becomes impossible." That single guard is what breaks the cascade.

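// downstream service reads the inherited deadline
const DEFAULT_BUDGET_MS = 3000;   // assumed fallback when no deadline header arrives
function handle(req, res) {
  const deadline = Number(req.headers["x-deadline-ms"]) || (Date.now() + DEFAULT_BUDGET_MS);
  const budget = deadline - Date.now();
  if (budget <= 0) return res.status(504).end();   // already too late, don't query the DB
  // otherwise do the real work, bounded by `budget`
}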

Leave headroom at each layer

One subtlety: don't hand a downstream the entire remaining budget. If you have 1000ms left and you give the payment call all 1000ms, you have no time to do anything with its response, return a partial result, or write a fallback. Reserve a slice at each layer: spend maybe 80% of your remaining budget on the downstream call and keep the rest for your own work and a graceful degradation path. The tail of a distributed request is where users feel pain, and a little reserved headroom is what lets you return a useful "we couldn't verify fraud, try again" instead of a raw timeout.
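The split described above can be sketched as follows. `splitBudget` and `checkFraud` are hypothetical names, and `callFraud` stands in for whatever client actually makes the downstream call; the 80% share is the rough figure from the text, not a tuned value.

```javascript
// Split the remaining budget: most goes to the downstream call, the rest is
// kept for handling its response or building a fallback.
function splitBudget(remainingMs, downstreamShare = 0.8) {
  const downstreamMs = Math.round(remainingMs * downstreamShare);
  return { downstreamMs, reserveMs: remainingMs - downstreamMs };
}

// callFraud is a hypothetical function: (timeoutMs) => Promise<{ score: number }>.
async function checkFraud(callFraud, remainingMs) {
  const { downstreamMs } = splitBudget(remainingMs);
  try {
    const result = await callFraud(downstreamMs);
    return { verified: true, score: result.score };
  } catch (err) {
    // The reserved slice is what leaves time to answer with something useful.
    return { verified: false, message: "we couldn't verify fraud, try again" };
  }
}

console.log(splitBudget(1000)); // { downstreamMs: 800, reserveMs: 200 }
```

The downstream call still times out when it is slow, but the caller times it out early enough to respond on its own terms.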

Rules of thumb

A timeout is a property of the whole request, not of one call. Independently set timeouts stack into the worst case; they don't bound it.
Compute an absolute deadline at the edge and propagate it downward. Each hop spends only the time remaining, never a fresh full timeout.
Before every downstream call, check the remaining budget and fail fast if it's gone. Don't start work that can't beat the deadline.
Reserve headroom: give a downstream ~80% of what's left so you keep time for your own response and a fallback.
Use gRPC deadlines or an explicit deadline header; "everyone sets 30s and hopes" is how one slow leaf takes down the tree.
A slow dependency should fail fast at the layer closest to it, not slowly at the top after holding connections open the whole way down.

