The Connection Pool Deadlock: When One Request Needs Two Connections

The endpoint was fine in every test and fine in staging. Then traffic crossed some invisible line in production and it stopped responding entirely: not slow but fully frozen, with no errors in the logs. The pool had ten connections, and under load every request was holding one while waiting for a second that would never come.

June 24, 2026 · 8 min read · Concurrency, Databases

An order endpoint behaved perfectly until it didn't. Single requests were fast. Load tests at modest concurrency passed. Then one afternoon real traffic climbed past some threshold and the endpoint simply stopped answering. Not 500s, not slow responses, nothing: requests went out and never came back, the process sat at near-zero CPU, and the logs were silent. Restarting fixed it for a few minutes, then it froze again. The database was healthy and almost idle. The app was deadlocked on itself, and the culprit was the connection pool.

The shape of the bug

The handler opened a transaction, did some work, and called a helper to enrich the result. The helper looked completely innocent. It ran one small query. The problem was where it got the connection to run that query: it reached into the same pool, asking for a second connection while the handler was still holding the first.

const { Pool } = require("pg");       // assumption: node-postgres, whose Pool API this code matches
const pool = new Pool({ max: 10 });   // ten connections, total

async function handler(req) {
  const client = await pool.connect();      // holds 1 of 10 for the whole request
  try {
    await client.query("BEGIN");
    const order = await createOrder(client, req);
    await enrich(order);                     // looks harmless
    await client.query("COMMIT");
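  } catch (err) {
    await client.query("ROLLBACK");          // never hand an open transaction back to the pool
    throw err;
  } finally {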
    client.release();
  }
}

async function enrich(order) {
  // the trap: grabs ITS OWN connection from the same pool
  const tax = await pool.query(
    "SELECT rate FROM tax WHERE region = $1",
    [order.region],
  );
  return applyTax(order, tax.rows[0]);
}

At low concurrency this is fine. There are spare connections, so when enrich asks for one it gets it immediately, uses it, returns it. The bug is invisible because the pool is never under pressure. It only shows up when the number of in-flight requests reaches the pool size.

Why it deadlocks, exactly

Say ten requests arrive at once. Each one calls pool.connect() and succeeds, because there are exactly ten connections. Now all ten are checked out, each held by a handler that is sitting at the await enrich(order) line. Then all ten handlers call enrich, which calls pool.query, which needs an eleventh connection. There is no eleventh connection. The pool has zero available, so every enrich call queues, waiting for one of the ten to be released.

But none of the ten will ever be released, because each one is owned by a handler that is blocked inside enrich, waiting for a connection that can only come from another handler finishing, and every other handler is stuck in exactly the same place. Everyone is waiting for a resource that someone else is holding while waiting. That is a textbook deadlock, and it is total: the pool will never make progress on its own. Every new request that arrives just joins the queue. The endpoint is frozen and no error is thrown, because from each individual caller's point of view it is just waiting for a connection, which is a normal thing to do.
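The freeze can be reproduced without a database. The sketch below is a toy model, not the real pool: TinyPool, nestedHandler, simulate, and the 50 ms watchdog are all illustrative names and values, and the only behavior assumed is "a fixed number of slots plus a FIFO queue of waiters."

```javascript
// Toy stand-in for a connection pool: `max` slots and a FIFO wait queue.
class TinyPool {
  constructor(max) {
    this.free = max;
    this.waiters = [];
  }
  acquire() {
    if (this.free > 0) {
      this.free -= 1;
      return Promise.resolve();
    }
    // No slot available: park the caller until someone releases.
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  release() {
    const next = this.waiters.shift();
    if (next) next();          // hand the slot straight to the oldest waiter
    else this.free += 1;
  }
}

// Same shape as the handler above: hold one slot, then ask for a second.
async function nestedHandler(pool) {
  await pool.acquire();        // outer connection, held for the whole request
  try {
    await pool.acquire();      // inner connection, as enrich() does
    pool.release();
  } finally {
    pool.release();
  }
}

// Starts `concurrency` handlers at once and resolves to "completed" if they
// all finish, or "frozen" if the watchdog timer fires first.
async function simulate(poolSize, concurrency, watchdogMs = 50) {
  const pool = new TinyPool(poolSize);
  const work = Promise.all(
    Array.from({ length: concurrency }, () => nestedHandler(pool)),
  ).then(() => "completed");
  let timer;
  const watchdog = new Promise((resolve) => {
    timer = setTimeout(() => resolve("frozen"), watchdogMs);
  });
  const outcome = await Promise.race([work, watchdog]);
  clearTimeout(timer);
  return outcome;
}
```

With a pool of ten, simulate(10, 9) resolves to "completed" and simulate(10, 10) resolves to "frozen": one extra in-flight request is the whole difference between a healthy endpoint and a dead one.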

The fix: one request, one connection

The real fix is to stop treating "get a connection" as something any function can do whenever it likes. A connection belongs to the unit of work. Acquire it once at the top, and thread it through every function that needs to touch the database for that request.

async function handler(req) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const order = await createOrder(client, req);
    await enrich(client, order);          // pass the SAME connection down
    await client.query("COMMIT");
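  } catch (err) {
    await client.query("ROLLBACK");       // never hand an open transaction back to the pool
    throw err;
  } finally {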
    client.release();
  }
}

async function enrich(client, order) {
  const tax = await client.query(          // reuse, don't re-acquire
    "SELECT rate FROM tax WHERE region = $1",
    [order.region],
  );
  return applyTax(order, tax.rows[0]);
}

Now a request consumes exactly one connection no matter how deep the call stack goes, and ten concurrent requests use ten connections with none left waiting. It is also more correct in another way: the enrich query now runs inside the same transaction as the rest of the request, which is almost always what you actually wanted. The original code read tax rates outside the transaction, a subtle consistency bug hiding behind the deadlock.
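One way to make "acquire once, pass it down" hard to get wrong is to put the checkout in a single helper and give callers nothing but the client. This is a minimal sketch, not part of the original code: withTransaction is a hypothetical name, and it assumes only a pool whose connect() resolves to a client with query() and release().

```javascript
// Hypothetical helper: owns the checkout, the transaction, and the release.
// `work` receives the one client for this request and must use it for every
// query, instead of going back to the pool.
async function withTransaction(pool, work) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    // If ROLLBACK itself fails, its error replaces the original one;
    // acceptable for a sketch, worth handling in real code.
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}

// Usage, with the article's createOrder and enrich:
//
//   await withTransaction(pool, async (client) => {
//     const order = await createOrder(client, req);
//     return enrich(client, order);
//   });
```

Because the callback never sees the pool, a nested function has no easy way to reach for a second connection.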

Why "just make the pool bigger" is a trap

The tempting quick fix is to bump max from 10 to 50 and move on. It does not fix the bug, it just moves the cliff. The deadlock needs only one thing: every connection in the pool held by a request that is waiting for one more. With a pool of 10 that takes ten requests in flight at once; with a pool of 50 it takes fifty. In general, if each request can hold k connections at peak nesting, a pool is guaranteed deadlock-free only when it has at least concurrency × (k - 1) + 1 connections, so any fixed pool can deadlock once concurrency climbs high enough. A bigger pool needs more concurrency to trigger it, so it survives your load test and then deadlocks in production at 3am during a traffic spike. Worse, a large pool can overwhelm the database with connections, trading an app deadlock for database-side contention. The size of the pool is not the bug; holding two connections at once is the bug.
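To put a number on where the cliff sits, a pigeonhole argument helps: in the worst case every in-flight request holds all but one of the connections it needs, so the pool needs one more than that total before somebody is guaranteed to finish. A small sketch of that arithmetic, with minDeadlockFreePoolSize as an illustrative name:

```javascript
// Smallest pool size that cannot deadlock when up to `concurrency` requests
// are in flight and each may hold `perRequest` connections at the same time.
// Worst case: every request holds (perRequest - 1) and waits for its last one.
function minDeadlockFreePoolSize(concurrency, perRequest) {
  return concurrency * (perRequest - 1) + 1;
}
```

For ten concurrent requests that each hold two connections this gives 11, so a pool of 10 is one short. With one connection per request it gives 1 whatever the concurrency, which is the one-request-one-connection fix expressed as arithmetic: requests may queue, but they can never block each other forever.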

How to catch it before production

This class of bug is easy to flush out once you know its signature. Set your pool size deliberately small in a load test, smaller than your test concurrency, and watch for the freeze. A connection that is checked out for the full duration of a request, across awaits that themselves acquire connections, is the thing to look for in review. Most pool libraries expose a metric for "clients waiting for a connection"; if that number is ever greater than zero for a sustained period, you have requests starving each other, and a re-entrant checkout is the first thing to suspect.
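The "persistently above zero" check is simple to wire up. The sketch below assumes a pool object with a waitingCount property, which is what node-postgres calls it; other libraries use different names. createStarvationDetector, watchPool, and the default interval and threshold are illustrative choices, not recommendations.

```javascript
// Returns a sampler that reports true once `threshold` consecutive samples
// have seen callers queued for a connection. A single zero resets the streak.
function createStarvationDetector(threshold = 5) {
  let streak = 0;
  return function sample(waitingCount) {
    streak = waitingCount > 0 ? streak + 1 : 0;
    return streak >= threshold;
  };
}

// Polls the pool on a timer and calls `onStarved` on every sample taken
// while the streak is at or above the threshold. Returns a stop function.
function watchPool(pool, onStarved, intervalMs = 1000, threshold = 5) {
  const sample = createStarvationDetector(threshold);
  const timer = setInterval(() => {
    if (sample(pool.waitingCount)) onStarved(pool.waitingCount);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

A brief spike in waiters under a burst is normal queueing; the streak is what separates that from requests that are never going to be served.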

Rules of thumb

  • A connection belongs to a request, not to a function. Acquire it once at the top and pass it down; never re-acquire from the pool deeper in the same call stack.
  • The deadlock triggers when every connection in the pool is held by a request that is waiting for another one. With two connections per request, that happens as soon as in-flight requests reach the pool size.
  • The symptom is a full freeze with no errors and an idle database, not slowness. Requests are politely waiting for connections that will never free up.
  • Growing the pool doesn't fix it, it just moves the cliff to higher load, where it's harder to reproduce and worse when it hits.
  • Threading one connection through the request also fixes a hidden correctness bug: nested queries run in the same transaction instead of outside it.
  • Load test with a pool smaller than your concurrency, and alarm on "clients waiting for a connection" being persistently above zero.

Reader Discussion

  1. Sofia Marquez · Backend Lead · Agrees

    immutability as default is the single most under-rated concurrency advice. every nightmare I've debugged in 8 years comes back to someone mutating shared state "just this once." make wrong things hard to express.

    Jun 28, 2026·4 days later
  2. Hiếu Nguyễn · Full Stack · Pushback

    tiny precision nit — volatile in Java provides visibility AND atomicity for single 32-bit reads/writes (long/double on legacy 32-bit JVMs is the exception). worth being precise because juniors read "visibility primitive" and reach for AtomicInteger when volatile is enough.

    Jul 01, 2026·1 week later·edited
  3. Maya Iyer · Platform · From experience

    Go's race detector is criminally under-used. Caught a bug in our scheduler we'd been running past for 6 months — turned out our "thread-safe" map was thread-safe in the way a chair is bulletproof. -race in CI, no exceptions.

    Jun 29, 2026·5 days later
  4. Tomáš Havel · Senior Engineer · Agrees

    go channels solve a problem you don't have until you have it, and then they're the only thing that solves it. people reaching for sync.Mutex everywhere are usually one refactor away from a clean channel topology.

    Jun 30, 2026·6 days later
  5. Irene Chen · Staff Engineer · Agrees

    "push the check into the write" is now the framing I use teaching juniors. once you see check-then-act anti-patterns you can't un-see them — they're hiding in literally every internal tool we have.

    Jun 26, 2026·2 days later
  6. Léa Dubois · SRE · Asks

    any chance you'd publish these as a PDF collection? would love to print and read offline on flights. screen-fatigue is real.

    Jun 30, 2026·6 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.
