The Connection Pool Deadlock: When One Request Needs Two Connections
The endpoint was fine in every test and fine in staging. Then traffic crossed some invisible line in production and it stopped responding entirely: not slow but fully frozen, with no errors in the logs. The pool had ten connections, and under load every request was holding one while waiting for a second that would never come.
An order endpoint behaved perfectly until it didn't. Single requests were fast. Load tests at modest concurrency passed. Then one afternoon real traffic climbed past some threshold and the endpoint simply stopped answering. Not 500s, not slow responses, nothing: requests went out and never came back, the process sat at near-zero CPU, and the logs were silent. Restarting fixed it for a few minutes, then it froze again. The database was healthy and almost idle. The app was deadlocked on itself, and the culprit was the connection pool.
The shape of the bug
The handler opened a transaction, did some work, and called a helper to enrich the result. The helper looked completely innocent. It ran one small query. The problem was where it got the connection to run that query: it reached into the same pool, asking for a second connection while the handler was still holding the first.
```javascript
const { Pool } = require("pg");

const pool = new Pool({ max: 10 }); // ten connections, total

async function handler(req) {
  const client = await pool.connect(); // holds 1 of 10 for the whole request
  try {
    await client.query("BEGIN");
    const order = await createOrder(client, req);
    const priced = await enrich(order); // looks harmless
    await client.query("COMMIT");
    return priced;
  } catch (err) {
    await client.query("ROLLBACK"); // don't hand a half-open transaction back to the pool
    throw err;
  } finally {
    client.release();
  }
}

async function enrich(order) {
  // the trap: grabs ITS OWN connection from the same pool
  const tax = await pool.query(
    "SELECT rate FROM tax WHERE region = $1",
    [order.region],
  );
  return applyTax(order, tax.rows[0]);
}
```
At low concurrency this is fine. There are spare connections, so when enrich asks for one it gets it immediately, uses it, returns it. The bug is invisible because the pool is never under pressure. It only shows up when the number of in-flight requests reaches the pool size.
Why it deadlocks, exactly
Say ten requests arrive at once. Each one calls pool.connect() and succeeds, because there are exactly ten connections. Now all ten are checked out, each held by a handler that is sitting at the await enrich(order) line. Then all ten handlers call enrich, which calls pool.query, which needs an eleventh connection. There is no eleventh connection. The pool has zero available, so every enrich call queues, waiting for one of the ten to be released.
But none of the ten will ever be released, because each one is owned by a handler that is blocked inside enrich, waiting for a connection that only another blocked handler could give back. Every request is holding a resource while waiting for one held by someone else who is also waiting. That is a textbook deadlock, and it is total: the pool will never make progress on its own. Every new request that arrives just joins the queue. The endpoint is frozen, yet no error is thrown, because from each individual caller's point of view it is just waiting for a connection, which is a normal thing to do.
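The walkthrough above can be reproduced without a database. This is a minimal simulation, not node-postgres: ToyPool is a hypothetical counting semaphore with a first-in, first-out wait queue standing in for the real pool, nestedRequest (also hypothetical) mimics the handler holding one connection while enrich asks the same pool for another, and run reports "deadlocked" if the requests have not all finished within a timeout.

```javascript
// Hypothetical ToyPool: a counting semaphore with a FIFO wait queue, standing in
// for a real connection pool so the deadlock can be reproduced without a database.
class ToyPool {
  constructor(max) {
    this.available = max; // free "connections"
    this.waiters = []; // resolvers for callers queued in acquire()
  }

  // Resolves with a release function once a slot is free; otherwise queues
  // until release() hands one over.
  acquire() {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve(() => this.release());
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release() {
    const next = this.waiters.shift();
    if (next) next(() => this.release()); // hand the slot straight to the oldest waiter
    else this.available += 1;
  }
}

// Mirrors the buggy handler/enrich pair: hold one slot, then ask the same pool for another.
async function nestedRequest(pool) {
  const releaseOuter = await pool.acquire(); // like pool.connect() in handler
  try {
    const releaseInner = await pool.acquire(); // like pool.query() inside enrich
    releaseInner();
  } finally {
    releaseOuter();
  }
}

// Starts n requests at once; reports "deadlocked" if they have not all finished after ms.
async function run(poolSize, n, ms = 100) {
  const pool = new ToyPool(poolSize);
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve("deadlocked"), ms);
  });
  const work = Promise.all(
    Array.from({ length: n }, () => nestedRequest(pool)),
  ).then(() => "completed");
  const outcome = await Promise.race([work, timeout]);
  clearTimeout(timer);
  return outcome;
}

run(10, 4).then((r) => console.log("4 requests:", r)); // 4 requests: completed
run(10, 10).then((r) => console.log("10 requests:", r)); // 10 requests: deadlocked
```

Nothing throws in the deadlocked case; the ten requests simply never settle, which is why the sketch can only detect the freeze with a timeout.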
The fix: one request, one connection
The real fix is to stop treating "get a connection" as something any function can do whenever it likes. A connection belongs to the unit of work. Acquire it once at the top, and thread it through every function that needs to touch the database for that request.
```javascript
async function handler(req) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const order = await createOrder(client, req);
    const priced = await enrich(client, order); // pass the SAME connection down
    await client.query("COMMIT");
    return priced;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}

async function enrich(client, order) {
  const tax = await client.query( // reuse, don't re-acquire
    "SELECT rate FROM tax WHERE region = $1",
    [order.region],
  );
  return applyTax(order, tax.rows[0]);
}
```
Now a request consumes exactly one connection no matter how deep the call stack goes. Ten concurrent requests use ten connections, and an eleventh waits at pool.connect() only until one of them finishes, instead of deadlocking. It is also more correct in another way: the enrich query now runs inside the same transaction as the rest of the request, which is almost always what you actually wanted. The original code read tax rates outside the transaction, a subtle consistency bug hiding behind the deadlock.
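One way to make "a connection belongs to the unit of work" hard to get wrong is to put the checkout, BEGIN, COMMIT or ROLLBACK, and release in a single wrapper that hands the client to a callback. This is a sketch, assuming a pool with the same connect()/query()/release() shape as node-postgres's Pool; withTransaction, lookupRate and fakePool are hypothetical names, and the fake pool exists only so the example runs without a database.

```javascript
// Hypothetical helper: acquire one connection per unit of work and pass it down.
// Assumes a pool shaped like node-postgres's Pool: connect() resolves to a client
// exposing query(text, params) and release().
async function withTransaction(pool, work) {
  const client = await pool.connect(); // the only checkout for this unit of work
  try {
    await client.query("BEGIN");
    const result = await work(client); // every nested query receives THIS client
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}

// A nested helper in the style of enrich(client, order): it takes the client, never the pool.
async function lookupRate(client, region) {
  const tax = await client.query("SELECT rate FROM tax WHERE region = $1", [region]);
  return tax.rows[0].rate;
}

// Hypothetical fake pool that logs statements and tracks concurrent checkouts.
function fakePool() {
  const pool = { log: [], checkedOut: 0, peak: 0 };
  pool.connect = async () => {
    pool.checkedOut += 1;
    pool.peak = Math.max(pool.peak, pool.checkedOut);
    return {
      query: async (text) => {
        pool.log.push(text);
        return { rows: [{ rate: 0.2 }] };
      },
      release: () => {
        pool.checkedOut -= 1;
      },
    };
  };
  return pool;
}

const pool = fakePool();
withTransaction(pool, async (client) => {
  await client.query("INSERT INTO orders (region) VALUES ($1)", ["EU"]);
  return lookupRate(client, "EU");
}).then((rate) => console.log(rate, pool.log, pool.peak));
```

Because the callback only ever sees a client, nested helpers such as lookupRate have no pool to reach into, and the peak number of checkouts stays at one per unit of work.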
Why "just make the pool bigger" is a trap
The tempting quick fix is to bump max from 10 to 50 and move on. It does not fix the bug; it just moves the cliff. The deadlock condition is that every connection in the pool is checked out by a request that is itself waiting for another one. With one level of nesting, that takes only as many concurrent requests as the pool has connections, and because each request needs two connections to finish, requests can start queuing behind each other well before that, once concurrency passes half the pool size. A bigger pool needs more concurrency to trigger it, so it survives your load test and then deadlocks in production at 3am during a traffic spike. Worse, a large pool can overwhelm the database with connections, trading an app deadlock for database-side contention. The size of the pool is not the bug; holding two connections at once is the bug.
How to catch it before production
This class of bug is easy to flush out once you know its signature. Set your pool size deliberately small in a load test, smaller than your test concurrency, and watch for the freeze. A connection that is checked out for the full duration of a request, across awaits that themselves acquire connections, is the thing to look for in review. Most pool libraries expose a metric for "clients waiting for a connection"; if that number is ever greater than zero for a sustained period, you have requests starving each other, and a re-entrant checkout is the first thing to suspect.
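The "persistently above zero" alarm only needs a little state. node-postgres, for example, exposes pool.waitingCount (callers queued because every client is checked out), and its connectionTimeoutMillis option, which defaults to 0 for no timeout, makes a queued checkout fail with an error instead of waiting forever, turning the silent freeze into something that shows up in logs. The detector below is a hypothetical sketch; makeStarvationDetector is an illustrative name, and the threshold of three consecutive samples is an arbitrary choice.

```javascript
// Hypothetical starvation detector: alarms when the number of callers waiting for a
// connection stays above zero for `sustainedSamples` consecutive samples.
function makeStarvationDetector(sustainedSamples) {
  let consecutive = 0;
  return function sample(waitingCount) {
    consecutive = waitingCount > 0 ? consecutive + 1 : 0; // any drained sample resets
    return consecutive >= sustainedSamples; // true means "raise the alarm"
  };
}

// Wiring it up (assumes a node-postgres Pool named pool):
//   const starving = makeStarvationDetector(5);
//   setInterval(() => {
//     if (starving(pool.waitingCount)) console.error("pool starvation");
//   }, 1000);

// A brief spike that drains is tolerated; a queue that keeps growing is not.
const check = makeStarvationDetector(3);
const readings = [0, 2, 1, 0, 4, 7, 9, 12];
console.log(readings.map(check));
// [false, false, false, false, false, false, true, true]
```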
Rules of thumb
- A connection belongs to a request, not to a function. Acquire it once at the top and pass it down; never re-acquire from the pool deeper in the same call stack.
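- The deadlock triggers when every connection is held by a request that is waiting for another one. With one level of nesting, that takes only as many concurrent requests as the pool has connections, and requests start queuing behind each other past half that.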
- The symptom is a full freeze with no errors and an idle database, not slowness. Requests are politely waiting for connections that will never free up.
- Growing the pool doesn't fix it; it just moves the cliff to higher load, where it's harder to reproduce and worse when it hits.
- Threading one connection through the request also fixes a hidden correctness bug: nested queries run in the same transaction instead of outside it.
- Load test with a pool smaller than your concurrency, and alarm on "clients waiting for a connection" being persistently above zero.