Deadlocks: The Four Conditions and How to Break Them
Two transactions, two rows, locked in opposite order — and your whole worker pool grinds to a halt. Here's the theory and the fixes.
A deadlock is a cycle: T1 waits for a resource held by T2, T2 waits for a resource held by T1. Nobody moves. The database's deadlock detector eventually picks a victim and aborts it — that "deadlock detected" error on 3 AM on-call.
The Coffman conditions
All four must hold for a deadlock to occur. Break any one and you're safe:
- Mutual exclusion. Resources can't be shared.
- Hold and wait. A holder waits for more resources without releasing what it has.
- No preemption. Resources can't be forcibly taken away.
- Circular wait. A cycle exists in the wait-for graph.
The canonical database deadlock
-- T1:
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 1; -- lock row 1
UPDATE accounts SET balance = balance + 50 WHERE id = 2; -- wait for row 2
-- T2 (concurrently):
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2; -- lock row 2
UPDATE accounts SET balance = balance + 50 WHERE id = 1; -- wait for row 1 → cycle
The cure: consistent ordering
Always acquire locks in the same order (e.g. by primary key, ascending). The cycle becomes impossible because no transaction ever waits "backwards."
-- Always smaller id first:
const [a, b] = [id1, id2].sort();
UPDATE accounts SET balance = balance - 50 WHERE id = a;
UPDATE accounts SET balance = balance + 50 WHERE id = b;
This fix cures 90% of database deadlocks I've seen in the wild.
Other fixes, by Coffman condition
- Break hold-and-wait: acquire all locks up front with
SELECT … FOR UPDATEin one statement. No partial holds. - Introduce preemption: use
LOCK_TIMEOUT/SET lock_timeout = '2s'. Caller retries on timeout. - Shrink the critical section: hold locks for milliseconds, not seconds. Long transactions amplify every cycle.
Application-level deadlocks
Same theory applies outside the database. Thread A holds mutex X, waits for Y. Thread B holds Y, waits for X. Lock ordering is the fix; a lock hierarchy documented in the code is the discipline.
Diagnosing in production
In Postgres: pg_stat_activity plus pg_locks. In MySQL: SHOW ENGINE INNODB STATUS has a "LATEST DETECTED DEADLOCK" section that prints both transactions and their locks. Every deadlock leaves a trail — don't just retry; read the trail.