
Deadlocks: The Four Conditions and How to Break Them

Two transactions, two rows, locked in opposite order — and your whole worker pool grinds to a halt. Here's the theory and the fixes.

May 20, 2025 · 8 min read · Concurrency, Debugging

A deadlock is a cycle: T1 waits for a resource held by T2, T2 waits for a resource held by T1. Nobody moves. The database's deadlock detector eventually picks a victim and aborts it; that's the "deadlock detected" error that pages you at 3 AM.

The Coffman conditions

All four must hold for a deadlock to occur. Break any one and you're safe:

  1. Mutual exclusion. Resources can't be shared.
  2. Hold and wait. A holder waits for more resources without releasing what it has.
  3. No preemption. Resources can't be forcibly taken away.
  4. Circular wait. A cycle exists in the wait-for graph.
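Condition 4 is also what a database's deadlock detector typically checks: it looks for a cycle in the wait-for graph, then aborts one member of that cycle (the victim) so the rest can proceed. A minimal sketch of the cycle check in JavaScript, assuming the graph is a `Map` from each transaction to the transactions it waits on; `findCycle` is an illustrative name, not any database's internals:

```javascript
// Wait-for graph: an edge T -> U means "transaction T waits for a lock held by U".
// Represented as a Map from each transaction id to the ids it waits on.

// Returns one cycle (e.g. ["T1", "T2", "T1"]) if the graph contains one, else null.
// Depth-first search: reaching a node that is still on the current path closes a cycle.
function findCycle(graph) {
  const state = new Map(); // node -> "visiting" | "done"
  const path = [];

  function visit(node) {
    state.set(node, "visiting");
    path.push(node);
    for (const next of graph.get(node) ?? []) {
      if (state.get(next) === "visiting") {
        // Back edge: the cycle runs from next's position on the path back to next.
        return [...path.slice(path.indexOf(next)), next];
      }
      if (!state.has(next)) {
        const cycle = visit(next);
        if (cycle) return cycle;
      }
    }
    path.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of graph.keys()) {
    if (!state.has(node)) {
      const cycle = visit(node);
      if (cycle) return cycle;
    }
  }
  return null;
}

// The canonical case: T1 waits for T2 and T2 waits for T1.
console.log(findCycle(new Map([["T1", ["T2"]], ["T2", ["T1"]]]))); // [ 'T1', 'T2', 'T1' ]
// Same pair, but T2 is not waiting on anyone: no cycle, no deadlock.
console.log(findCycle(new Map([["T1", ["T2"]], ["T2", []]]))); // null
```

Removing any one edge of the cycle (which is what aborting the victim does) turns the first graph into the second.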

The canonical database deadlock

-- T1:
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 1;  -- lock row 1
UPDATE accounts SET balance = balance + 50 WHERE id = 2;  -- wait for row 2

-- T2 (concurrently):
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;  -- lock row 2
UPDATE accounts SET balance = balance + 50 WHERE id = 1;  -- wait for row 1 → cycle

The cure: consistent ordering

Always acquire locks in the same order (e.g. by primary key, ascending). The cycle becomes impossible because no transaction ever waits "backwards."

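-- Always smaller id first. T2 (moving 50 from account 2 to account 1) becomes:
BEGIN;
UPDATE accounts SET balance = balance + 50 WHERE id = 1;  -- credit row 1 first (lower id)
UPDATE accounts SET balance = balance - 50 WHERE id = 2;  -- then debit row 2; direction unchanged
COMMIT;
-- In application code, sort numerically: plain .sort() compares strings, so [10, 9].sort() stays [10, 9].
-- const [a, b] = [id1, id2].sort((x, y) => x - y);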
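The same rule in application code, as a minimal sketch: a hypothetical `transfer` helper that sorts the two ids numerically, ties each sign to the account rather than to its position in the lock order, and rolls back on any error. It assumes a client with a node-postgres-style `query(text, values)` method returning a Promise, and Postgres `$1` placeholders:

```javascript
// Moves `amount` between two accounts inside one transaction, always touching the
// lower id first. `client` is assumed to expose query(text, values) returning a
// Promise, as a node-postgres Client does.
async function transfer(client, fromId, toId, amount) {
  if (fromId === toId) throw new Error("fromId and toId must differ");
  // Numeric comparator: the default sort() compares as strings, so [10, 9] would stay put.
  const ordered = [fromId, toId].sort((x, y) => x - y);
  // The sign belongs to the account, not to its position in the lock order.
  const delta = (id) => (id === fromId ? -amount : amount);

  await client.query("BEGIN");
  try {
    for (const id of ordered) {
      await client.query("UPDATE accounts SET balance = balance + $1 WHERE id = $2", [delta(id), id]);
    }
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}

// Usage with a stand-in client that just records the statements it receives:
const log = [];
const recordingClient = { query: async (text, values) => { log.push([text, values]); } };
transfer(recordingClient, 10, 9, 50).then(() => console.log(log));
```

A transfer from 10 to 9 still locks row 9 first, but row 9 gets +50 and row 10 gets -50.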

This fix cures 90% of database deadlocks I've seen in the wild.

Other fixes, by Coffman condition

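  • Break hold-and-wait: acquire all locks up front with one SELECT … FOR UPDATE, before any writes. The rows are still locked one at a time, so add ORDER BY id to keep the order consistent.
  • Introduce preemption: set a lock timeout (SET lock_timeout = '2s' in Postgres, innodb_lock_wait_timeout in MySQL, SET LOCK_TIMEOUT in SQL Server) so a blocked transaction gives up instead of waiting indefinitely. The caller retries on timeout.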
  • Shrink the critical section: hold locks for milliseconds, not seconds. Long transactions amplify every cycle.
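When a lock timeout or the deadlock detector does abort a transaction, the retry has to replay the whole transaction, not just the failed statement. A minimal sketch in JavaScript, assuming a Postgres driver that exposes the SQLSTATE as `err.code` (node-postgres does); `withLockRetry` and its option names are hypothetical:

```javascript
// Postgres SQLSTATE codes worth retrying: 40P01 = deadlock_detected,
// 55P03 = lock_not_available (raised when lock_timeout fires).
const RETRYABLE_CODES = new Set(["40P01", "55P03"]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs attempt() and re-runs it when it lost a lock conflict. attempt must redo the
// whole transaction (BEGIN ... COMMIT), since the failed try was rolled back.
async function withLockRetry(attempt, { maxAttempts = 3, baseDelayMs = 50 } = {}) {
  for (let i = 1; ; i++) {
    try {
      return await attempt();
    } catch (err) {
      // Give up on the last attempt, or on anything that isn't a lock conflict.
      if (i >= maxAttempts || !RETRYABLE_CODES.has(err?.code)) throw err;
      // Log before retrying: every conflict leaves a trail worth reading.
      console.warn(`lock conflict ${err.code} on attempt ${i}: ${err.message}`);
      // Exponential backoff with jitter, so the two losers don't collide again in lockstep.
      await sleep(baseDelayMs * 2 ** (i - 1) * (0.5 + Math.random()));
    }
  }
}

// Usage (hypothetical transfer function and client):
// await withLockRetry(() => transfer(client, 1, 2, 50));
```

Capping attempts matters: a conflict that keeps recurring is a lock-ordering bug to fix, not noise to retry through.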

Application-level deadlocks

Same theory applies outside the database. Thread A holds mutex X, waits for Y. Thread B holds Y, waits for X. Lock ordering is the fix; a lock hierarchy documented in the code is the discipline.
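Within a single JavaScript event loop there are no competing threads, but async tasks that hold a lock across an `await` can deadlock exactly like threads A and B. A minimal sketch, assuming a hand-rolled promise-based `Mutex` whose numeric `rank` encodes the documented hierarchy, and a hypothetical `withLocks` helper that always acquires in rank order:

```javascript
// A minimal promise-based mutex. Each lock carries a rank from the documented hierarchy.
class Mutex {
  constructor(rank) {
    this.rank = rank;
    this.locked = false;
    this.waiters = []; // resolve callbacks of tasks waiting for this lock
  }
  async lock() {
    if (!this.locked) {
      this.locked = true;
      return;
    }
    // Wait until unlock() hands ownership over directly (locked stays true).
    await new Promise((resolve) => this.waiters.push(resolve));
  }
  unlock() {
    const next = this.waiters.shift();
    if (next) next();
    else this.locked = false;
  }
}

// Acquire every lock in ascending rank, whatever order the caller listed them in,
// run fn, then release in reverse. No task ever waits "backwards", so no cycle.
async function withLocks(locks, fn) {
  const ordered = [...locks].sort((a, b) => a.rank - b.rank);
  for (const m of ordered) await m.lock();
  try {
    return await fn();
  } finally {
    for (const m of ordered.reverse()) m.unlock();
  }
}

// Task A asks for X then Y; task B asks for Y then X. With ranks, both take X first.
const X = new Mutex(1);
const Y = new Mutex(2);
const pause = () => new Promise((resolve) => setTimeout(resolve, 10));
const taskA = withLocks([X, Y], async () => { await pause(); return "A"; });
const taskB = withLocks([Y, X], async () => { await pause(); return "B"; });
Promise.all([taskA, taskB]).then((results) => console.log(results)); // [ 'A', 'B' ]
```

Without the sort in `withLocks`, A would hold X while B held Y, each would then wait on the other, and neither promise would ever settle.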

Diagnosing in production

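In Postgres: the server log records each detected deadlock with both processes, the locks they waited on, and their statements; pg_stat_activity plus pg_locks shows who is blocking whom while it is happening. In MySQL: SHOW ENGINE INNODB STATUS has a "LATEST DETECTED DEADLOCK" section that prints both transactions and their locks. Every deadlock leaves a trail. Don't just retry; read the trail.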


Reader Discussion

6 replies

  1. Sofia Marquez · Backend Lead · Agrees

    immutability as default is the single most under-rated concurrency advice. every nightmare I've debugged in 8 years comes back to someone mutating shared state "just this once." make wrong things hard to express.

    May 24, 2025·4 days later
  2. Hiếu Nguyễn · Full Stack · Pushback

    tiny precision nit — volatile in Java provides visibility AND atomicity for single 32-bit reads/writes (long/double on legacy 32-bit JVMs is the exception). worth being precise because juniors read "visibility primitive" and reach for AtomicInteger when volatile is enough.

    May 27, 2025·1 week later·edited
  3. Maya Iyer · Platform · From experience

    Go's race detector is criminally under-used. Caught a bug in our scheduler we'd been running past for 6 months — turned out our "thread-safe" map was thread-safe in the way a chair is bulletproof. -race in CI, no exceptions.

    May 25, 2025·5 days later
  4. Tomáš Havel · Senior Engineer · Agrees

    go channels solve a problem you don't have until you have it, and then they're the only thing that solves it. people reaching for sync.Mutex everywhere are usually one refactor away from a clean channel topology.

    May 26, 2025·6 days later
  5. Irene Chen · Staff Engineer · Agrees

    "push the check into the write" is now the framing I use teaching juniors. once you see check-then-act anti-patterns you can't un-see them — they're hiding in literally every internal tool we have.

    May 22, 2025·2 days later
  6. Léa Dubois · SRE · Asks

    any chance you'd publish these as a PDF collection? would love to print and read offline on flights. screen-fatigue is real.

    May 26, 2025·6 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.

Comments seeded · live discussion via email