SELECT FOR UPDATE SKIP LOCKED: Turning Postgres Into a Reliable Work Queue
Before you reach for Kafka or SQS, Postgres can be a correct, exactly-once-ish job queue — if you know SKIP LOCKED. Here is why the naive queue table deadlocks under load and the one clause that fixes it.
Every team eventually builds a "jobs" table and a worker loop that pulls the next pending row. Done naively, it works fine for one worker and falls apart the moment you scale to several: workers either grab the same job and do it twice, or they pile up blocked on each other's locks. Postgres has had the exact tool for this since 9.5, and most people never learn it. The fix is two words: SKIP LOCKED.
The naive queue and why it breaks
The obvious approach: mark a row as taken, then process it.
-- worker grabs "a" pending job
UPDATE jobs SET status = 'running'
WHERE id = (SELECT id FROM jobs WHERE status = 'pending' LIMIT 1)
RETURNING *;
With two workers running this concurrently, the inner SELECT in both transactions can read the same id before either UPDATE commits. One blocks on the other's row lock, and when it unblocks the row is already running — but your WHERE already chose that id, so depending on how you wrote it you either double-process or waste the round trip. Add a few dozen workers and most of them spend their time blocked on the single hottest pending row. The queue serialises itself.
FOR UPDATE locks the row you read
The first half of the fix is SELECT ... FOR UPDATE: it takes a row-level write lock on every row it returns, held until the transaction ends. Now a second worker that selects the same row blocks until the first commits, instead of racing it. Correct — but it has just made the contention worse: workers queue up single-file behind the locked row rather than fanning out across the table.
SKIP LOCKED: don't wait, move on
The second half is the magic. SKIP LOCKED tells Postgres: any row that is already locked by another transaction, pretend it isn't there and give me the next one. No blocking, no waiting.
BEGIN;
SELECT id, payload FROM jobs
WHERE status = 'pending'
ORDER BY created_at
FOR UPDATE SKIP LOCKED
LIMIT 1;
-- ... do the work for the returned id ...
UPDATE jobs SET status = 'done' WHERE id = $1;
COMMIT;
Ten workers run this at once and each gets a different row: worker 1 locks the oldest pending job, worker 2 skips that locked row and takes the next, and so on. There is no hot-row contention because no two workers ever fight over the same row — they slide past each other. This is precisely how Postgres-backed queues (and many "use your database as a queue" libraries) achieve high-throughput, lock-free fan-out.
The crash-safety property you get for free
Because the row lock is tied to the transaction, a worker that crashes mid-job has its transaction rolled back by Postgres, the lock released, and the row reverts to pending — another worker picks it up. You never lose a job to a dead worker, and you never need a separate "reclaim stuck jobs" reaper for the crash case. Contrast that with the UPDATE status='running' approach, where a crashed worker leaves a row stranded in running forever unless you build a timeout sweeper.
The trade-off: holding the row lock for the whole job means a long-running job holds its transaction open the entire time, which pins resources and blocks VACUUM from cleaning up rows newer than that transaction. For jobs measured in seconds this is fine; for jobs measured in minutes, prefer the claim-then-release pattern: a short transaction that flips the row to running with a locked_until timestamp using SKIP LOCKED, commit, do the work outside the transaction, then a second transaction marks it done — and a reaper resets rows whose locked_until has passed.
It's not just for queues
SKIP LOCKED fits any "hand out scarce items to concurrent consumers without collisions" problem: claiming the next free seat, leasing one IP from a pool, distributing batches to workers. Any time you'd otherwise reach for a distributed lock to avoid two consumers grabbing the same item, check whether a single FOR UPDATE SKIP LOCKED over a table does it — usually it does, transactionally, with no extra infrastructure.
Rules of thumb
- Plain
SELECT ... LIMIT 1+UPDATElets concurrent workers grab the same job — it races. Don't ship it as a multi-worker queue. FOR UPDATEmakes it correct but serial (workers block on the hot row).FOR UPDATE SKIP LOCKEDmakes it correct and parallel (workers skip locked rows and fan out).- Holding the lock for the whole transaction gives free crash recovery — a dead worker's lock releases and the job reverts to pending automatically.
- For long jobs, claim-and-release with a
locked_untilcolumn plus a reaper, so you don't pin a transaction (and blockVACUUM) for minutes. - Always pair it with
ORDER BYfor FIFO and an index on(status, created_at), or Postgres scans and locks rows you'll then skip. - Reach for it before reaching for a broker: for modest throughput, a Postgres table with
SKIP LOCKEDis a correct queue with zero new moving parts.