
I Built a Job Queue on Postgres LISTEN/NOTIFY and It Quietly Dropped Jobs

LISTEN/NOTIFY looks like a free message bus hiding inside the database you already run. It is, until a worker reconnects, a transaction rolls back, or the notify queue overflows. Here is everything that bit me and the pattern that actually survives.

June 18, 2026 · 9 min read · Postgres

We needed a job queue for maybe a few hundred jobs an hour. Spinning up Kafka or even Redis felt like a lot of operational weight for that, and we already had Postgres sitting right there. Then I remembered LISTEN/NOTIFY exists: a publish/subscribe channel built into Postgres, no extra infrastructure. Insert a job row, fire a NOTIFY, workers wake up instantly. I shipped it in an afternoon and felt clever for about a week, until someone noticed jobs were silently not running. Not failing loudly. Just never picked up. That bug taught me exactly what NOTIFY is and, more importantly, what it isn't.

What NOTIFY actually promises (it's less than you think)

The mental model that burned me was treating NOTIFY like a durable queue. It is not a queue. It's a fire-and-forget signal, delivered only to sessions that are connected and listening on that channel when the notification goes out. If no one is listening, the notification evaporates. There is no buffer, no replay, no "you have 1 unread message" when a worker reconnects. The payload must be shorter than 8000 bytes in the default configuration, so you can't reliably stuff the job itself into it either.
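To make "delivered only to whoever is listening right now" concrete, here is a toy in-memory model of those delivery rules. It is a sketch, not a Postgres client: `ToyNotifyHub`, its methods, and the session names are hypothetical, and the only real number in it is the default 8000-byte payload limit.

```python
from collections import defaultdict

MAX_PAYLOAD_BYTES = 8000  # Postgres default: the payload must be shorter than this


class ToyNotifyHub:
    """Hypothetical in-memory model of LISTEN/NOTIFY delivery (not a real client)."""

    def __init__(self):
        self._listeners = defaultdict(set)  # channel -> sessions currently listening
        self.inbox = defaultdict(list)      # session -> payloads actually delivered

    def listen(self, session, channel):
        self._listeners[channel].add(session)

    def disconnect(self, session):
        # A dropped connection loses all of its LISTEN registrations.
        for sessions in self._listeners.values():
            sessions.discard(session)

    def notify(self, channel, payload=""):
        if len(payload.encode("utf-8")) >= MAX_PAYLOAD_BYTES:
            raise ValueError("payload too long")
        # Fire-and-forget: only sessions listening at this moment get a copy.
        for session in self._listeners[channel]:
            self.inbox[session].append(payload)
        # Nothing is kept for anyone else, so there is nothing to replay later.


hub = ToyNotifyHub()
hub.listen("worker", "jobs_channel")
hub.notify("jobs_channel", "job 1")   # delivered
hub.disconnect("worker")              # deploy or network blip
hub.notify("jobs_channel", "job 2")   # nobody is listening: gone for good
hub.listen("worker", "jobs_channel")  # reconnect; no backlog arrives
print(hub.inbox["worker"])            # ['job 1']
```

Reconnecting does not bring back "job 2"; that is exactly the window in which my jobs went missing.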

So my worker would drop its connection for a second, during a deploy or a brief network blip, and a job would get inserted in that window. The NOTIFY went out to nobody, and that job sat in the table forever, because the only thing that ever looked at the table was a worker waiting to be told to look. The signal was the only trigger, and the signal was lost.

NOTIFY is transactional, which is a feature until it's a footgun

Here's the part people get wrong in the other direction. NOTIFY respects transactions. If you do this:

BEGIN;
INSERT INTO jobs (...) VALUES (...);
NOTIFY jobs_channel;
-- ... more work ...
ROLLBACK;

the notification is never sent. Good: you don't want workers chasing a job that got rolled back. But the flip side trips up people who fire NOTIFY and expect immediacy: listeners don't see it until the sending transaction commits. If you're inside a long transaction, your "instant" notification waits behind your own commit. And identical notifications (same channel, same payload) sent within one transaction are folded into a single delivery, so you can't count them.
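The three transactional rules above (held until COMMIT, discarded on ROLLBACK, identical channel/payload pairs folded) fit in a few lines of toy code. `ToyTransaction` and its `deliver` callback are hypothetical names for illustration; real Postgres does this bookkeeping inside the server.

```python
class ToyTransaction:
    """Hypothetical model of how NOTIFY behaves inside a transaction."""

    def __init__(self, deliver):
        self._deliver = deliver  # callback(channel, payload), called only at COMMIT
        self._pending = []       # notifications queued in send order
        self._seen = set()

    def notify(self, channel, payload=""):
        key = (channel, payload)
        if key not in self._seen:  # identical (channel, payload) pairs fold into one
            self._seen.add(key)
            self._pending.append(key)

    def commit(self):
        for channel, payload in self._pending:  # listeners see them only now
            self._deliver(channel, payload)
        self._pending.clear()
        self._seen.clear()

    def rollback(self):
        self._pending.clear()  # never sent at all
        self._seen.clear()


delivered = []
tx = ToyTransaction(lambda ch, p: delivered.append((ch, p)))
tx.notify("jobs_channel")
tx.notify("jobs_channel")        # duplicate: folded away
tx.notify("jobs_channel", "42")  # different payload: kept
assert delivered == []           # nothing visible before COMMIT
tx.commit()
print(delivered)                 # [('jobs_channel', ''), ('jobs_channel', '42')]

tx.notify("jobs_channel", "43")
tx.rollback()                    # the '43' notification is dropped
print(delivered)                 # unchanged
```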

The overflow nobody warns you about

There's a server-wide queue (8GB in a standard installation) holding notifications until every listening session has consumed them. One slow or stuck listener that stops reading pins the tail of that queue. The queue fills, and from then on every transaction that calls NOTIFY fails at commit. Because the queue is shared across the whole server, a single wedged consumer can take out every write that sends a NOTIFY. The first time I saw NOTIFY queue is full in production I had no idea Postgres even had such a thing.
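Why one stuck listener hurts everyone is easiest to see with a toy bounded queue, where entries are freed only once the slowest reader has passed them. Everything here is hypothetical (`ToyNotifyQueue`, a capacity of 3 instead of 8GB); `usage()` just mimics the fraction that Postgres reports through `pg_notification_queue_usage()`.

```python
class ToyNotifyQueue:
    """Hypothetical bounded queue: entries are freed only after every listener reads them."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0        # position where the next notification is written
        self.read_pos = {}   # listener -> position of the next entry it will read

    def add_listener(self, name):
        self.read_pos[name] = self.head

    def usage(self):
        # The tail is held by the slowest listener; usage is the fraction of capacity in use.
        tail = min(self.read_pos.values(), default=self.head)
        return (self.head - tail) / self.capacity

    def notify(self):
        if self.usage() >= 1.0:
            raise RuntimeError("notify queue is full")  # the sender's transaction fails
        self.head += 1

    def drain(self, name):
        self.read_pos[name] = self.head


q = ToyNotifyQueue(capacity=3)
q.add_listener("fast")
q.add_listener("stuck")
for _ in range(3):
    q.notify()
    q.drain("fast")       # the healthy listener keeps up
print(q.usage())          # 1.0: "stuck" pins the tail
try:
    q.notify()
except RuntimeError as e:
    print(e)              # notify queue is full
q.drain("stuck")          # the wedged consumer finally reads
q.notify()                # senders work again
```

The healthy listener is irrelevant: the slowest reader alone decides when senders start failing.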

The fix: NOTIFY is the doorbell, not the mailbox

The pattern that actually works treats the table as the source of truth and NOTIFY purely as an optimisation to reduce latency. The doorbell can fail and you still get your mail, because you also check the mailbox on a timer.

-- the worker loop, in pseudocode
LISTEN jobs_channel;
loop:
    claim_and_run_jobs()          -- always do the real work here
    wait_for_notify(timeout = 5s) -- woken by NOTIFY, OR by the timeout
    -- either way, loop back and poll the table

If a NOTIFY arrives, great, you react in milliseconds. If it gets lost, the 5-second timeout makes you poll anyway and you pick up the orphaned job. Worst-case latency becomes the poll interval instead of "forever". NOTIFY buys you low latency on the happy path; the poll buys you correctness. You need both.
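The pseudocode above can be sketched as a runnable loop, with a `threading.Event` standing in for NOTIFY and a locked list standing in for the jobs table. `JobTable`, `worker`, and the `notify=False` flag (which simulates a lost NOTIFY) are hypothetical names, not a real queue library; a real worker would block on its Postgres connection instead of an Event.

```python
import threading
import time


class JobTable:
    """Hypothetical stand-in for the jobs table (the mailbox) plus a NOTIFY doorbell."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []                 # source of truth
        self.doorbell = threading.Event()  # stand-in for NOTIFY

    def insert(self, job, notify=True):
        with self._lock:
            self._pending.append(job)
        if notify:                         # notify=False simulates a lost NOTIFY
            self.doorbell.set()

    def claim_all(self):
        with self._lock:
            jobs, self._pending = self._pending, []
        return jobs


def worker(table, done, stop, poll_interval=5.0):
    while not stop.is_set():
        table.doorbell.clear()              # reset before checking the mailbox
        for job in table.claim_all():       # always do the real work here
            done.append(job)
        table.doorbell.wait(poll_interval)  # woken by the doorbell OR by the timeout


table = JobTable()
done, stop = [], threading.Event()
t = threading.Thread(target=worker, args=(table, done, stop, 0.1))
t.start()
table.insert("job-1")                 # doorbell rings: picked up right away
table.insert("job-2", notify=False)   # NOTIFY lost: still found by polling
time.sleep(0.5)                       # several 0.1 s poll intervals
stop.set()
table.doorbell.set()                  # wake the worker so it notices stop promptly
t.join()
print(done)                           # ['job-1', 'job-2']
```

Clearing the doorbell before reading the table, rather than after, means a notification that lands mid-pass still wakes the next iteration instead of being wiped out.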

Claiming a job safely

Polling a shared table means two workers can grab the same row. The clean fix is the lock-skipping pattern I'll never write a queue without now:

UPDATE jobs
SET status = 'running', locked_at = now()
WHERE id = (
  SELECT id FROM jobs
  WHERE status = 'pending'
  ORDER BY created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED   -- skip rows another worker already grabbed
)
RETURNING *;

FOR UPDATE SKIP LOCKED means concurrent workers each lock a different row and never block on each other. No NOTIFY required for correctness at all, which is the whole point: the queue works even with the doorbell unplugged.
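As an in-process analogue of that query, not Postgres itself, a per-row lock taken with `acquire(blocking=False)` behaves like SKIP LOCKED: if another worker holds the row, move on instead of waiting. `Row` and `claim_one` are hypothetical names; the comments map each step back to the SQL.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Row:
    id: int
    created_at: int
    status: str = "pending"
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def claim_one(rows):
    """Hypothetical analogue of UPDATE ... WHERE id = (SELECT ... FOR UPDATE SKIP LOCKED)."""
    for row in sorted(rows, key=lambda r: r.created_at):  # ORDER BY created_at
        if row.status != "pending":                       # WHERE status = 'pending'
            continue
        if not row.lock.acquire(blocking=False):          # SKIP LOCKED: don't wait, move on
            continue
        try:
            if row.status == "pending":                   # re-check once we hold the lock
                row.status = "running"                    # SET status = 'running'
                return row                                # RETURNING *
        finally:
            row.lock.release()                            # like COMMIT releasing the row lock
    return None                                           # no claimable row right now


rows = [Row(id=i, created_at=i) for i in range(100)]
claimed = []
claimed_lock = threading.Lock()


def run():
    while (row := claim_one(rows)) is not None:
        with claimed_lock:
            claimed.append(row.id)


threads = [threading.Thread(target=run) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(claimed) == list(range(100)))  # True: every job claimed exactly once
```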

When this is the right call, and when it isn't

For low-to-moderate throughput where you already run Postgres and you value one fewer thing to operate, a poll-plus-NOTIFY queue is genuinely great and I'd reach for it again. But it's a database doing queue work. At high throughput the constant polling and the row churn (every claim is an UPDATE, hello dead tuples and vacuum) start costing real IO, and that's when a purpose-built broker earns its operational weight. The mistake isn't using Postgres as a queue. The mistake is using NOTIFY as the queue instead of as a latency hint on top of one.
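A rough back-of-envelope helps decide which side of that line you're on. This hypothetical helper counts only the floor: each worker polls at least once per interval even when idle, and each claim is one UPDATE that leaves one dead row version for vacuum. The worker count and rates below are illustrative inputs, not measurements.

```python
def queue_load_per_hour(workers, poll_interval_s, jobs_per_hour):
    """Hypothetical lower bound on the query load a poll-plus-NOTIFY queue adds."""
    idle_polls = workers * 3600 / poll_interval_s  # each worker polls at least this often
    claim_updates = jobs_per_hour                  # one UPDATE (one dead tuple) per claim
    return {"idle_polls": idle_polls, "claim_updates": claim_updates}


print(queue_load_per_hour(workers=4, poll_interval_s=5, jobs_per_hour=300))
# {'idle_polls': 2880.0, 'claim_updates': 300}
```

At a few hundred jobs an hour that is noise for Postgres. Shrink the poll interval, add workers, and push the job rate up by a few orders of magnitude, and both numbers start showing up in your IO and vacuum activity.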

Rules of thumb

  • NOTIFY is a fire-and-forget signal, not a durable queue. If no session is listening at send time, the notification is gone, no replay.
  • Never let NOTIFY be the only thing that triggers work. Make the table the source of truth and always poll on a timeout as a backstop.
  • Notifications fire only on COMMIT and not at all on ROLLBACK; a long transaction delays delivery until it commits.
  • One stuck listener can fill the server-wide notify queue and break NOTIFY for every writer. Monitor it (pg_notification_queue_usage() reports how full it is) and make consumers drain promptly.
  • Claim jobs with FOR UPDATE SKIP LOCKED so concurrent workers grab different rows without blocking or double-processing.
  • Great for modest throughput on infra you already run; once polling and update churn dominate your IO, move to a real broker.


Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.
