ML
PostgreSQL

MVCC, Vacuum & Why Your Table Got 10× Bigger

Postgres never updates a row in place — it writes a new version and tombstones the old one. Understanding that is how you stop fearing VACUUM.

November 12, 20259 min readPostgreSQLInternals

Most Postgres outages I have been paged for trace back to one of three letters: MVCC. Multi-Version Concurrency Control is what lets a hundred sessions read a table while one of them rewrites it without anybody taking a lock. It is also the reason a 2 GiB table can mysteriously balloon to 20 GiB overnight.

1. The promise: readers don't block writers

Every row in Postgres carries two hidden columns: xmin (the transaction that created it) and xmax (the transaction that deleted or superseded it). A read query at transaction X sees only rows where xmin ≤ X < xmax. That snapshot rule is the entire concurrency model.

SELECT xmin, xmax, id, balance FROM accounts WHERE id = 42;
-- xmin=10421  xmax=0  id=42  balance=100

2. The cost: dead tuples

When you UPDATE, Postgres does not edit the row. It writes a new row version with the new xmin and sets xmax on the old version. The old version sticks around until no live transaction could still see it.

That cemetery of old versions is what we call dead tuples. They cost disk, they cost cache, and they cost index lookups because every index still points at them.

3. VACUUM, autovacuum, and the bloat spiral

VACUUM reclaims space inside the table file. Crucially: it does not shrink the file back to the OS — it just marks the pages reusable. Only VACUUM FULL rewrites the table, and it takes an ACCESS EXCLUSIVE lock.

Autovacuum runs in the background. Its thresholds default to autovacuum_vacuum_scale_factor=0.2 — Postgres waits until 20% of the table is dead before triggering. On a 100M-row table that is 20M dead tuples accumulating before anything happens.

ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_vacuum_cost_limit = 2000
);

4. Long-running transactions are the silent killer

Autovacuum cannot remove a row version while any transaction could still need it. One forgotten BEGIN in a psql session — or one stuck connection from an analytics tool — and bloat accumulates until you kill the session.

SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start < now() - interval '5 minutes'
ORDER BY xact_start;

Put a job on a five-minute cadence that flags any transaction older than your usual longest legitimate one. It pays for itself the first time it catches a hung migration.

5. Index bloat is a different problem

Indexes accumulate their own dead entries. VACUUM cleans them, but a heavily-churned index can still drift wider than the data warrants. REINDEX CONCURRENTLY rebuilds without taking the table offline (Postgres 12+).

Rules of thumb

  • Tune autovacuum per hot table, not globally.
  • Monitor n_dead_tup from pg_stat_user_tables and alert when it crosses a percentage of live rows.
  • Treat any transaction older than 60 seconds as a paging event.
  • Reach for VACUUM FULL rarely, with a maintenance window. pg_repack is the online alternative.
SharePostLinkedIn

Reader Discussion

1 replies// weighed in

TopNewestAuthor
Add to the thread
Disagree, agree harder, or share your own experience…
Email instead →markdown okbe kind
  1. Isabella Costa· Junior EngineerKind words

    saved this. sharing at standup tomorrow — we've had exactly this problem for 2 sprints and nobody on the team had framed it this way 🙏

    Nov 14, 2025·2 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.

Comments seeded · live discussion via email