Cache Stampede: How One Expired Key Took Down the Database
A single popular cache key expired at peak traffic and three thousand requests all missed at once, all ran the same expensive query, and the database fell over. The fix isn't a longer TTL. It's making sure exactly one request rebuilds the value.
We had a homepage feed cached in Redis with a 60-second TTL. Cheap to serve, expensive to build: a fan-out query that took about 900ms and hammered four tables. For months it was invisible. Then one evening at peak the database CPU spiked to 100%, p99 went from 30ms to nine seconds, and the on-call graph showed a perfectly periodic sawtooth, one spike per minute. The cache wasn't broken. It was doing exactly what we told it to. The problem was what happens at the instant a hot key expires.
The thundering herd
When that one popular key expires, every request that wanted it now misses at the same moment. At peak we were serving roughly 3000 req/s for that feed. The key's TTL lapses, and in the next few hundred milliseconds three thousand requests all check Redis, all get a miss, and all decide to rebuild the value themselves. Now three thousand copies of a 900ms query hit the database simultaneously. That is a cache stampede, also called the thundering herd or a dogpile.
The cruel part is that the cache was working. A longer TTL doesn't fix it; it just makes the stampede rarer and bigger. A shorter TTL makes it more frequent. The miss itself is the trigger, and the miss is unavoidable as long as keys expire.
// the naive read-through that stampedes
async function getFeed() {
const hit = await redis.get("feed:home");
if (hit) return JSON.parse(hit);
// 3000 requests arrive here at once on expiry
const data = await buildFeedFromDB(); // 900ms, 4 tables
await redis.set("feed:home", JSON.stringify(data), "EX", 60);
return data;
}
Fix 1: let exactly one request rebuild
The core idea is mutual exclusion on the miss. On a miss, a request tries to grab a short-lived lock. Whoever wins rebuilds and repopulates; everyone else either waits briefly and re-reads, or serves slightly stale data. Redis gives you the lock primitive directly with SET key value NX PX ttl, which sets only if the key is absent (NX) and auto-expires (PX) so a crashed holder can't deadlock the key forever.
async function getFeed() {
const hit = await redis.get("feed:home");
if (hit) return JSON.parse(hit);
// only one winner: NX = set-if-absent, PX = lock expiry
const got = await redis.set("feed:lock:home", "1", "NX", "PX", 5000);
if (got) {
try {
const data = await buildFeedFromDB();
await redis.set("feed:home", JSON.stringify(data), "EX", 60);
return data;
} finally {
await redis.del("feed:lock:home");
}
}
// lost the race: wait a beat and read the value the winner wrote
await sleep(50);
const second = await redis.get("feed:home");
if (second) return JSON.parse(second);
return buildFeedFromDB(); // last resort if winner is slow
}
This alone took the database from 3000 concurrent rebuilds down to one. But it has a weakness: the losers either stall waiting or fall through to a slow path, and the value is fully gone for the ~900ms the winner needs. You still get a latency cliff once a minute, just without the database fire.
Fix 2: early recompute so the value never actually expires
The better pattern keeps the value present and refreshes it before it dies. Store an explicit logical expiry alongside the data and let one request refresh probabilistically as that expiry approaches, while everyone keeps serving the still-valid cached value. This is the probabilistic early expiration from the "XFetch" paper, and it is shorter than it sounds:
// store value + when it was built + how long it took to build
const { value, builtAt, deltaMs } = await readCacheWithMeta("feed:home");
const ttlMs = 60_000;
const now = Date.now();
// as we near expiry, an increasing share of requests volunteer to refresh.
// beta>1 makes refreshes start earlier; deltaMs weights expensive rebuilds.
const shouldRefresh =
now - builtAt + deltaMs * BETA * Math.log(Math.random()) >= ttlMs;
if (shouldRefresh) {
// typically just ONE request crosses the threshold first
triggerBackgroundRebuild(); // refresh without blocking this response
}
return value; // everyone still serves the valid value
Because the trigger is randomized and weighted by how expensive the rebuild is, in practice a single request refreshes early and silently while every other request keeps getting an instant hit. No miss, no cliff, no herd. We layered this on top of the lock so the one volunteer that refreshes is also the only one allowed to touch the database. After shipping it the per-minute sawtooth on the database flattened completely.
The failure you forgot: the miss that lasts forever
One more trap. If buildFeedFromDB() throws or returns empty for a key that doesn't exist, and you only cache successes, then every request for that key misses forever and pounds the database with the same failing query. Cache the negative result too, with a short TTL, so a missing user or a transient failure costs you one query a minute instead of all of them. A stampede on an error is still a stampede.
Rules of thumb
- A periodic spike on your database that lines up with a cache TTL is a stampede, not a slow query. The fix is concurrency control on the miss, not a bigger TTL.
- Use
SET key val NX PX ttlso exactly one request rebuilds a hot key. ThePXexpiry is mandatory, or a crashed holder deadlocks the key. - Locking removes the database fire but leaves a latency cliff. Early probabilistic recompute removes the cliff too by refreshing before the value expires.
- Weight the early-refresh trigger by rebuild cost: expensive values should start refreshing sooner, cheap ones can wait.
- Cache negative results with a short TTL. An endlessly-missing key stampedes just as hard as a popular one.
- Never give a whole class of keys the same exact TTL; jitter it so they don't all expire on the same tick and herd together.