Finding and Fixing Memory Leaks in Node.js

A memory leak in Node.js is almost always a reference you forgot to drop, not a bug in the garbage collector.

May 11, 2026 · 11 min read · memory, v8

A memory leak in Node.js is almost never the garbage collector failing to do its job. It is your code holding a live reference to something that should be dead. The GC is precise; it frees anything unreachable. So the entire skill is reasoning about reachability: what is still pointing at the bytes you wish were gone.

1. The V8 heap: new space and old space

V8 splits the managed heap into generations, betting on a well-known empirical fact: most objects die young. A request handler allocates a request object, some buffers, a few closures, and almost all of it is garbage by the time the response is sent.

  • New space (young generation) — small (a few megabytes per semi-space; tens of MB total at most), where every object is born. Collected very frequently and very cheaply.
  • Old space (old generation) — large, where objects that survive a couple of young-generation collections get promoted. Collected rarely and expensively.

There are other regions — large-object space for allocations too big for a normal page, code space, a separate space for maps/hidden classes — but for leak hunting, new space versus old space is the model that matters. A leak is, by definition, objects that keep getting promoted into old space and never collected. Old space grows monotonically, GC runs more and more often trying to reclaim it, your event loop stalls during the pauses, and eventually you hit the heap limit and the process dies with FATAL ERROR: ... JavaScript heap out of memory.
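To see this split on a live process, Node exposes V8's per-space accounting through v8.getHeapSpaceStatistics(). A minimal sketch follows; the helper name heapSpacesMb is made up for illustration, and the exact list of spaces reported varies between V8 versions.

```javascript
const v8 = require('node:v8');

// Hypothetical helper: summarize each V8 heap space in megabytes.
function heapSpacesMb() {
  const mb = (n) => Math.round((n / 1024 / 1024) * 10) / 10;
  const out = {};
  for (const space of v8.getHeapSpaceStatistics()) {
    out[space.space_name] = {
      sizeMb: mb(space.space_size),
      usedMb: mb(space.space_used_size),
    };
  }
  return out;
}

const spaces = heapSpacesMb();
console.log('new_space', spaces.new_space);
console.log('old_space', spaces.old_space);
```

Logged periodically, the old_space usedMb figure is the number to watch: in a leaking process it keeps climbing while new_space stays small.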

2. How GC actually works: scavenge plus mark-sweep

Two algorithms, one per generation. The interview answer is "generational GC: a fast copying collector for the young generation, a mark-sweep/mark-compact collector for the old generation."

Scavenge (new space). A Cheney-style semi-space copying collector. New space is split in two halves, from-space and to-space. Allocation is a pointer bump in from-space. When it fills, the scavenger copies the live objects into to-space and swaps the roles. Dead objects are never touched — you don't pay for garbage, you pay for survivors. Anything that survives a scavenge or two gets promoted to old space. This is why short-lived allocation is genuinely cheap in V8. (Modern V8 runs the scavenger in parallel across helper threads, but the copying model is unchanged.)

Mark-sweep / mark-compact (old space). Old space is too big to copy wholesale, so V8 marks instead. Starting from the roots (the stack, globals, etc.), it walks every reference and marks reachable objects. Then sweep reclaims the unmarked gaps; periodically compact moves objects to defragment. Modern V8 does most of marking concurrently and incrementally to keep main-thread pauses short, but the pauses are still far more expensive than a scavenge.

The practical takeaway: a leak makes old space the dominant cost. You'll see major (mark-sweep) GC firing repeatedly and reclaiming almost nothing. That signature — frequent major GCs with a heap floor that only ratchets upward — is the leak.
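One way to watch for that signature from inside the process is a PerformanceObserver subscribed to 'gc' entries. This is a sketch, not a monitoring solution: watchGc and gcKindName are hypothetical names, and the fallback to entry.kind only covers older Node versions that predate entry.detail.

```javascript
const { PerformanceObserver, constants } = require('node:perf_hooks');

// Map the numeric GC kind to a readable label.
const KIND_NAMES = new Map([
  [constants.NODE_PERFORMANCE_GC_MINOR, 'minor (scavenge)'],
  [constants.NODE_PERFORMANCE_GC_MAJOR, 'major (mark-sweep/compact)'],
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL, 'incremental marking'],
  [constants.NODE_PERFORMANCE_GC_WEAKCB, 'weak callbacks'],
]);

function gcKindName(kind) {
  return KIND_NAMES.get(kind) ?? `unknown(${kind})`;
}

// Calls onGc({ kind, durationMs }) for every GC event; returns a stop function.
function watchGc(onGc) {
  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      const kind = entry.detail ? entry.detail.kind : entry.kind;
      onGc({ kind: gcKindName(kind), durationMs: entry.duration });
    }
  });
  observer.observe({ entryTypes: ['gc'] });
  return () => observer.disconnect();
}
```

Typical use would be const stop = watchGc(console.log) at startup and stop() on shutdown. If you only need a quick look, running node with --trace-gc prints similar information to stdout without any code.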

3. The leak shapes you will actually see

Real leaks are boring and repetitive. Learn the handful of shapes and you'll diagnose most of them by smell before you open a profiler.

Unbounded Map or cache. The number-one Node leak. You add an entry per request/user/key and never evict. A Map keyed by user ID for "performance" is a memory leak with a feature flag.

// LEAK: grows forever, one entry per unique key, never evicted
const cache = new Map();

function getUser(id) {
  if (!cache.has(id)) cache.set(id, fetchUserSync(id));
  return cache.get(id);
}

Forgotten event listeners. You attach a listener to a long-lived emitter (a socket, a global bus, process) inside a per-request scope and never remove it. The emitter holds the listener, the listener's closure holds your request state, and none of it dies. Node's warning is the giveaway: MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 foo listeners added to [EventEmitter]. That default limit of 10 (events.defaultMaxListeners, readable per-emitter via emitter.getMaxListeners()) exists precisely to catch this — raising it with setMaxListeners to silence the warning is treating the smoke detector as the problem.
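A cheap check for this shape, before any profiler, is to ask the emitter how many listeners it is holding. The sketch below uses only eventNames() and listenerCount(); listenerReport and leakyHandler are illustrative names, not part of any library.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical diagnostic: how many listeners does each event currently have?
function listenerReport(emitter) {
  const report = {};
  for (const name of emitter.eventNames()) {
    report[String(name)] = emitter.listenerCount(name);
  }
  return report;
}

// Simulate the leak shape: every "request" adds a listener, none are removed.
const bus = new EventEmitter();
function leakyHandler(requestId) {
  bus.on('tick', () => requestId);
}
for (let i = 0; i < 5; i++) leakyHandler(i);

console.log(listenerReport(bus)); // { tick: 5 }
```

If that count tracks your request count instead of your number of open connections, you have found the leak.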

Closures capturing big objects. A closure keeps alive the variables in its lexical scope that it actually references (V8 generally does not retain captured variables a closure never uses, but do not rely on that — one accidental reference pins the lot). Stash one small callback in a long-lived structure and you may be pinning the entire surrounding context — a big buffer, a parsed payload, a DB result set.

const timers = [];
function onRequest(req) {
  const bigPayload = req.body; // megabytes
  // The timer closure outlives the request and pins bigPayload
  timers.push(setInterval(() => log(bigPayload.id), 60_000));
}

Global arrays and module-level state. Anything referenced from a module-level const stays reachable from a GC root for the life of the process. const events = []; events.push(...) on every request is an unbounded global. So are accidental globals from a missing const/let in non-strict code.
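If a module-level array really is needed, give it a cap. A minimal sketch, assuming a made-up limit MAX_EVENTS and a hypothetical recordEvent helper:

```javascript
const MAX_EVENTS = 1000; // assumed cap; pick one that fits your memory budget
const events = [];

function recordEvent(evt) {
  events.push(evt);
  // Drop the oldest entries once the cap is exceeded.
  if (events.length > MAX_EVENTS) {
    events.splice(0, events.length - MAX_EVENTS);
  }
}

for (let i = 0; i < 2500; i++) recordEvent({ seq: i });
console.log(events.length, events[0].seq); // 1000 1500
```

Removing from the front of an array is linear in its length, so a hot path would likely want a ring buffer instead. What matters here is that the bound exists.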

Two honorable mentions. Buffer/typed-array backing memory lives off the V8 heap, so a buffer leak shows up as growing RSS while heapUsed looks calm — watch arrayBuffers (and external) in memoryUsage(). And dangling timers (setInterval never cleared) keep their closures alive forever, exactly like listeners.
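The off-heap behavior is easy to see for yourself. This sketch allocates one large Buffer and compares memoryUsage() before and after; the 64 MB size is arbitrary, and exact deltas will vary a little from run to run.

```javascript
const mb = (n) => Math.round(n / 1024 / 1024);

const before = process.memoryUsage();
// 64 MB of Buffer memory: the backing store is allocated outside the V8 heap.
const big = Buffer.alloc(64 * 1024 * 1024);
const after = process.memoryUsage();

console.log('heapUsed delta (MB):', mb(after.heapUsed - before.heapUsed));
console.log('arrayBuffers delta (MB):', mb(after.arrayBuffers - before.arrayBuffers));
console.log('still holding', big.length, 'bytes');
```

The arrayBuffers delta should land near 64 while heapUsed barely moves, which is why a heap snapshot alone can look clean while the process keeps growing.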

4. Bounding memory at the boundary: --max-old-space-size

By default V8 caps old space conservatively, and on smaller containers the default can be lower than the RAM you actually have (recent Node versions size the heap container-aware, but you should still pin it explicitly). --max-old-space-size sets that cap in megabytes:

node --max-old-space-size=2048 server.js

Be clear about what this flag is and isn't. It does not fix a leak — a true leak will just take longer to crash, and the longer runway means longer, more painful GC pauses on the way down. What it's actually for: (1) letting a legitimately large-but-bounded workload use the memory on the box instead of OOM-ing at the default ceiling, and (2) making the JS heap limit predictable inside a container so V8 dies with a clean heap error before the kernel OOM-killer sends a less informative SIGKILL. Set it a comfortable margin below the container's memory limit (the JS heap is only part of RSS — off-heap buffers and native memory live on top of it). Treat hitting the ceiling as a symptom to investigate, never as the thing the flag was supposed to prevent.
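To confirm the flag took effect, you can read back the limit V8 is enforcing. A small sketch using v8.getHeapStatistics(); heapLimitMb is a hypothetical helper name.

```javascript
const v8 = require('node:v8');

// heap_size_limit is the ceiling V8 is enforcing for this process, in bytes.
function heapLimitMb() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024);
}

console.log(`effective heap limit: ${heapLimitMb()} MB`);
```

Logging this once at startup makes a misconfigured container obvious. The reported value is usually a little above the number passed to --max-old-space-size, since the limit covers more than old space.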

5. Diagnosing: memoryUsage, --inspect, and heap snapshots

Start coarse, then get precise. First confirm it's actually a leak and not steady-state working set, by watching the trend over time.

setInterval(() => {
  const m = process.memoryUsage();
  const mb = (n) => Math.round(n / 1024 / 1024);
  console.log(
    `rss=${mb(m.rss)} heapUsed=${mb(m.heapUsed)} ` +
    `heapTotal=${mb(m.heapTotal)} external=${mb(m.external)} ` +
    `arrayBuffers=${mb(m.arrayBuffers)}`
  );
}, 10_000);

Read it like this: heapUsed trending up across GC cycles is a JS-heap leak (objects, closures, Maps). rss and external/arrayBuffers climbing while heapUsed is flat points off-heap — buffers, or native addon memory. (arrayBuffers is a subset of external: all ArrayBuffer/Buffer memory is counted in external, which also includes other C++ allocations.) The word "trending" is doing the work: a single sample tells you nothing, because memory naturally saws up and down between collections. You want the floor after GC to rise.
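The "rising floor" test can be made mechanical. The sketch below approximates the post-GC floor as the minimum heapUsed within each window of samples and checks whether that minimum keeps increasing. The function names, window size, and sample numbers are all invented for illustration; a real check would want longer windows and some tolerance for noise.

```javascript
// Approximate the post-GC floor as the minimum sample in each full window.
function windowFloors(samples, windowSize) {
  const floors = [];
  for (let i = 0; i + windowSize <= samples.length; i += windowSize) {
    floors.push(Math.min(...samples.slice(i, i + windowSize)));
  }
  return floors;
}

// True when every window's floor is strictly higher than the previous one.
function floorIsRising(samples, windowSize) {
  const floors = windowFloors(samples, windowSize);
  return floors.length >= 2 && floors.every((f, i) => i === 0 || f > floors[i - 1]);
}

// heapUsed in MB, sampled at a fixed interval (made-up numbers).
const steady = [50, 80, 52, 79, 51, 81, 50, 78, 52, 80, 51, 79];
const leaking = [50, 80, 55, 85, 61, 90, 66, 96, 72, 101, 78, 107];

console.log(floorIsRising(steady, 4));  // false
console.log(floorIsRising(leaking, 4)); // true
```

Both series saw up and down by about 30 MB, so any single sample looks the same. Only the per-window minimum separates them.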

Once you know the JS heap is leaking, take heap snapshots. The reliable workflow is to capture two snapshots — one after warmup, one after the leak has clearly grown — and diff them. You can drive snapshots three ways:

  • --inspect + Chrome DevTools. Run node --inspect server.js, open chrome://inspect, attach, and use the Memory tab. Take a snapshot, generate load, take another, then switch the second snapshot's view to Comparison and sort by # Delta (object count) and Size Delta. The constructor whose count grows by roughly your request count is your leak.
  • Programmatic, from inside the process — best for production, since you don't need an open inspector port:
const v8 = require('node:v8');
// Writes a .heapsnapshot file you can load into DevTools later
const file = v8.writeHeapSnapshot();
console.log('snapshot written to', file);
  • --heapsnapshot-near-heap-limit=2. Tells Node to dump up to N snapshots automatically just before it OOMs, so a crashing production process leaves you forensic evidence instead of just a stack trace.

In the snapshot, two columns earn their keep. Shallow size is the object itself; retained size is everything that would be freed if this object went away — that's what you sort by to find the root of a leak. Then use the retainers view to answer the only question that matters: what is keeping this alive? The retainer chain walks you straight back to the offending Map, array, or closure. Also watch the distance column and the (string)/(closure) synthetic rows — a giant pile of strings retained by one Map is the unbounded-cache shape, made visible.

6. Two fixes: the listener leak and the bounded cache

The listener leak. The bug is adding to a long-lived emitter per request and never removing. The fix is symmetric add/remove, or once() when the listener is genuinely one-shot, because a once() listener removes itself after firing.

// LEAK: a new listener on the shared bus every request, never removed
function handler(req, res) {
  bus.on('tick', () => res.write('still here'));
  // ... bus retains the closure, which retains req/res, forever
}

// FIX: remove on the connection lifecycle; or use once() if one-shot
function handler(req, res) {
  const onTick = () => res.write('still here');
  bus.on('tick', onTick);
  res.on('close', () => bus.off('tick', onTick)); // symmetric teardown
}

Note you must keep a reference to the same function to remove it — bus.off('tick', () => {}) with a fresh arrow does nothing, because it's a different function identity.
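Both points, function identity and the self-removing once(), can be checked in a few lines with listenerCount(). This is a standalone sketch with its own emitter, not the handler from the example above.

```javascript
const { EventEmitter } = require('node:events');

const bus = new EventEmitter();
const onTick = () => {};

bus.on('tick', onTick);
bus.off('tick', () => {});               // different function identity: removes nothing
console.log(bus.listenerCount('tick'));  // 1

bus.off('tick', onTick);                 // same reference: removed
console.log(bus.listenerCount('tick'));  // 0

bus.once('ready', () => {});
console.log(bus.listenerCount('ready')); // 1
bus.emit('ready');                       // fires, then removes itself
console.log(bus.listenerCount('ready')); // 0
```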

The bounded cache. The fix for an unbounded Map is a cache with an eviction policy — a size cap (LRU), a TTL, or both. Here's a minimal LRU exploiting the fact that JS Map preserves insertion order, so the first key is the oldest:

class LruCache {
  constructor(max = 1000) {
    this.max = max;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const val = this.map.get(key);
    this.map.delete(key);      // re-insert to mark as most-recently-used
    this.map.set(key, val);
    return val;
  }
  set(key, val) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, val);
    if (this.map.size > this.max) {
      this.map.delete(this.map.keys().next().value); // evict oldest
    }
  }
}

In production, reach for a battle-tested library (lru-cache) that adds TTL, max-byte sizing, and stale-while-revalidate. The point isn't this exact class — it's that every long-lived cache needs an explicit, finite bound. If you can't answer "what is the maximum number of entries this can hold," you have a leak waiting for production traffic.
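For the TTL side of that bound, here is a comparable sketch. TtlCache is an illustrative class, not the lru-cache API: entries expire lazily on read, the size cap evicts the oldest-written key (not the least-recently-read one), and the now option is an assumption added so the clock can be faked in tests.

```javascript
class TtlCache {
  constructor({ ttlMs = 60_000, max = 1000, now = Date.now } = {}) {
    this.ttlMs = ttlMs;
    this.max = max;
    this.now = now;
    this.map = new Map();
  }
  get(key) {
    const entry = this.map.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.map.delete(key); // expired: drop it on the way out
      return undefined;
    }
    return entry.value;
  }
  set(key, value) {
    this.map.delete(key); // re-insert so the key counts as newest
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.map.size > this.max) {
      this.map.delete(this.map.keys().next().value); // evict oldest write
    }
  }
  get size() {
    return this.map.size;
  }
}

let clock = 0;
const cache = new TtlCache({ ttlMs: 100, max: 2, now: () => clock });
cache.set('a', 1);
cache.set('b', 2);
cache.set('c', 3);           // over the cap: 'a' is evicted
console.log(cache.get('a')); // undefined
clock = 150;                 // past the TTL
console.log(cache.get('b')); // undefined
```

Because expiry is lazy, an entry that is never read again only leaves when the size cap pushes it out. That is why the TTL and the cap belong together.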

For the closure and global-array shapes, the fixes follow the same principle: don't capture more than you need (pull req.body.id into a small local before the closure instead of capturing all of req.body), and never push into a module-level array without a corresponding bound or drain. When you genuinely need to associate data with an object without keeping that object alive, use a WeakMap — its keys are weakly held, so once nothing else references a key object the GC can collect both the key and its associated value.
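Put together, a repaired version of the earlier timer example might look like the sketch below. The log function is a stand-in for the article's log(), and the teardown return value, tagRequest, and requestMeta are hypothetical additions for illustration.

```javascript
const log = (id) => console.log('alive', id); // stand-in for the article's log()

const timers = new Set();

function onRequest(req) {
  const id = req.body.id; // copy out the one small field the timer needs
  const timer = setInterval(() => log(id), 60_000); // references id, not req.body
  timers.add(timer);
  // Return a teardown so the caller can clear the interval when the request ends.
  return () => {
    clearInterval(timer);
    timers.delete(timer);
  };
}

// WeakMap: attach metadata to a request object without keeping it alive.
const requestMeta = new WeakMap();
function tagRequest(req) {
  requestMeta.set(req, { startedAt: Date.now() });
  return requestMeta.get(req);
}

const req = { body: { id: 42, blob: 'x'.repeat(1024) } };
const stop = onRequest(req);
tagRequest(req);
stop(); // interval cleared; nothing long-lived holds req.body any more
```

The Set of timers is itself module-level state, so it follows the same rule as everything else here: each add has a matching delete.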

Rules of thumb

  • A leak is a live reference, not a GC bug. The diagnostic question is always "what retains this?" — answer it with the retainers view in a heap snapshot.
  • Watch the post-GC floor of heapUsed over time, not single samples. Rising floor = leak; growing rss/arrayBuffers with flat heapUsed = off-heap (buffers/native).
  • Every cache, Map, and module-level array needs an explicit bound (LRU/TTL). "Cache without eviction" is just a slow leak.
  • Listeners and timers are leak factories: pair every on with an off, prefer once for one-shots, and clear your intervals. Trust the MaxListenersExceededWarning instead of silencing it.
  • --max-old-space-size bounds the crash, it does not fix the leak. Diff two heap snapshots and sort by retained size to find the real culprit.

Reader Discussion

  1. Rachel Gold · Staff SRE · Agrees

    the on-call framing throughout this piece is what makes it land. too many infra articles assume you never get paged. those are written by people who never got paged.

    May 14, 2026·3 days later
  2. Omar Khalil · Senior SWE · Kind words

    this is the third article from this blog I've sent to my team this month. you're cooking. don't switch to crypto.

    May 16, 2026·5 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.