Streams and Backpressure in Node.js

Backpressure is the contract that keeps a fast producer from drowning a slow consumer in unbounded memory; pipe and pipeline honor it for you, manual data+write does not.

May 07, 202611 min readstreamsbackpressure

A stream is a producer and a consumer connected by a bounded buffer, and the only thing standing between you and an OOM kill is whether you respect that buffer. Backpressure is the flow-control protocol that lets a slow consumer tell a fast producer to wait. Get it wrong and Node will happily queue gigabytes in memory; get it right and a 50 GB file copy runs in a few megabytes of RAM.

1. The four stream types

Every Node stream is one of four shapes, all built on the same internal buffering machinery:

Readable — a source you pull from: fs.createReadStream, an HTTP request body, process.stdin. It emits 'data' and 'end'.
Writable — a sink you push to: fs.createWriteStream, an HTTP response, process.stdout. You call write() and end().
Duplex — independent read and write sides over one object, e.g. a TCP net.Socket. The two sides have separate buffers and are not connected to each other.
Transform — a Duplex where the write side feeds the read side through your _transform function: zlib.createGzip(), a cipher, a CSV parser. What you write comes out transformed.

The interview-ready distinction: a Duplex has two unrelated channels (think a phone call); a Transform's output is a function of its input (think a translator). Transform extends Duplex precisely so it inherits both a Readable and a Writable interface.

2. The backpressure contract

Backpressure lives entirely in three primitives on the Writable side. This is the contract, and it is worth memorizing verbatim:

writable.write(chunk) returns a boolean. true means "keep going"; false means "the internal buffer is at or above highWaterMark, please stop writing."
When you get false, you must wait for the 'drain' event before writing again. 'drain' fires once the write buffer has fully emptied — that is, once every queued chunk has been flushed and writableLength returns to 0, not merely when it dips back below highWaterMark.
highWaterMark is the threshold (default 64 KB for byte streams, 16 objects in object mode). It is not a hard cap — it is the line at which write() starts returning false so a well-behaved producer throttles itself.

Critically, write() returning false does not drop the chunk. The chunk is still accepted and queued; the return value is purely advisory. If you ignore it and keep writing, Node keeps buffering — without bound — and that is exactly how you blow up memory. Backpressure is cooperative: the slow consumer signals, but the producer must actually listen.

On the Readable side the mirror image is readable.pause() / readable.resume(), plus the highWaterMark on the read buffer. The flow-control loop is: producer writes until write() says false → producer pauses the source → buffer drains → 'drain' fires → producer resumes the source. Wiring that loop by hand is where bugs live, which is the whole reason pipe() and pipeline() exist.

3. The memory blowup: `'data'` + `write()`

Here is the classic mistake. It looks correct, it passes every test against small inputs, and it falls over the moment the source is faster than the sink:

const fs = require('fs');

// A fast source feeding a slow sink. DO NOT do this.
const src = fs.createReadStream('huge.bin');     // disk/SSD: fast
const dst = createSlowWritable();                 // e.g. network upload: slow

src.on('data', (chunk) => {
  dst.write(chunk);        // return value ignored!
});
src.on('end', () => dst.end());

The Readable in flowing mode emits 'data' as fast as it can read from disk. Each dst.write(chunk) returns false almost immediately because the slow sink can't keep up — but we never check it, never pause src, and never wait for 'drain'. So every chunk the source produces is appended to dst's internal buffer faster than it drains. That buffer is just an in-memory linked list of chunks with no ceiling. RSS climbs until the process is OOM-killed. You can watch it happen:

const dst = createSlowWritable();
const src = fs.createReadStream('huge.bin');

src.on('data', (chunk) => {
  const ok = dst.write(chunk);
  // ok is false constantly, but we keep going anyway
});

setInterval(() => {
  const mb = (process.memoryUsage().rss / 1024 / 1024) | 0;
  // buffered bytes the sink hasn't accepted yet:
  console.log(`rss=${mb}MB writableLength=${dst.writableLength}`);
}, 200).unref();

You'll see writableLength grow without bound and RSS track it upward. The fix is to honor the return value: pause on false, resume on 'drain'.

// Manual backpressure done correctly — verbose, but correct.
src.on('data', (chunk) => {
  const ok = dst.write(chunk);
  if (!ok) src.pause();          // stop reading until the sink catches up
});
dst.on('drain', () => src.resume());
src.on('end', () => dst.end());

This works, but notice everything it does not handle: errors on either stream, cleanup of file descriptors when one side fails, the 'end'/'finish' ordering, and the case where dst errors while src is paused. That accounting is exactly what you should never write by hand in production.

4. Why `pipe()` and `pipeline()` handle it for you

readable.pipe(writable) implements the entire pause/drain/resume loop internally. It checks the write() return value, pauses the source on false, and resumes on 'drain' — so memory stays bounded by highWaterMark regardless of input size:

fs.createReadStream('huge.bin').pipe(createSlowWritable());

That single line is backpressure-correct. But pipe() has a notorious flaw: it does not forward errors and does not clean up. If the source errors mid-stream, the destination is not closed; if the destination errors, the source keeps reading into a dead sink. You leak file descriptors and sockets. In a chain like a.pipe(b).pipe(c) an error in b leaves a and c dangling. You'd have to attach 'error' handlers to every stream and destroy the others manually — and almost nobody gets that right.

stream.pipeline() (Node 10+) fixes this. It does everything pipe() does for backpressure, plus it propagates errors, destroys every stream in the chain on any failure, and gives you a single completion callback. This is the correct production primitive:

const { pipeline } = require('stream');
const fs = require('fs');
const zlib = require('zlib');

pipeline(
  fs.createReadStream('huge.bin'),
  zlib.createGzip(),
  fs.createWriteStream('huge.bin.gz'),
  (err) => {
    if (err) console.error('pipeline failed:', err);
    else console.log('done — all fds closed, memory bounded');
  }
);

If gzip throws, or the write fails because the disk is full, pipeline destroys the read stream and the gzip stream too, closes the file descriptors, and calls your callback with the error. Compare that to a three-way pipe() chain where you'd need error handlers on all three and manual destroy() calls.

The promise form is cleaner still and composes with async/await and AbortSignal:

const { pipeline } = require('stream/promises');

async function gzipFile(src, dst, signal) {
  await pipeline(
    fs.createReadStream(src),
    zlib.createGzip(),
    fs.createWriteStream(dst),
    { signal }      // abort the whole chain via an AbortController
  );
}

5. Object mode

By default streams move Buffer/string chunks and highWaterMark counts bytes. Set objectMode: true and each chunk is an arbitrary JS value — an object, an array, a parsed record — and highWaterMark counts objects instead of bytes (default 16). This is how you build streaming data pipelines: read NDJSON, transform records, write to a database, all with backpressure intact.

const { Transform, pipeline } = require('stream');

const toUpper = new Transform({
  objectMode: true,
  transform(record, _enc, cb) {
    // record is a real object, not a Buffer
    cb(null, { ...record, name: record.name.toUpperCase() });
  },
});

pipeline(
  readRecordsFromDb(),   // an object-mode Readable
  toUpper,
  writeRecordsToFile(),  // an object-mode Writable
  (err) => err && console.error(err),
);

The backpressure contract is identical — write() still returns false after 16 queued objects, 'drain' still fires — but the unit of accounting is the object, not the byte. One subtlety to flag in interviews: in object mode a single "chunk" can be an arbitrarily large object, so highWaterMark of 16 objects says nothing about actual memory. If your objects are huge, lower the watermark or keep them small. The watermark bounds count, not size, in object mode.

6. Error handling, the part everyone skips

Three rules that prevent the most common production stream failures:

An 'error' event with no listener is re-thrown as an uncaught exception, which terminates the process by default. With pipeline you get one callback for the whole chain, so you never have an unhandled 'error' on an intermediate stream.
Errors do not propagate through pipe(). Use pipeline (or stream.finished for a single stream) so a failure tears down and closes every stream — no leaked file descriptors or half-open sockets.
Inside a Transform, report failures by calling cb(err), not by throwing synchronously from an async path. pipeline turns that cb(err) into chain teardown.

If you only ever take one thing from this: in application code you should almost never see a bare .pipe() or a manual 'data'+write() loop. Reach for pipeline, and let it own backpressure, error propagation, and cleanup together — because those three concerns are inseparable, and hand-rolling any one of them tends to break the other two.

Rules of thumb

Never ignore the return value of write(). false means stop and wait for 'drain'; ignoring it is the canonical memory leak.
Use pipeline, not pipe, in production. pipe handles backpressure but leaks file descriptors on error; pipeline handles backpressure and error propagation and cleanup.
highWaterMark is a throttle line, not a hard limit. Memory stays bounded only if the producer respects the resulting backpressure signal. The byte-stream default is 64 KB.
Object mode counts objects, not bytes. A 16-object watermark says nothing about memory if the objects are large.
Transform vs Duplex: Transform's output is a function of its input; a Duplex's read and write sides are independent.

Streams and Backpressure in Node.js

1. The four stream types

2. The backpressure contract

3. The memory blowup: `'data'` + `write()`

4. Why `pipe()` and `pipeline()` handle it for you

5. Object mode

6. Error handling, the part everyone skips

Rules of thumb

2 replies// weighed in

More from this topic

The Node.js Event Loop, Demystified

Worker Threads vs Cluster vs Child Process

Finding and Fixing Memory Leaks in Node.js

Streams and Backpressure in Node.js

1. The four stream types

2. The backpressure contract

3. The memory blowup: 'data' + write()

4. Why pipe() and pipeline() handle it for you

5. Object mode

6. Error handling, the part everyone skips

Rules of thumb

2 replies// weighed in

More from this topic

The Node.js Event Loop, Demystified

Worker Threads vs Cluster vs Child Process

Finding and Fixing Memory Leaks in Node.js

3. The memory blowup: `'data'` + `write()`

4. Why `pipe()` and `pipeline()` handle it for you