Worker Threads vs Cluster vs Child Process

Three ways to escape Node's single JS thread — and which one actually fits the problem in front of you.

May 09, 2026 · 10 min read · Node.js · Concurrency

Node runs your JavaScript on one thread, so a single CPU-bound function can stall an entire server. The escape hatches — worker_threads, cluster, and child_process — all give you parallelism, but they solve different problems and pay different costs at the boundary. Picking the wrong one is the most common way people make a fast service slower.

1. What "single-threaded" actually means

The misleading part: Node is not entirely single-threaded. Your JavaScript runs on one thread — the one that owns the event loop and the V8 isolate. But libuv keeps a thread pool (default 4 threads, set via UV_THREADPOOL_SIZE) that handles filesystem calls and some crypto/DNS work, while network I/O is driven by the OS via epoll/kqueue/IOCP. So when you await db.query(), your thread isn't blocked; it registers a callback and moves on.

The problem is purely synchronous CPU work. A tight loop, a big JSON.parse, image resizing, a synchronous bcrypt with a high cost factor, parsing a 50MB CSV: these all run on the JS thread and block the event loop. While one of them runs, no other request gets serviced, timers don't fire, and health checks time out. The framing interviewers listen for: I/O concurrency is free in Node; CPU parallelism is not. The three tools below exist to move CPU work, or whole programs, off the one thread you can't afford to block.
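To make the stall concrete, here is a minimal sketch that schedules a 0 ms timer and then hogs the thread. The helper names busyWait and measureTimerLag are made up for this illustration; only setTimeout and Date.now are real APIs.

```javascript
// Spin on the JS thread for roughly `ms` milliseconds (synchronous CPU work).
function busyWait(ms) {
  const end = Date.now() + ms;
  let spins = 0;
  while (Date.now() < end) spins++; // nothing else can run while this loops
  return spins;
}

// Schedule a 0 ms timer, then block. The callback receives how late it fired.
function measureTimerLag(blockMs, callback) {
  const scheduledAt = Date.now();
  setTimeout(() => callback(Date.now() - scheduledAt), 0);
  busyWait(blockMs); // the timer cannot fire until this returns
}

measureTimerLag(200, (lagMs) => {
  console.log(`0 ms timer fired after ${lagMs} ms`);
});
```

The timer asked for 0 ms but cannot fire until the loop returns, so the reported lag is at least the blocking time. Every pending request on a real server waits the same way.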

2. worker_threads — real threads, shared memory

A worker thread is a second V8 isolate with its own event loop, running in the same process. Each worker has its own heap and its own JS thread, so two workers genuinely run in parallel on two cores. This is the right tool for CPU-bound work inside your app: hashing, compression, parsing, math, anything that would otherwise pin the main thread.

Because they share a process, threads are cheaper to spawn than processes and — crucially — they can share raw memory through SharedArrayBuffer. Anything else you send goes through postMessage, which uses the structured-clone algorithm: a deep copy. That copy is the cost. Passing a 100MB buffer by clone means allocating and copying 100MB; passing it as a transfer (via the transferList) moves ownership with no copy but detaches it on the sending side, leaving it unusable there.

// hash-worker.js
const { parentPort, workerData } = require('node:worker_threads');
const crypto = require('node:crypto');

// CPU-bound: a deliberately slow KDF. scryptSync blocks THIS
// worker's thread, which is fine — the main thread stays free.
const out = crypto.scryptSync(workerData.password, workerData.salt, 64);
parentPort.postMessage(out); // structured-cloned back; arrives as a plain Uint8Array

// main.js
const { Worker } = require('node:worker_threads');

function hashOffThread(password, salt) {
  return new Promise((resolve, reject) => {
    const w = new Worker('./hash-worker.js', {
      workerData: { password, salt },
    });
    w.once('message', (buf) => resolve(buf));
    w.once('error', reject);
    w.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited ${code}`));
    });
  });
}

// The event loop stays free while scrypt grinds on another core.
// Assumes an Express-style `app` with JSON body parsing already configured.
app.post('/login', async (req, res) => {
  const digest = await hashOffThread(req.body.password, req.body.salt);
  res.json({ ok: true, len: digest.length });
});

Two things to internalize. First, workerData and every postMessage payload are cloned across the boundary unless you explicitly transfer or share them: there is no shared scope, no shared globals, no shared closures. Second, spinning up a fresh Worker per request is wasteful: V8 isolate startup isn't free, and every request would pay that creation cost before doing any work. In production you keep a pool of long-lived workers and hand them jobs over parentPort or a dedicated MessageChannel. Libraries like piscina do exactly this; the mechanism is a queue plus N idle workers waiting for the next message.
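The "queue plus N idle workers" mechanism can be sketched in a few dozen lines. This is not piscina's API or implementation: TinyPool, its methods, and the inline worker source are hypothetical names for illustration, and a real pool would also replace crashed workers and support cancellation.

```javascript
const { Worker } = require('node:worker_threads');

// Inline worker source keeps the sketch in one file; a real pool would
// point at a separate worker file instead.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  const crypto = require('node:crypto');
  parentPort.on('message', ({ password, salt }) => {
    parentPort.postMessage(crypto.scryptSync(password, salt, 64));
  });
`;

class TinyPool {
  constructor(size) {
    this.queue = []; // jobs waiting for a free worker
    this.workers = [];
    for (let i = 0; i < size; i++) {
      this.workers.push(new Worker(workerSource, { eval: true }));
    }
    this.idle = [...this.workers]; // workers with no job in flight
  }

  run(job) {
    return new Promise((resolve, reject) => {
      this.queue.push({ job, resolve, reject });
      this.dispatch();
    });
  }

  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { job, resolve, reject } = this.queue.shift();
      const onMessage = (digest) => {
        worker.off('error', onError);
        this.idle.push(worker); // worker is free again
        resolve(digest);
        this.dispatch(); // pick up the next queued job, if any
      };
      const onError = (err) => {
        worker.off('message', onMessage);
        reject(err); // the crashed worker is not returned to the pool
      };
      worker.once('message', onMessage);
      worker.once('error', onError);
      worker.postMessage(job);
    }
  }

  close() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}

// Four jobs share two long-lived workers; no isolate is created per job.
async function demo() {
  const pool = new TinyPool(2);
  const digests = await Promise.all(
    ['a', 'b', 'c', 'd'].map((password) => pool.run({ password, salt: 'salt' }))
  );
  await pool.close();
  return digests.map((d) => d.length);
}

demo().then((lengths) => console.log(lengths)); // [ 64, 64, 64, 64 ]
```

Each worker handles one job at a time, so a reply can be matched to its job without request ids. A pool that pipelines several jobs per worker would need to tag messages.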

3. SharedArrayBuffer and MessagePort — avoiding the copy

When the data is large and hot, cloning on every message dominates. SharedArrayBuffer backs a typed array with memory both threads can read and write directly — no copy, no transfer; only the SAB handle crosses the boundary. You coordinate with Atomics (compare-and-swap, Atomics.wait/Atomics.notify) to avoid races, because you are now writing genuinely concurrent code with all the hazards that implies.

const { Worker } = require('node:worker_threads');

const shared = new SharedArrayBuffer(1024 * 1024);
const view = new Int32Array(shared);

const w = new Worker('./fill-worker.js', { workerData: { shared } });
// Both threads see the same bytes; only the SAB handle was cloned,
// not the megabyte behind it.
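As a sketch of the coordination side, the following has several workers increment one shared counter with Atomics.add. The countInParallel helper and the inline worker source are hypothetical; a plain `counter[0]++` in the worker could lose updates under contention.

```javascript
const { Worker } = require('node:worker_threads');

// Inline worker source keeps the sketch in one file.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const counter = new Int32Array(workerData.shared);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // atomic read-modify-write, no lost updates
  }
`;

function countInParallel(workerCount, increments) {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(shared);
  const exits = [];
  for (let i = 0; i < workerCount; i++) {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { shared, increments }, // same memory handed to every worker
    });
    exits.push(new Promise((resolve, reject) => {
      worker.once('error', reject);
      worker.once('exit', resolve);
    }));
  }
  // Once every worker has exited, read the final value from the main thread.
  return Promise.all(exits).then(() => Atomics.load(counter, 0));
}

countInParallel(4, 10000).then((total) => console.log(total)); // logs 40000
```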

MessagePort is the other half of the story. A MessageChannel gives you two linked ports; transfer one to a worker and you have a private, bidirectional channel independent of the default parentPort. This is how you route multiple job types to one worker, or wire workers to talk to each other directly. Ports are themselves transferable, which is what makes pool architectures composable.
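A minimal sketch of that private channel, assuming a worker that uppercases whatever arrives on its port. The names shoutViaPrivatePort and jobPort are invented for this example; MessageChannel, the transferList option, and worker.terminate are the real APIs.

```javascript
const { Worker, MessageChannel } = require('node:worker_threads');

// Inline worker source keeps the sketch in one file.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const { jobPort } = workerData; // the transferred end of the channel
  jobPort.on('message', (text) => jobPort.postMessage(text.toUpperCase()));
`;

function shoutViaPrivatePort(text) {
  const { port1, port2 } = new MessageChannel();
  const worker = new Worker(workerSource, {
    eval: true,
    workerData: { jobPort: port2 },
    transferList: [port2], // ports must be transferred, they cannot be cloned
  });
  return new Promise((resolve, reject) => {
    worker.once('error', reject);
    port1.once('message', (reply) => {
      port1.close();
      worker.terminate().then(() => resolve(reply), reject);
    });
    port1.postMessage(text); // queued until the worker starts listening
  });
}

shoutViaPrivatePort('ping').then((reply) => console.log(reply)); // logs PING
```

The default parentPort is never touched here, which is what lets you run several independent channels into the same worker.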

Rule for the boundary: prefer transfer over clone for big buffers you're done with on the sender, and SharedArrayBuffer when both sides need concurrent access. Reach for plain clone only when payloads are small.
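The difference between clone and transfer is easy to observe from the sender's side. In this sketch, sendAndReport is a hypothetical helper that posts an ArrayBuffer over a MessageChannel and reports how many bytes the sender can still see afterwards.

```javascript
const { MessageChannel } = require('node:worker_threads');

// Post `buffer` either by clone or by transfer, then report the sender's view.
function sendAndReport(buffer, transfer) {
  const { port1, port2 } = new MessageChannel();
  port1.postMessage(buffer, transfer ? [buffer] : []);
  const lengthOnSender = buffer.byteLength; // 0 once the buffer is detached
  port1.close();
  port2.close();
  return lengthOnSender;
}

const cloned = sendAndReport(new ArrayBuffer(1024), false);
const transferred = sendAndReport(new ArrayBuffer(1024), true);

console.log({ cloned, transferred }); // { cloned: 1024, transferred: 0 }
```

After a clone the sender still owns its 1024 bytes, so a second copy now exists. After a transfer the sender's buffer is detached and reports a length of 0.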

4. cluster — N processes sharing one port

cluster solves a different problem: scaling an I/O-bound server across cores. The primary process spawns N workers with child_process.fork (separate processes, separate everything) and lets them all accept connections on the same listening port. By default on every platform except Windows, the primary accepts connections and distributes them round-robin (SCHED_RR) to the workers; on Windows the default is SCHED_NONE, where the primary hands the listening socket to the workers and the operating system decides which worker accepts each connection. Either way a 4-core box can run 4 event loops handling requests in parallel.

const cluster = require('node:cluster');
const http = require('node:http');
const { availableParallelism } = require('node:os');

if (cluster.isPrimary) {
  const n = availableParallelism();
  for (let i = 0; i < n; i++) cluster.fork();
  cluster.on('exit', (worker, code, signal) => {
    console.error(`worker ${worker.process.pid} died (${signal || code}), restarting`);
    cluster.fork(); // keep the herd at full strength
  });
} else {
  http.createServer((req, res) => {
    res.end(`handled by pid ${process.pid}\n`);
  }).listen(8080);
}
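The scheduling policy mentioned above can be inspected, and overridden before the first fork, through cluster.schedulingPolicy. A small sketch follows; describePolicy is a made-up helper, and the commented-out assignment is only one way to force round-robin.

```javascript
const cluster = require('node:cluster');

// Map the numeric policy constant to a readable description.
function describePolicy(policy) {
  if (policy === cluster.SCHED_RR) return 'round-robin (primary accepts and distributes)';
  if (policy === cluster.SCHED_NONE) return 'none (left to the operating system)';
  return 'unknown';
}

if (cluster.isPrimary) {
  // The policy is frozen once the first worker is spawned or setupPrimary()
  // is called, so any override has to happen before that point.
  // cluster.schedulingPolicy = cluster.SCHED_RR;
  console.log(describePolicy(cluster.schedulingPolicy));
}
```

The same setting can be supplied through the NODE_CLUSTER_SCHED_POLICY environment variable, with the values rr and none.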

Note what cluster does not do: it doesn't move CPU work off any individual event loop. If one request runs a 2-second synchronous loop, the worker handling it is still blocked for those 2 seconds — you've just made it 1-of-N instead of 1-of-1. Cluster multiplies throughput for many independent I/O-bound requests; it does not fix a slow handler. Workers don't share memory, so per-process caches and in-memory sessions diverge — push shared state to Redis or a database. In practice many teams skip the cluster module entirely and run N single-threaded processes behind a load balancer or under a process manager, which is operationally the same idea with better isolation.

5. child_process — running other programs

child_process is for launching separate executables: ffmpeg, git, a Python script, ImageMagick, anything that isn't your Node code. It's the most general and the most isolated — a full OS process with its own memory and its own runtime, talking to you over stdio or IPC.

The API has four entry points worth distinguishing. spawn streams stdout/stderr and is what you want for long-running or high-output processes, since there is no buffer ceiling. exec runs the command through a shell and buffers all output in memory: if the output exceeds maxBuffer (1 MiB by default) the child is terminated and the output truncated, and the shell makes it an injection risk if you interpolate user input. execFile runs a binary directly with an argument array and no shell (safer and faster), buffering output like exec. fork is a specialization of spawn that launches a new Node process with a built-in IPC channel for process.send and 'message' events.

const { spawn } = require('node:child_process');

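// Transcode without blocking; stream progress, don't buffer it.
// input, output, parseProgress, done and fail are supplied by the caller.
const ff = spawn('ffmpeg', ['-i', input, '-c:v', 'libx264', output]);

let settled = false;
const finish = (err) => {
  if (settled) return; // 'error' and 'close' can both fire for one failure
  settled = true;
  if (err) fail(err);
  else done();
};

ff.stderr.on('data', (d) => parseProgress(d.toString()));
// 'error' fires when the process cannot be spawned (e.g. ffmpeg is not on PATH).
ff.on('error', finish);
ff.on('close', (code) => finish(code === 0 ? null : new Error(`ffmpeg exited ${code}`)));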

Use spawn/execFile with an argument array, and never string-interpolate user input into exec. The boundary cost here is the heaviest of the three — a full process launch plus serialization over a pipe — so reserve it for genuinely external work or strong fault isolation, not for offloading a JS function you could have put in a worker thread.
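To see why the argument array matters, here is a sketch that passes a hostile-looking string through execFileSync. The current node binary (process.execPath) stands in for an external tool so the example runs anywhere; echoArgument and userInput are hypothetical names.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical hostile input: a shell would treat ';' as a command separator.
const userInput = 'report.txt; echo INJECTED';

// execFile passes each array element as one argv entry, with no shell parsing,
// so the child sees the whole string as a single argument.
function echoArgument(arg) {
  return execFileSync(
    process.execPath,
    ['-e', 'process.stdout.write(process.argv[1])', arg],
    { encoding: 'utf8' }
  );
}

console.log(echoArgument(userInput)); // prints the input verbatim, nothing is executed
```

Interpolating the same string into an exec command line would hand the `;` to the shell, which would run the part after it as a second command.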

6. Choosing — and the cost at the boundary

Every option here crosses a boundary, and the boundary is where the cost lives. Worker threads clone (or transfer, or share) messages between isolates. Cluster and child processes serialize over IPC pipes between separate OS processes. The faster the work and the bigger the payload, the more that crossing dominates — offloading a trivial, sub-millisecond computation to a worker can easily cost more in clone and scheduling overhead than it saves. Offload work that is coarse-grained: big enough that the serialization is noise against the compute.
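A rough way to check whether a job is coarse-grained enough is to time the clone against the compute. The sketch below assumes a Node version with the global structuredClone (17 or later), which applies the same kind of structured clone that postMessage performs. The payload shape and the timeMs helper are invented, and the numbers will vary by machine.

```javascript
const { performance } = require('node:perf_hooks');

// Run fn once and report its result together with the elapsed wall time.
function timeMs(fn) {
  const start = performance.now();
  const result = fn();
  return { result, ms: performance.now() - start };
}

// A made-up payload: 200,000 small row objects.
const payload = Array.from({ length: 200_000 }, (_, i) => ({ id: i, score: i % 97 }));

// Cost of crossing the boundary by clone vs. cost of the work itself.
const clone = timeMs(() => structuredClone(payload));
const compute = timeMs(() => payload.reduce((sum, row) => sum + row.score, 0));

console.log(`clone: ${clone.ms.toFixed(1)} ms, compute: ${compute.ms.toFixed(1)} ms`);
```

If the clone time is comparable to or larger than the compute time, sending that payload to a worker by clone is unlikely to pay off. Either batch more work per message or switch to transfer or SharedArrayBuffer.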

The decision tree most interviews want to hear:

  • CPU-bound work inside my app (parsing, hashing, math, image processing) → worker_threads, pooled, with SharedArrayBuffer/transfer for large data.
  • An I/O-bound HTTP server that needs to use all cores → cluster (or N processes behind a balancer). Not for blocking handlers.
  • Running an external program or wanting hard process isolation → child_process (spawn/execFile; fork for a child Node process).

The trap worth naming explicitly: workers do not help I/O-bound work. If your handler is mostly awaiting a database or an upstream API, the event loop is already idle during those waits — that's exactly what Node is good at. Wrapping that in a worker thread just adds clone overhead and a context switch to recover concurrency you already had. Workers earn their keep only when the JS thread is actually busy computing.
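The "concurrency you already had" is easy to demonstrate on a single thread. In this sketch fakeQuery is a stand-in for a database or upstream call, simulated with a 50 ms timer; both helper names are hypothetical.

```javascript
const { setTimeout: sleep } = require('node:timers/promises');

// Stand-in for a database or upstream API round trip.
async function fakeQuery(id) {
  await sleep(50);
  return id;
}

// Start every wait at once on the one JS thread and time the whole batch.
async function runConcurrently(count) {
  const start = Date.now();
  const ids = await Promise.all(
    Array.from({ length: count }, (_, i) => fakeQuery(i))
  );
  return { ids, elapsedMs: Date.now() - start };
}

runConcurrently(20).then(({ elapsedMs }) => {
  console.log(`20 waits finished in ~${elapsedMs} ms`);
});
```

Twenty 50 ms waits finish in roughly the time of one, not a full second, with no worker involved. There is nothing left for a thread to parallelize.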

Rules of thumb

  • I/O concurrency is free in Node; only synchronous CPU work blocks the event loop. Diagnose which you have before reaching for any of these.
  • worker_threads = CPU work, shared process, shared memory possible. cluster = scale an I/O server across cores. child_process = run other programs.
  • The boundary always costs something — clone, transfer, or IPC. Offload coarse-grained work, and use SharedArrayBuffer/transfer for large payloads to skip the copy.
  • Pool your workers; never spawn one per request. Restart cluster/child workers on exit so a crash doesn't shrink your capacity.
  • Workers never speed up I/O-bound code. If the thread is idle waiting on the network, parallelism buys you nothing.

Reader Discussion

1 reply

  1. Isabella Costa · Junior Engineer · Kind words

    saved this. sharing at standup tomorrow — we've had exactly this problem for 2 sprints and nobody on the team had framed it this way 🙏

    May 11, 2026·2 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.
