JVM Memory and Garbage Collection for Engineers
A working model of where JVM objects live, how generational collectors reclaim them, and when GC tuning is the wrong fix.
Most "GC problems" are allocation problems wearing a costume. To debug them you need an accurate mental model of where objects live, how the collector reclaims them, and which knobs actually matter. This is that model, plus the parts interviewers and on-call rotations keep asking about.
1. The memory map: heap, metaspace, stacks
A running JVM divides memory into a few distinct regions, and conflating them is the root of most confusion.
The heap holds all objects and arrays. It is split into a young generation and an old generation (a.k.a. tenured). The young gen is itself divided into eden and two survivor spaces (S0 and S1). New objects are allocated in eden. The two survivor spaces exist so the collector can copy live young objects back and forth, aging them across collections; one survivor is always empty between collections.
Metaspace holds class metadata — the runtime representation of loaded classes, methods, and constant pools. It lives in native memory, not the heap. This replaced the old PermGen (removed in Java 8). Crucially, metaspace grows with the number of distinct classes loaded, not with your data volume.
Each thread gets its own stack, holding frames for method calls: local variables, operands, and return addresses. Stacks are native memory too, sized per-thread via -Xss. Deep or infinite recursion exhausts a stack and throws StackOverflowError — a different beast from anything heap-related.
// Roughly where each thing lives:
int n = 42; // primitive local -> stack frame
Object o = new Object(); // the reference 'o' -> stack;
// the object itself -> heap (eden)
String[] arr = new String[1000]; // array object -> heap
// The String class's metadata -> metaspace (loaded once)
The interview-grade summary: objects on the heap, local references and local primitives on the stack, class metadata in metaspace. Fields, primitive or not, live inside their owning object on the heap. Knowing which region is exhausted tells you which problem you actually have.
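To see these regions in a running JVM, the standard java.lang.management API lists the memory pools the JVM exposes. The sketch below is illustrative: the class name MemoryRegions is hypothetical, and the pool names it prints depend on the JVM and the collector in use. Thread stacks are not reported as pools, so they will not appear in the output.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the class name is illustrative only.
final class MemoryRegions {

    // Names of the memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            System.out.printf("%-16s %-36s used=%d KB%n",
                    pool.getType(), pool.getName(), usedKb);
        }
        // Metaspace pressure follows the number of loaded classes, not data volume.
        System.out.println("Loaded classes: "
                + ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
    }
}
```

On a typical HotSpot JVM the heap pools correspond to eden, survivor, and old, and Metaspace shows up among the non-heap pools.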
2. The generational hypothesis
Generational GC rests on one empirical observation, the weak generational hypothesis: most objects die young. A request handler allocates a flurry of short-lived objects — DTOs, buffers, intermediate collections — that become garbage almost immediately. A smaller set survives long term (caches, connection pools, session state).
This shapes the whole design. If you can cheaply collect the region where the vast majority of objects die, you reclaim most garbage while touching very little memory. So the heap is structured to make young-gen collection fast and frequent, and old-gen collection rare and expensive.
The mechanism: allocate in eden. When eden fills, run a minor GC: trace from the roots (thread stacks, statics, registers, plus old-gen references into the young gen, which the collector tracks so it need not scan the whole old gen), copy the few survivors into a survivor space, and declare all of eden free in one stroke. Objects that survive enough minor GCs are promoted (tenured) into the old generation. The cost of a minor GC is proportional to the live set, not the garbage, which is why it stays cheap when objects die young.
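The effect of short-lived allocation is easy to observe from inside a process. The following is a minimal sketch, assuming a HotSpot-style JVM with default settings; the class name YoungGenChurn and its methods are hypothetical. It allocates garbage and reports how many collections the JVM ran in the meantime. What counts as one "collection" differs between collectors, so treat the numbers as relative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo class; names are illustrative only.
final class YoungGenChurn {

    // Written on every allocation so the JIT cannot optimize the allocations away.
    static volatile byte[] sink;

    // Total collections reported so far across all collector beans.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means undefined for this bean
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Allocates `megabytes` MB of garbage in 1 KB pieces and returns
    // how many collections were reported while doing so.
    static long churn(int megabytes) {
        long before = totalCollections();
        for (long i = 0; i < megabytes * 1024L; i++) {
            sink = new byte[1024]; // garbage as soon as the next array replaces it
        }
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        System.out.println("Collections during 2 GB of churn: " + churn(2048));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%-28s count=%d timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Because almost nothing survives each cycle, the total collection time reported is usually small relative to the amount of memory recycled.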
3. Minor vs. major GC, and stop-the-world
A minor GC collects only the young generation. It is frequent and usually short. A major GC (or full GC, depending on the collector) involves the old generation and is far more expensive because the live set there is large and long-lived.
Both are typically stop-the-world (STW): application threads are paused at a safepoint while the collector works. This pause is what shows up as latency. A 10ms minor pause every few seconds is invisible; a multi-second full GC is an outage. The entire evolution of modern collectors is a campaign to shrink and bound these pauses.
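One rough way to see pauses from the application's point of view is a "hiccup" probe: a thread that sleeps for a fixed interval and records how late it wakes up. The sketch below is an assumption-laden illustration (the class name PauseMeter is hypothetical), and the overshoot it measures includes OS scheduling jitter as well as safepoint pauses, so GC logs remain the authoritative source.

```java
// Hypothetical probe; the class and method names are illustrative only.
final class PauseMeter {

    // Sleeps in 1 ms steps for roughly durationMillis and returns the worst
    // wake-up delay observed beyond the requested 1 ms, in milliseconds.
    static long worstHiccupMillis(long durationMillis) {
        long worstNanos = 0;
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            long start = System.nanoTime();
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                break;
            }
            long overshoot = System.nanoTime() - start - 1_000_000L;
            worstNanos = Math.max(worstNanos, overshoot);
        }
        return worstNanos / 1_000_000L;
    }

    public static void main(String[] args) {
        System.out.println("Worst hiccup over 5 s: " + worstHiccupMillis(5_000) + " ms");
    }
}
```

Run it on a background thread next to a real workload: a worst case of a few milliseconds suggests pauses are not the latency problem, while hundreds of milliseconds is a reason to open the GC log.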
Why STW at all? Tracing a graph of objects while the application is concurrently mutating that graph is the hard problem. Some phases can run concurrently with bookkeeping (write barriers, remembered sets) to track mutations; others — like updating references when objects move — are easier to do correctly with threads paused. The trade is always pause time versus throughput versus complexity.
One pathology to recognize: premature promotion. If the survivor spaces are too small or the tenuring threshold is too low, short-lived objects get promoted into the old gen, which fills, which triggers expensive full GCs. The symptom looks like an old-gen problem but the cause is young-gen sizing or allocation rate.
4. Modern collectors: G1, ZGC, Shenandoah
G1 (Garbage-First) is the default since Java 9. It abandons large contiguous generations and instead splits the heap into many equal-sized regions, each a power-of-two size the JVM picks ergonomically from the heap size (1–32MB when chosen automatically; recent JDKs accept larger sizes if you set -XX:G1HeapRegionSize explicitly). Any region can be eden, survivor, or old at a given time. G1 tracks how much garbage each region holds and collects the most-garbage regions first (hence "garbage first") to meet a pause-time goal you set with -XX:MaxGCPauseMillis (default 200ms). It does the bulk of old-gen marking concurrently, then evacuates regions during short STW pauses. G1 is the right default for most server workloads: predictable, self-tuning, good throughput.
ZGC and Shenandoah are the low-pause collectors. Their goal is pause times that stay flat (sub-millisecond to low single-digit milliseconds) and largely independent of heap size, by doing essentially all the work — including relocating live objects — concurrently with the application. They achieve this with techniques like colored pointers and load barriers (ZGC) that let the collector move objects while threads keep running. The trade-off is some throughput overhead and higher memory usage from the barriers and bookkeeping.
# Pick a collector explicitly:
-XX:+UseG1GC # default since Java 9
-XX:+UseZGC # low-pause, large heaps; generational by
# default since JDK 23 (non-gen mode
# removed in JDK 24)
-XX:+UseShenandoahGC # low-pause alternative
# G1 pause-time goal (a target, not a guarantee):
-XX:MaxGCPauseMillis=100
The interview answer on collector choice: use the G1 default unless you have a measured pause-time SLA it can't meet on a large heap — then evaluate ZGC or Shenandoah, knowing you trade throughput and footprint for flatter pauses. Don't switch collectors on a hunch.
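Before debating a switch, it helps to confirm which collector the process is actually running, since on small machines or containers the JVM may pick a different one than expected. The sketch below guesses from the names of the GarbageCollectorMXBean instances. Those names are implementation details rather than a documented contract, and the class ActiveCollector is hypothetical, so treat the result as a hint.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the class name is illustrative only.
final class ActiveCollector {

    // Names of the collector beans registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Best-effort guess based on substrings of the bean names (an assumption,
    // not a stable API): returns "ZGC", "Shenandoah", "G1", or "other (...)".
    static String guessCollector(List<String> names) {
        String joined = String.join(", ", names);
        if (joined.contains("ZGC")) {
            return "ZGC";
        }
        if (joined.contains("Shenandoah")) {
            return "Shenandoah";
        }
        if (joined.contains("G1")) {
            return "G1";
        }
        return "other (" + joined + ")";
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collector beans: " + names);
        System.out.println("Looks like: " + guessCollector(names));
    }
}
```

The GC log header printed by -Xlog:gc* states the collector as well and is the more reliable check.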
5. Sizing the heap: -Xms, -Xmx, and friends
Two flags dominate: -Xms sets the initial heap size and -Xmx sets the maximum. A widespread production practice is to set them equal (-Xms4g -Xmx4g). This commits the full heap up front, avoids the cost of the JVM resizing the heap at runtime, and makes behavior predictable. The operating system may still hand out physical pages lazily; adding -XX:+AlwaysPreTouch touches every page at startup, so an oversized heap is discovered at launch rather than during a traffic spike.
# Fixed 4GB heap, explicit collector, GC logging on:
java -Xms4g -Xmx4g \
-XX:+UseG1GC \
-Xlog:gc*:file=gc.log:time,uptime,level,tags \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/var/dumps \
-jar app.jar
Larger heaps reduce GC frequency but can increase individual pause duration on collectors whose work scales with the live set (more live data to trace and move), so bigger is not strictly better. In containers, prefer -XX:MaxRAMPercentage over a hardcoded -Xmx so the heap scales with the cgroup limit; modern JVMs are container-aware and read cgroup limits, but only if you let them (note the conservative default of MaxRAMPercentage=25). Always set -XX:+HeapDumpOnOutOfMemoryError in production — the dump at the moment of failure is worth more than any amount of after-the-fact guessing. Note that metaspace and thread stacks live outside -Xmx; a container can be OOM-killed by the kernel even with heap headroom to spare if native memory grows.
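It is worth verifying what heap limit the JVM actually settled on, especially in containers where the percentage-based defaults apply. A minimal sketch using only java.lang.Runtime and the runtime MXBean follows; the class name HeapLimits is hypothetical. Note that Runtime.maxMemory() can report slightly less than the configured -Xmx on some collectors.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper; the class name is illustrative only.
final class HeapLimits {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // The maximum heap the JVM will attempt to use, in MiB.
    static long effectiveMaxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Max heap:        " + effectiveMaxHeapMiB() + " MiB");
        System.out.println("Committed heap:  " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("Free (of commit):" + toMiB(rt.freeMemory()) + " MiB");
        System.out.println("Visible CPUs:    " + rt.availableProcessors());
        // The flags this JVM was started with, e.g. -Xmx or -XX:MaxRAMPercentage.
        System.out.println("JVM arguments:   "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

If the reported maximum is about a quarter of the container's memory limit, the default MaxRAMPercentage is in effect and nobody has sized the heap deliberately.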
6. Reading OutOfMemoryError flavors
OutOfMemoryError is not one error. The message after it tells you which region failed and points at completely different root causes.
- Java heap space: the heap is full and GC can't reclaim enough. Either a genuine memory leak (objects reachable from a root that should have been released: a growing static Map, an unbounded cache, an unclosed resource) or simply too small a heap for the legitimate working set. The heap dump distinguishes them: a leak shows one dominator growing without bound.
- Metaspace: too many classes loaded. Classic causes: classloader leaks (frequent redeploys in an app server, or dynamic proxy / bytecode generation in a loop creating fresh classes that are never unloaded). The data volume is irrelevant; the class count is the problem. Cap it with -XX:MaxMetaspaceSize so it fails fast and visibly instead of consuming all native memory.
- GC overhead limit exceeded: the JVM spent too much time in GC recovering too little memory (by default, more than 98% of time recovering less than 2% of the heap). This is the heap almost full and the collector thrashing right at the edge. It's a louder, earlier symptom of the same condition as Java heap space. (Note: this check originated in the throughput (Parallel) collector; G1 support depends on the JDK version, and the low-pause collectors do not use it.)
- unable to create native thread: not heap at all. You've hit an OS thread limit or exhausted native memory for thread stacks. The fix is fewer threads (bounded pools) or a smaller -Xss, not a bigger heap.
The discipline: read the full message, then look at the right region. Throwing -Xmx at a metaspace or native-thread OOM does nothing but delay the inevitable while masking the real cause.
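The triage above can be captured as a small lookup from message to region. This is a hypothetical helper (OomTriage and regionFor are names invented for illustration) that matches on the message fragments discussed in this section; real JVMs may append extra detail to the message, which is why it uses substring matching.

```java
// Hypothetical triage helper; names are illustrative only.
final class OomTriage {

    // Maps an OutOfMemoryError message to the region that was exhausted.
    static String regionFor(String message) {
        if (message == null) {
            return "unknown";
        }
        if (message.contains("GC overhead limit exceeded")) {
            return "heap (collector thrashing)";
        }
        if (message.contains("Java heap space")) {
            return "heap";
        }
        if (message.contains("Metaspace")) {
            return "metaspace";
        }
        if (message.contains("native thread")) {
            return "native threads";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        try {
            // Deliberately request an array far larger than any default heap.
            long[] huge = new long[Integer.MAX_VALUE - 8];
            System.out.println("Unexpectedly allocated " + huge.length + " longs");
        } catch (OutOfMemoryError e) {
            System.out.println(e.getMessage() + " -> " + regionFor(e.getMessage()));
        }
    }
}
```

Only a heap result justifies looking at -Xmx; every other result points at class loading or thread counts instead.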
7. When to tune GC — the interview answer
Here is the answer a senior interviewer is listening for, and it is counterintuitive: most of the time, you don't tune GC — you fix allocations.
GC pressure is downstream of allocation rate. If a service churns short-lived garbage — building giant intermediate lists, boxing primitives in hot loops, re-parsing the same JSON, logging with eager string concatenation, copying buffers needlessly — it forces frequent minor GCs and premature promotion no matter how the collector is configured. The most effective "GC tuning" is usually a code change that allocates less: stream instead of materializing, reuse buffers, choose the right collection size up front, avoid autoboxing on hot paths.
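As one concrete instance of "allocate less", consider autoboxing in a hot loop. In the sketch below (hypothetical class and method names), the two methods compute the same sum, but the boxed version may create a new Long object on most iterations, while the primitive version allocates nothing. The JIT can sometimes remove such boxing, so measure before assuming a win.

```java
// Hypothetical example; names are illustrative only.
final class AllocationDiet {

    // Allocation-heavy: `sum` is a Long, so each `+=` unboxes, adds,
    // and boxes the result into a (usually new) Long object.
    static long sumBoxed(long n) {
        Long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Allocation-free: the same arithmetic on a primitive long.
    static long sumPrimitive(long n) {
        long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1_000_000));
        System.out.println(sumPrimitive(1_000_000));
    }
}
```

The one-character difference between Long and long is typical of allocation fixes: small in the diff, large in the GC log.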
The honest order of operations:
- Measure first. Turn on GC logging (-Xlog:gc*), confirm GC pauses are actually your latency problem and not the DB, the network, or lock contention. Most "GC" incidents aren't GC.
- Reduce allocation. Profile allocation with an allocation profiler or flight recorder, find the hot allocation sites, and cut them. This fixes the cause.
- Size the heap correctly. Ensure the working set fits comfortably with headroom; right-size young vs. old so short-lived objects die in the young gen.
- Only then tune the collector. Adjust pause goals, or switch to ZGC/Shenandoah, when you have a measured SLA the default genuinely cannot meet after the above.
Reaching for arcane GC flags before profiling allocations is the classic junior move. The arcane flags rarely survive the next JVM upgrade anyway, because the collectors are increasingly good at self-tuning.
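For the "reduce allocation" step, a quick way to put a number on a suspect code path is the per-thread allocation counter that HotSpot exposes through com.sun.management.ThreadMXBean. This is a HotSpot-specific extension, so the sketch below (hypothetical class AllocationProbe) returns -1 where it is unavailable. Thread.getId() is used for compatibility with older JDKs; newer ones offer threadId() instead. It complements a real allocation profiler rather than replacing one.

```java
import java.lang.management.ManagementFactory;

// Hypothetical probe; names are illustrative only.
final class AllocationProbe {

    // Holds results so the JIT cannot discard the allocations being measured.
    static volatile Object sink;

    // Bytes allocated by the current thread while running `work`,
    // or -1 if this JVM does not support per-thread allocation counters.
    static long allocatedBytes(Runnable work) {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!(bean instanceof com.sun.management.ThreadMXBean)) {
            return -1;
        }
        com.sun.management.ThreadMXBean hotspot = (com.sun.management.ThreadMXBean) bean;
        if (!hotspot.isThreadAllocatedMemorySupported()
                || !hotspot.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = hotspot.getThreadAllocatedBytes(id);
        work.run();
        return hotspot.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBytes(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) {
                sb.append(i).append(',');
            }
            sink = sb.toString();
        });
        System.out.println("Allocated roughly " + bytes + " bytes");
    }
}
```

Wrapping a request handler or a batch step this way gives a bytes-per-operation figure that can be tracked before and after a code change.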
Rules of thumb
- Diagnose by region. Heap, metaspace, and thread stacks fail for unrelated reasons; read the OOM message before reaching for -Xmx.
- Fix allocations before tuning GC. Allocation rate drives GC pressure; less garbage beats any flag.
- Keep the G1 default unless a measured pause-time SLA on a large heap forces ZGC or Shenandoah — and accept the throughput/footprint trade when it does.
- Set -Xms equal to -Xmx (or use MaxRAMPercentage in containers) for predictable behavior, and always enable -XX:+HeapDumpOnOutOfMemoryError.
- Measure first. Most incidents blamed on GC turn out to be the database, the network, or lock contention.