Distributed Tracing: Spans, Trace Context, and Why Your Trace Stops at the Queue
A trace is a tree of spans stitched together by a context that has to ride along every hop. The traces that break almost always break at a boundary where nobody propagated the headers. Here is the model.
The first time a trace "just works" it feels like magic: one request, one waterfall, every service and database call laid out in time. The first time it doesn't work, you get two disconnected traces and a service that appears to start a brand-new request out of nowhere. Both behaviours come from the same mechanism: a small piece of context that has to be carried across every hop, either by instrumentation or by hand. Understand that and tracing stops being magic in either direction.
A trace is a tree of spans
A span is one timed unit of work: an HTTP handler, a database query, a call to another service. Each span has a start time, a duration, a unique span_id, and a parent_span_id pointing at the span that caused it. Every span in one request shares a single trace_id. That shared id plus the parent links is what lets the backend reassemble a flat stream of spans into a tree:
trace_id = abc123
  span A  GET /checkout               (parent: none, the root)
    span B  POST cart-service         (parent: A)
      span C  SELECT … FROM cart      (parent: B)
    span D  POST payment-service      (parent: A)
Nothing in this structure is global state. Each service only knows its own span and who its parent was — the tree is an emergent property of everyone agreeing on the same trace_id and reporting their parent honestly.
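To make the reassembly step concrete, here is a minimal sketch of how a backend might turn a flat, unordered stream of spans into a tree. The field names (`spanId`, `parentSpanId`, `traceId`) and the helpers `buildTrees` and `render` are illustrative assumptions, not any particular backend's schema or API.

```javascript
// Hypothetical span records as a backend might receive them: flat and unordered.
const spans = [
  { spanId: "C", parentSpanId: "B", traceId: "abc123", name: "SELECT … FROM cart" },
  { spanId: "A", parentSpanId: null, traceId: "abc123", name: "GET /checkout" },
  { spanId: "D", parentSpanId: "A", traceId: "abc123", name: "POST payment-service" },
  { spanId: "B", parentSpanId: "A", traceId: "abc123", name: "POST cart-service" },
];

// Index every span by (trace_id, span_id), then attach each one to its parent.
function buildTrees(flatSpans) {
  const nodes = new Map();
  for (const span of flatSpans) {
    nodes.set(`${span.traceId}/${span.spanId}`, { ...span, children: [] });
  }
  const roots = [];
  for (const node of nodes.values()) {
    const parent =
      node.parentSpanId === null
        ? undefined
        : nodes.get(`${node.traceId}/${node.parentSpanId}`);
    if (parent) {
      parent.children.push(node);
    } else {
      // No parent id, or the parent was never reported: treat this span as a root.
      roots.push(node);
    }
  }
  return roots;
}

// Render a tree as indented lines, one span per line.
function render(node, depth = 0) {
  const own = `${"  ".repeat(depth)}${node.spanId} ${node.name}`;
  return [own, ...node.children.flatMap((child) => render(child, depth + 1))];
}

const trees = buildTrees(spans);
console.log(trees.flatMap((root) => render(root)).join("\n"));
```

Note the fallback branch: a span whose parent never arrived shows up as an extra root. That is how a dropped span or dropped context appears in a trace viewer.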
Trace context is the thing that travels
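For service B to set its parent_span_id to A's span, A must tell B about itself. That handoff is the trace context, and the W3C Trace Context standard carries the essential part in a single HTTP header, traceparent (an optional companion header, tracestate, holds vendor-specific data):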
traceparent: 00-abc123def...-00f067aa0ba902b7-01
             ^  ^            ^                ^
             |  trace_id     parent span_id   flags (01 = sampled)
             version
On the way out, A's tracer injects this header. On the way in, B's tracer extracts it, starts a new span whose parent is the id in the header, and reuses the same trace_id. Auto-instrumentation does this for you on standard HTTP and gRPC clients — which is exactly why traces look effortless until you hit a hop the instrumentation doesn't cover.
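The inject/extract handoff can be sketched in a few lines. This is a simplified illustration that only handles version 00 of the header. The helpers `randomHex`, `extract`, `inject`, and `startSpanFromRequest` are hypothetical names, not a real tracer's API, and the ids are generated with `Math.random` purely for demonstration.

```javascript
// Version-00 traceparent: version, 32-hex trace-id, 16-hex parent-id, 2-hex flags.
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Hypothetical helper: random lowercase hex of the given length.
function randomHex(length) {
  let id = "";
  for (let i = 0; i < length; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return id;
}

// Extract: read the caller's context from incoming headers, or null if absent/invalid.
function extract(headers) {
  const match = TRACEPARENT.exec(headers["traceparent"] || "");
  if (!match) return null;
  const [, traceId, parentSpanId, flags] = match;
  // All-zero ids are invalid in the W3C format.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
  return { traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Inject: write the current span's identity into outgoing headers.
function inject(headers, span) {
  const flags = span.sampled ? "01" : "00";
  headers["traceparent"] = `00-${span.traceId}-${span.spanId}-${flags}`;
}

// Start a span for an incoming request: reuse the trace_id, point at the caller's span.
function startSpanFromRequest(name, incomingHeaders) {
  const parent = extract(incomingHeaders);
  return {
    name,
    traceId: parent ? parent.traceId : randomHex(32),
    spanId: randomHex(16),
    parentSpanId: parent ? parent.parentSpanId : null,
    sampled: parent ? parent.sampled : true,
  };
}

// Service A handles a request with no incoming context, then calls service B.
const spanA = startSpanFromRequest("GET /checkout", {});
const outgoing = {};
inject(outgoing, spanA);

// Service B receives A's headers and continues the same trace.
const spanB = startSpanFromRequest("POST cart-service", outgoing);
console.log(outgoing.traceparent);
console.log(spanB.traceId === spanA.traceId, spanB.parentSpanId === spanA.spanId);
```

If the header is missing or malformed, `extract` returns null and the receiving service starts a new root span with a new trace_id. That is the same failure the rest of this article is about.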
Why your trace stops at the queue
Synchronous HTTP carries headers naturally, so the failure mode shows up at asynchronous boundaries. You publish a message to Kafka or SQS in one service and consume it in another. Unless you explicitly copy the trace context into the message metadata (Kafka headers, an SQS message attribute) and extract it on the consumer, the consumer starts a fresh root span with a brand-new trace_id. Result: two unrelated traces, and the producer→consumer link — usually the most interesting latency in the system — is invisible.
// producer: inject the active trace context INTO the message headers
propagator.inject(activeContext, record.headers, setter)
// consumer: extract it back OUT before starting the span
const ctx = propagator.extract(rootContext, record.headers, getter)
const span = tracer.startSpan("process", {}, ctx) // parent is now the producer's span
// ... handle the message ...
span.end() // end the span so it gets reported
The same gap appears across any boundary instrumentation doesn't auto-wire: a custom RPC protocol, a thread pool that loses the context-local, a setTimeout/background job, or shelling out to a subprocess. The rule generalises: a trace breaks wherever context is dropped instead of propagated.
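The queue case can be reproduced without a broker. The sketch below uses an in-memory array as a stand-in for a Kafka topic or SQS queue and compares a consumer that receives propagated context with one that does not. Everything here is a simplification: `startSpan`, `publish`, and `consume` are hypothetical helpers, ids come from a counter so the result is deterministic, and the context travels as two plain header fields rather than a real traceparent value.

```javascript
// Counter-based ids keep the example deterministic; real ids are random hex.
let counter = 0;
const nextId = (prefix) => `${prefix}${++counter}`;

// Start a span, continuing the parent's trace if a parent context is given.
function startSpan(name, parentCtx) {
  return {
    name,
    traceId: parentCtx ? parentCtx.traceId : nextId("trace-"),
    spanId: nextId("span-"),
    parentSpanId: parentCtx ? parentCtx.spanId : null,
  };
}

// In-memory stand-in for a Kafka topic or SQS queue.
const queue = [];

function publish(body, producerSpan, propagate) {
  const headers = {};
  if (propagate) {
    // Inject: copy the producer's context into the message metadata.
    headers.traceId = producerSpan.traceId;
    headers.spanId = producerSpan.spanId;
  }
  queue.push({ body, headers });
}

function consume() {
  const message = queue.shift();
  // Extract: rebuild the parent context if the producer sent one.
  const parentCtx = message.headers.traceId
    ? { traceId: message.headers.traceId, spanId: message.headers.spanId }
    : null;
  return startSpan("process", parentCtx);
}

const producerSpan = startSpan("publish order", null);

publish({ orderId: 42 }, producerSpan, false);
const orphan = consume(); // context dropped: a fresh root in a new trace

publish({ orderId: 42 }, producerSpan, true);
const linked = consume(); // context propagated: a child of the producer span

console.log(orphan.traceId === producerSpan.traceId); // false
console.log(linked.traceId === producerSpan.traceId, linked.parentSpanId === producerSpan.spanId); // true true
```

The same shape applies to the other boundaries: whatever carries the work across (a job payload, an environment variable for a subprocess, a custom protocol field) also has to carry the context.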
Sampling: that flag is a decision, not a hint
You cannot afford to store every span at scale, so traces are sampled. The cheap approach is head-based sampling: the root service flips a coin (say 1%), writes the result into the traceparent flags byte, and every downstream service obeys it. This is why the sampled bit must propagate — otherwise half a trace gets kept and half discarded, leaving broken trees.
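A rough sketch of head-based sampling follows, assuming a simple random draw at the root. The function names are hypothetical, the `random` parameter exists only to make the example deterministic, and the ids are illustrative values.

```javascript
// Root decision. `random` is injectable only so this example is deterministic.
function headSample(ratio, random = Math.random) {
  return random() < ratio;
}

// Build a traceparent value whose flags byte records the decision.
function buildTraceparent(traceId, spanId, sampled) {
  return `00-${traceId}-${spanId}-${sampled ? "01" : "00"}`;
}

// Downstream services do not flip their own coin; they read the sampled bit.
function isSampled(traceparent) {
  const flags = traceparent.split("-")[3];
  return (parseInt(flags, 16) & 0x01) === 1;
}

// Example ids (illustrative values only).
const traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
const spanId = "00f067aa0ba902b7";

// A draw of 0.005 falls under the 1% ratio, so this trace is kept everywhere.
const kept = buildTraceparent(traceId, spanId, headSample(0.01, () => 0.005));
// A draw of 0.5 does not, so every service drops its spans for this trace.
const dropped = buildTraceparent(traceId, spanId, headSample(0.01, () => 0.5));

for (const service of ["cart-service", "payment-service"]) {
  console.log(service, isSampled(kept), isSampled(dropped));
}
```

Because each downstream service reads the bit and does not decide again, all services keep or drop the same traces. If each service drew independently at 1%, almost no trace would be stored complete.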
The trade-off: head sampling decides before it knows whether the request was interesting, so at 1% it throws away roughly 99% of your errors and slow requests along with the boring ones. Tail-based sampling buffers all spans of a trace and decides after the fact (keep if it errored or exceeded a latency threshold), at the cost of a collector that holds whole traces in memory. Most teams start head-based and move error/slow paths to tail sampling once 1% stops catching the incidents.
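The tail-based decision can be sketched as a function over buffered spans. This is a simplified illustration: the span fields (`error`, `durationMs`) and the `tailSample` helper are assumptions, and it ignores how a real collector decides that a trace is complete, which usually involves waiting for some time window.

```javascript
// Hypothetical buffered spans from three traces.
const buffered = [
  { traceId: "t1", spanId: "a", parentSpanId: null, durationMs: 40, error: false },
  { traceId: "t1", spanId: "b", parentSpanId: "a", durationMs: 12, error: false },
  { traceId: "t2", spanId: "c", parentSpanId: null, durationMs: 55, error: false },
  { traceId: "t2", spanId: "d", parentSpanId: "c", durationMs: 20, error: true },
  { traceId: "t3", spanId: "e", parentSpanId: null, durationMs: 1800, error: false },
];

// Decide per trace, after all of its spans are in hand.
function tailSample(spans, latencyThresholdMs) {
  // Group the buffered spans by trace_id.
  const byTrace = new Map();
  for (const span of spans) {
    if (!byTrace.has(span.traceId)) byTrace.set(span.traceId, []);
    byTrace.get(span.traceId).push(span);
  }

  const keptTraceIds = [];
  for (const [traceId, traceSpans] of byTrace) {
    // Keep the whole trace if any span errored...
    const errored = traceSpans.some((s) => s.error);
    // ...or if the root span exceeded the latency threshold.
    const slow = traceSpans.some(
      (s) => s.parentSpanId === null && s.durationMs > latencyThresholdMs
    );
    if (errored || slow) keptTraceIds.push(traceId);
  }
  return keptTraceIds;
}

const keptTraces = tailSample(buffered, 1000);
console.log(keptTraces.join(", "));
```

Here t1 is dropped as uninteresting, t2 is kept for its error, and t3 for its latency. The `byTrace` map is the memory cost mentioned above: every span of every in-flight trace has to be held until the decision is made.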
Spans carry attributes, and high-cardinality is the point
Unlike metrics, where high-cardinality labels blow up your time-series database, spans want the specific values: the user_id, the order_id, the exact SQL, the downstream status code. A trace is one request, so there is no cardinality explosion — each value appears once. This is the practical reason tracing answers "why was this request slow?" where metrics can only answer "is the p99 bad?". Attach the identifiers that let you find the needle; that is the whole value proposition.
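As a small illustration of why specific values on spans are useful, the sketch below looks up one request by its order id. The span shape, the attribute names beyond those mentioned above, and the `findSpans` helper are all hypothetical. A real backend would run this as a query over indexed attributes, not an in-memory filter.

```javascript
// Hypothetical spans carrying request-specific attributes.
const spans = [
  { traceId: "t1", name: "POST /checkout", durationMs: 38,
    attributes: { user_id: "u-17", order_id: "o-9001", status_code: 200 } },
  { traceId: "t2", name: "POST /checkout", durationMs: 2150,
    attributes: { user_id: "u-42", order_id: "o-9002", status_code: 200 } },
  { traceId: "t3", name: "POST /checkout", durationMs: 41,
    attributes: { user_id: "u-42", order_id: "o-9003", status_code: 500 } },
];

// Find spans whose attribute `key` equals `value`.
function findSpans(allSpans, key, value) {
  return allSpans.filter((span) => span.attributes[key] === value);
}

// "Why was order o-9002 slow?" starts with finding its trace.
const forOrder = findSpans(spans, "order_id", "o-9002");
console.log(forOrder.map((s) => `${s.traceId} ${s.durationMs}ms`).join(", "));

// Everything one user did, including the failed request.
const forUser = findSpans(spans, "user_id", "u-42");
console.log(forUser.map((s) => `${s.traceId} ${s.attributes.status_code}`).join(", "));
```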
Rules of thumb
- A trace is a tree built from spans sharing a trace_id and pointing at their parent_span_id: no global state, just propagated context.
- Traces break at boundaries auto-instrumentation doesn't cover: queues, custom protocols, background jobs, subprocesses. Inject/extract context there by hand.
- At a queue, copy the trace context into the message and extract it on the consumer, or you get two disconnected traces.
- The sampled bit must propagate unchanged across the whole request, or you store half-traces. Start head-based; add tail sampling to keep errors and slow requests.
- Put high-cardinality identifiers (user, order, request ids) on spans — that is exactly what lets you find one bad request, and unlike metrics it costs you nothing in cardinality.