Structured Logging That Actually Helps On-Call

Free-text logs are a write-only medium. Structured logs are searchable, correlatable, and worth half your debugging time back.

November 25, 2025 · 8 min read · Observability, Logging

If you only change one thing in your observability stack this quarter, change your logs from free-text strings to structured JSON. Everything downstream gets easier: Elasticsearch indexes the fields directly, Loki can extract them as labels at query time, and your SRE stops grepping for substrings at 3am.

1. The bad pattern

logger.info("User " + userId + " bought " + count + " of " + sku + " for $" + amount);

This is a one-way trip. You cannot filter by userId without a regex. You cannot sum amount across log lines. You cannot correlate to a request because the request ID is somewhere in another log line entirely.
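
To make the cost concrete, here is a minimal sketch of what recovering fields from that line takes. The sample line, the `pattern` regex, and `parseOrderLine` are illustrative assumptions, not code from a real system.

```javascript
// Hypothetical free-text line in the format produced by the logger.info call above.
const line = "User 42 bought 3 of SKU-123 for $59.97";

// Recovering the fields means reverse-engineering the message format with a regex.
const pattern = /^User (\S+) bought (\d+) of (\S+) for \$(\d+(?:\.\d+)?)$/;

function parseOrderLine(text) {
  const match = pattern.exec(text);
  if (match === null) return null; // format drifted: the line is silently lost
  const [, userId, count, sku, amount] = match;
  return { userId, count: Number(count), sku, amount: Number(amount) };
}

console.log(parseOrderLine(line));
// A harmless-looking rewording ("purchased" instead of "bought") breaks the parser.
console.log(parseOrderLine("User 42 purchased 3 of SKU-123 for $59.97"));
```

Every consumer of the log has to carry a regex like this, and every wording change in the message is a silent breaking change for all of them.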

2. The shape that actually works

logger.info("order.created", {
  user_id: userId,
  sku: sku,
  count: count,
  amount_usd: amount,
  request_id: ctx.request_id,
  trace_id: ctx.trace_id,
});

Three properties of this line, in order of importance:

  1. Stable event name: order.created is a fixed string, never built from variables. Dashboards key off it.
  2. Stable field types: amount_usd is always a number, never sometimes a string. Elasticsearch fixes a field's mapping from the first document it indexes and can reject later documents whose value does not fit that type.
  3. Correlation IDs always present: request_id, trace_id, user_id. Without these, "find the related logs" is impossible.
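
One way to keep these three properties from eroding is to check them where the record is built. The sketch below is a minimal illustration, not a real library: `buildRecord`, `EVENT_NAME`, `FIELD_TYPES`, and `REQUIRED_IDS` are hypothetical names, and the schema is an assumption based on the example above.

```javascript
// Event names look like "order.created": lowercase segments joined by dots.
const EVENT_NAME = /^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$/;
// Assumed schema: fields whose type must never change between log lines.
const FIELD_TYPES = { count: "number", amount_usd: "number" };
// Correlation IDs that every record must carry.
const REQUIRED_IDS = ["request_id", "trace_id", "user_id"];

function buildRecord(level, event, fields) {
  if (!EVENT_NAME.test(event)) throw new Error(`bad event name: ${event}`);
  for (const [name, type] of Object.entries(FIELD_TYPES)) {
    if (name in fields && typeof fields[name] !== type) {
      throw new Error(`${name} must be a ${type}`);
    }
  }
  for (const id of REQUIRED_IDS) {
    if (fields[id] === undefined) throw new Error(`missing ${id}`);
  }
  return { ts: new Date().toISOString(), level, event, ...fields };
}

const logger = {
  // One JSON object per line on stdout.
  info: (event, fields) => console.log(JSON.stringify(buildRecord("info", event, fields))),
};

logger.info("order.created", {
  user_id: "u_42", sku: "SKU-123", count: 3, amount_usd: 59.97,
  request_id: "req_1", trace_id: "tr_1",
});
```

Throwing keeps the sketch short. A production logger would more likely drop the bad field and count the violation in a metric than crash the caller.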

3. Levels are not categories

INFO/WARN/ERROR exist to gate retention and alerting, not to classify what happened. A common anti-pattern is bumping a log to WARN because "it's important" — now you have an alert on every important log line and nobody reads warnings anymore.

A working policy:

  • ERROR — something went wrong and the system did not recover. Should map 1:1 to alerts.
  • WARN — something is degraded but the system worked. Useful for dashboards, not for paging.
  • INFO — observable transitions: a request started, a job ran, a feature flag flipped. Searchable, not browsed.
  • DEBUG — kept off by default, flipped on per-service when you are actively debugging.
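
The policy above can be written down as data rather than left to convention. This is a minimal sketch under assumptions: the `ROUTING` table, the `checkout` override, and the function names are hypothetical and only mirror the four bullets.

```javascript
// Numeric severities so levels can be compared.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

// What each level is allowed to trigger. Only ERROR pages anyone.
const ROUTING = {
  error: { page: true, dashboard: true },
  warn: { page: false, dashboard: true },
  info: { page: false, dashboard: false },
  debug: { page: false, dashboard: false },
};

// DEBUG is off by default and flipped on per service while debugging.
const DEFAULT_MIN_LEVEL = "info";
const minLevelOverrides = new Map([["checkout", "debug"]]);

function shouldEmit(service, level) {
  const floor = minLevelOverrides.get(service) ?? DEFAULT_MIN_LEVEL;
  return LEVELS[level] >= LEVELS[floor];
}

function shouldPage(level) {
  return ROUTING[level].page;
}

console.log(shouldEmit("checkout", "debug")); // checkout has DEBUG switched on
console.log(shouldEmit("search", "debug"));   // everyone else stays at INFO
console.log(shouldPage("warn"));              // WARN feeds dashboards, never pages
```

With the table in one place, bumping a line to WARN "because it's important" changes where it is displayed, not who gets woken up.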

4. Sensitive fields are a real cost

Once a PII string is in your log store, you own a privacy obligation forever. Most teams put a redaction layer in front of the logger that drops or hashes the values of a fixed list of field names: password, token, ssn, card_number. The redactor lives in the same module as the logger so the obligation is enforced, not remembered.
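
A minimal sketch of such a redaction layer, assuming plain JSON-like fields. `SENSITIVE_FIELDS`, `redact`, and `logInfo` are hypothetical names, and this version replaces values with a placeholder rather than hashing them.

```javascript
// Assumed deny-list, taken from the field names mentioned above.
const SENSITIVE_FIELDS = new Set(["password", "token", "ssn", "card_number"]);

// Returns a redacted copy; recurses into nested objects and arrays.
// The input is never mutated.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_FIELDS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}

// The logger calls redact() itself, so callers cannot forget it.
function logInfo(event, fields) {
  const line = JSON.stringify({ level: "info", event, ...redact(fields) });
  console.log(line);
  return line;
}

logInfo("user.login", { user_id: "u_42", password: "hunter2", device: { token: "abc" } });
```

A name-based deny-list only catches fields that are named honestly. A card number inside a free-form `note` field passes straight through, which is one more argument for typed fields over strings.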

5. Sampling for the firehose

A bursty endpoint at 10k QPS does not need 10k INFO lines per second forever. Sample with a deterministic hash:

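// Keep 1% of /healthz logs; every other path, including /payments, is kept in full.
// hash() is assumed to be a stable, non-negative integer hash of the trace ID.
if (path === "/healthz" && hash(traceId) % 100 !== 0) return;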

Sample by trace_id rather than randomly, so when you keep a request, you keep the whole request — not the middle three lines of it.
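
A self-contained sketch of the same idea, with one possible hash filled in. FNV-1a is a stand-in chosen here for brevity, not necessarily the hash the snippet above has in mind, and `KEEP_ONE_IN` and `shouldLog` are hypothetical names. It assumes trace IDs are ASCII strings.

```javascript
// FNV-1a, 32-bit: small, fast, and stable across processes and restarts.
function fnv1a(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Assumed per-path keep rates; any path not listed is kept in full.
const KEEP_ONE_IN = new Map([["/healthz", 100]]);

function shouldLog(path, traceId) {
  const n = KEEP_ONE_IN.get(path);
  if (n === undefined) return true;
  return fnv1a(traceId) % n === 0;
}

// Every line of a given request gets the same answer, on every host.
const traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
console.log(shouldLog("/healthz", traceId) === shouldLog("/healthz", traceId));
console.log(shouldLog("/payments", traceId));
```

Because the decision depends only on the trace ID, two services that handle the same request make the same keep-or-drop choice without coordinating.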

6. One log per transition, not per line

A 17-step pipeline does not need 17 log lines. It needs one structured event at the end: { pipeline: "checkout", steps_run: 17, duration_ms: 420, outcome: "ok" }. The intermediate steps become metrics if you care about counts, or spans on a trace if you care about timing.
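
A minimal sketch of a runner that emits a single summary event. `runPipeline`, the step shape, and the `failed_step` field are assumptions added for illustration; steps are synchronous here to keep the example short.

```javascript
// Runs steps in order and emits one structured event at the end,
// instead of one log line per step.
function runPipeline(name, steps, emit = (rec) => console.log(JSON.stringify(rec))) {
  const startedAt = Date.now();
  let stepsRun = 0;
  let outcome = "ok";
  let failedStep;
  for (const step of steps) {
    try {
      step.run();
      stepsRun += 1;
    } catch (err) {
      outcome = "error";
      failedStep = step.name; // hypothetical extra field: where it stopped
      break;
    }
  }
  const event = {
    pipeline: name,
    steps_run: stepsRun,
    duration_ms: Date.now() - startedAt,
    outcome,
  };
  if (failedStep !== undefined) event.failed_step = failedStep;
  emit(event);
  return event;
}

runPipeline("checkout", [
  { name: "validate_cart", run: () => {} },
  { name: "reserve_stock", run: () => {} },
  { name: "charge_card", run: () => {} },
]);
```

On failure the one event still says how far the run got and where it stopped, which is usually what the per-step lines were being kept for.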

The mental model

Logs answer the question "what specifically happened with this one thing?". Metrics answer "what is happening in aggregate?". Traces answer "what was the path?". Structured logs put real data behind the first question. Free-text logs make you choose between paging your SRE and learning Lucene regex.

Reader Discussion

  1. Léa Dubois · SRE · Asks

    any chance you'd publish these as a PDF collection? would love to print and read offline on flights. screen-fatigue is real.

    Dec 01, 2025 · 6 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.
