Structured Logging That Actually Helps On-Call
Free-text logs are a write-only medium. Structured logs are searchable, correlatable, and worth half your debugging time back.
If you only change one thing in your observability stack this quarter, change your logs from free-text strings to structured JSON. Everything downstream gets easier: Elasticsearch can index each field directly, Loki can parse and filter on the fields at query time, and your SRE stops grepping for substrings at 3am.
1. The bad pattern
logger.info("User " + userId + " bought " + count + " of " + sku + " for $" + amount);
This is a one-way trip. You cannot filter by userId without a regex. You cannot sum amount across log lines. You cannot correlate to a request because the request ID is somewhere in another log line entirely.
2. The shape that actually works
logger.info("order.created", {
  user_id: userId,
  sku: sku,
  count: count,
  amount_usd: amount,
  request_id: ctx.request_id,
  trace_id: ctx.trace_id,
});
Three properties of this line, in order of importance:
- Stable event name: order.created, never built from variables. Dashboards key off it.
- Stable field types: amount_usd is always a number, never sometimes a string. Elasticsearch maps a field's type from the first document it indexes and can reject later documents whose value does not fit that type.
- Correlation IDs always present: request_id, trace_id, user_id. Without these, "find the related logs" is impossible.
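To make the payoff concrete, here is a minimal sketch, assuming each event is written as one JSON object per line. The `logEvent` helper, the `ts` field, and the sample values are hypothetical and not taken from any particular logging library.

```javascript
// Hypothetical helper: serialize one event as a single JSON line.
function logEvent(event, fields) {
  return JSON.stringify({ ts: new Date().toISOString(), event, ...fields });
}

// Three sample lines, as they might sit in a log file.
const lines = [
  logEvent("order.created", { user_id: "u1", sku: "A-100", count: 2, amount_usd: 19.5, request_id: "r1", trace_id: "t1" }),
  logEvent("order.created", { user_id: "u2", sku: "B-200", count: 1, amount_usd: 5.25, request_id: "r2", trace_id: "t2" }),
  logEvent("order.created", { user_id: "u1", sku: "C-300", count: 1, amount_usd: 10, request_id: "r3", trace_id: "t3" }),
];

// Filtering by user and summing a numeric field needs no regex,
// because the event name is stable and amount_usd is always a number.
const records = lines.map((line) => JSON.parse(line));
const u1Orders = records.filter((r) => r.event === "order.created" && r.user_id === "u1");
const u1Total = u1Orders.reduce((sum, r) => sum + r.amount_usd, 0);

console.log(u1Orders.length, u1Total); // prints: 2 29.5
```

The same filter and sum would need a hand-written regex and string-to-number parsing against the free-text line from section 1.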
3. Levels are not categories
INFO/WARN/ERROR exist to gate retention and alerting, not to classify what happened. A common anti-pattern is bumping a log to WARN because "it's important" — now you have an alert on every important log line and nobody reads warnings anymore.
A working policy:
- ERROR — something went wrong and the system did not recover. Should map 1:1 to alerts.
- WARN — something is degraded but the system worked. Useful for dashboards, not for paging.
- INFO — observable transitions: a request started, a job ran, a feature flag flipped. Searchable, not browsed.
- DEBUG — kept off by default, flipped on per-service when you are actively debugging.
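One way to keep that policy from drifting is to write it down as data rather than as tribal knowledge. The sketch below is an assumption about how that could look: the `LEVEL_POLICY` table, its field names, and both helper functions are hypothetical.

```javascript
// Hypothetical policy table: what each level triggers downstream.
const LEVEL_POLICY = {
  ERROR: { page: true, dashboard: true, enabledByDefault: true },
  WARN: { page: false, dashboard: true, enabledByDefault: true },
  INFO: { page: false, dashboard: false, enabledByDefault: true },
  DEBUG: { page: false, dashboard: false, enabledByDefault: false },
};

// Look up a level, failing loudly on typos such as "WARNING".
function policyFor(level) {
  if (!Object.hasOwn(LEVEL_POLICY, level)) {
    throw new Error(`unknown level: ${level}`);
  }
  return LEVEL_POLICY[level];
}

// Should this line be emitted at all? DEBUG only when switched on for the service.
function shouldEmit(level, debugEnabled = false) {
  return policyFor(level).enabledByDefault || debugEnabled;
}

// Should this line page a human? Under this policy, only ERROR does.
function shouldPage(level) {
  return policyFor(level).page;
}
```

With the table in place, "it's important" stops being a reason to change a level: raising a line to ERROR visibly means paging someone.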
4. Sensitive fields are a real cost
Once a PII string is in your log store, you own a privacy obligation for as long as that data is retained. A common approach is a redaction layer in front of the logger that drops or hashes a fixed list of field names: password, token, ssn, card_number. The redactor lives in the same module as the logger so the obligation is enforced, not remembered.
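A redaction layer of that kind can be sketched as follows. The field names come from the list above; the split between dropped and hashed fields, the `redact` and `logSafe` names, and the choice of SHA-256 are assumptions for illustration. The sketch only looks at top-level keys.

```javascript
const { createHash } = require("node:crypto");

// Which fields are dropped and which are hashed is an assumed split.
const DROP_FIELDS = new Set(["password", "token"]);
const HASH_FIELDS = new Set(["ssn", "card_number"]);

// Return a copy of `fields` that is safe to log. Top-level keys only.
function redact(fields) {
  const safe = {};
  for (const [key, value] of Object.entries(fields)) {
    if (DROP_FIELDS.has(key)) continue; // never written at all
    if (HASH_FIELDS.has(key)) {
      // An unsalted hash keeps values joinable across lines, but it is
      // guessable for low-entropy inputs; a keyed hash would be stronger.
      safe[key] = createHash("sha256").update(String(value)).digest("hex");
    } else {
      safe[key] = value;
    }
  }
  return safe;
}

// The logging entry point runs the redactor before serializing,
// so callers cannot skip it.
function logSafe(event, fields) {
  return JSON.stringify({ event, ...redact(fields) });
}
```

Because `logSafe` is the only way to produce a log line in this sketch, a caller who passes `password` by mistake still cannot get it into the log store.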
5. Sampling for the firehose
A bursty endpoint at 10k QPS does not need 10k INFO lines per second forever. Sample with a deterministic hash:
// Keep 1% of /healthz logs, all of /payments logs.
// hash() stands for any deterministic integer hash of a string.
if (path === "/healthz" && hash(traceId) % 100 !== 0) return;
Sample by trace_id rather than randomly, so when you keep a request, you keep the whole request — not the middle three lines of it.
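A self-contained version of that check might look like the sketch below. The text does not say which hash to use, so FNV-1a (32-bit) is swapped in here as one simple deterministic option; the `KEEP_PERCENT` table and `shouldLog` function are hypothetical names.

```javascript
// FNV-1a, 32-bit. Hashes UTF-16 code units, which matches the byte-wise
// definition for ASCII trace IDs.
function fnv1a(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return h;
}

// Hypothetical per-path keep rates, in percent. Unlisted paths keep everything.
const KEEP_PERCENT = new Map([["/healthz", 1]]);

// The decision depends only on (path, traceId), so every line of a given
// request gets the same answer.
function shouldLog(path, traceId) {
  const keep = KEEP_PERCENT.get(path) ?? 100;
  return fnv1a(traceId) % 100 < keep;
}
```

Calling `shouldLog` at every log site with the same trace ID gives the same result each time, which is what keeps a sampled request whole.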
6. One log per transition, not per line
A 17-step pipeline does not need 17 log lines. It needs one structured event at the end: { pipeline: "checkout", steps_run: 17, duration_ms: 420, outcome: "ok" }. The intermediate steps become metrics if you care about counts, or spans on a trace if you care about timing.
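As a rough sketch of how a pipeline runner could produce that single event: the `runPipeline` function, the `{ name, run }` step shape, and the `failed_step` field are assumptions, and the steps are treated as synchronous to keep the example short.

```javascript
// Run the steps in order and build ONE summary event for the whole pipeline.
// `steps` is a hypothetical array of { name, run } objects.
function runPipeline(pipeline, steps, ctx) {
  const started = Date.now();
  let stepsRun = 0;
  let outcome = "ok";
  let failedStep;

  for (const step of steps) {
    try {
      step.run(ctx);
      stepsRun += 1;
    } catch (err) {
      outcome = "error";
      failedStep = step.name;
      break; // stop at the first failure; the summary records where
    }
  }

  const event = {
    pipeline,
    steps_run: stepsRun,
    duration_ms: Date.now() - started,
    outcome,
  };
  if (failedStep !== undefined) event.failed_step = failedStep;
  return event; // the caller logs this once
}
```

On failure the one event still says how far the pipeline got and which step stopped it, which is usually what the intermediate log lines were there to answer.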
The mental model
Logs answer the question "what specifically happened with this one thing?". Metrics answer "what is happening in aggregate?". Traces answer "what was the path?". Structured logs put real data behind the first question. Free-text logs make you choose between paging your SRE and learning Lucene regex.