Aggregations Deep Dive: Bucket, Metric, and the Pitfalls

Aggregations are the analytics engine behind Kibana. Know how terms, date_histogram, and cardinality actually execute.

October 18, 2025 · 8 min read · Elasticsearch, Analytics

Aggregations turn an Elasticsearch cluster into a sharded analytics engine. They're also where most "why is Kibana slow?" investigations end.

The two families

Bucket aggregations group documents: terms, date_histogram, range, composite.
Metric aggregations compute over a bucket: avg, percentiles, cardinality.

You nest them: bucket by hour, then compute p95 latency per bucket.
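As a rough sketch of that nesting, the request body below puts a percentiles metric inside a date_histogram bucket. The field names "@timestamp" and "latency_ms" and the aggregation names are assumptions for illustration; substitute whatever your mapping uses.

```python
import json

# Hypothetical field names: "@timestamp" and "latency_ms" are assumptions,
# not taken from any particular index mapping.
request_body = {
    "size": 0,  # aggregation only, skip fetching hits
    "aggs": {
        "per_hour": {  # bucket aggregation: one bucket per hour
            "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"},
            "aggs": {
                "latency_p95": {  # metric aggregation, computed inside each bucket
                    "percentiles": {"field": "latency_ms", "percents": [95]}
                }
            },
        }
    },
}

print(json.dumps(request_body, indent=2))
```

The printed JSON is what you would send as the body of a _search request.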

Terms agg is an approximation

Every shard returns only its top shard_size terms (default size · 1.5 + 10) and the coordinator merges those lists. A term that falls outside one shard's local top list contributes nothing from that shard, so terms can return undercounted (or entirely missing) long-tail buckets. The result includes doc_count_error_upper_bound for exactly this reason.
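To see where the error comes from, here is a toy simulation of the merge in plain Python. It is not Elasticsearch code: the shard contents are made up, shard_size is forced down to 2 to make the effect visible, and the error bound is computed in the spirit of doc_count_error_upper_bound (the sum of the last returned count on each shard that truncated its list).

```python
from collections import Counter

def default_shard_size(size: int) -> int:
    """Default per-shard request size quoted above: size * 1.5 + 10."""
    return int(size * 1.5 + 10)

def top_n(counts, n):
    """Top n (term, count) pairs, ties broken alphabetically for determinism."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

def merged_terms(shards, size, shard_size=None):
    """Simulate the coordinator merging each shard's local top shard_size terms."""
    if shard_size is None:
        shard_size = default_shard_size(size)
    merged = Counter()
    error_bound = 0
    for shard in shards:
        local_top = top_n(shard, shard_size)
        merged.update(dict(local_top))
        # A term this shard left out can have at most the count of the
        # last term it did return, and only if the list was truncated.
        if len(shard) > shard_size:
            error_bound += local_top[-1][1]
    return top_n(merged, size), error_bound

# Made-up per-shard term counts.
shards = [
    {"a": 10, "b": 8, "c": 7},
    {"a": 9, "c": 6, "b": 1},
]

exact = Counter()
for shard in shards:
    exact.update(shard)

approx, bound = merged_terms(shards, size=2, shard_size=2)
print("merged top 2:", approx)
print("exact top 2: ", top_n(exact, 2))
print("error bound: ", bound)
```

In this toy data the merge reports b as the runner-up with 8 docs, while the true runner-up is c with 13. Raising shard_size shrinks the gap at the cost of more data shipped per shard.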

For exhaustive pagination over high-cardinality fields, use composite:

{
  "size": 0,
  "aggs": {
    "pages": {
      "composite": {
        "size": 1000,
        "sources": [{ "host": { "terms": { "field": "host.keyword" } } }]
      }
    }
  }
}
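Paging through a composite aggregation means feeding each response's after_key back in as the next request's "after". A minimal sketch of that loop follows; `search` is a hypothetical callable (request body in, parsed JSON response out) that you would wire to your own client, and the aggregation name "pages" matches the request above.

```python
def composite_pages(search, field, page_size=1000):
    """Yield every bucket of a composite terms source, page by page.

    `search` is a hypothetical callable that takes a request body (dict)
    and returns the parsed JSON response (dict).
    """
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"host": {"terms": {"field": field}}}],
        }
        if after is not None:
            composite["after"] = after  # resume where the last page ended
        response = search({"size": 0, "aggs": {"pages": {"composite": composite}}})
        page = response["aggregations"]["pages"]
        yield from page["buckets"]
        after = page.get("after_key")
        # Stop when a page comes back empty or carries no key to resume from.
        if not page["buckets"] or after is None:
            break
```

Because it is a generator, the caller can stream buckets into a file or a counter without holding every page in memory.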

Cardinality uses HyperLogLog

cardinality is approximate, with the accuracy/memory trade-off tunable via precision_threshold. Below the threshold, counts are expected to be close to exact; above it, error grows to ~1–2%. If you genuinely need exact distinct counts, you're looking at a different tool (such as a SQL warehouse).
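A small sketch of a request builder that sets precision_threshold explicitly. The field name "user.id" and the aggregation name "distinct" are hypothetical. The default of 3000 and the cap of 40000 reflect my reading of the Elasticsearch documentation; check them against your version.

```python
import json

MAX_PRECISION_THRESHOLD = 40_000  # values above this have no extra effect

def distinct_count_request(field, precision_threshold=3000):
    """Build a size-0 search body with a single cardinality aggregation."""
    threshold = min(precision_threshold, MAX_PRECISION_THRESHOLD)
    return {
        "size": 0,
        "aggs": {
            "distinct": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": threshold,
                }
            }
        },
    }

# "user.id" is an assumed field name for illustration.
print(json.dumps(distinct_count_request("user.id", 10_000), indent=2))
```

A higher threshold buys accuracy with memory, so it is worth setting per aggregation rather than maxing it out everywhere.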

date_histogram alignment

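fixed_interval: 1h gives deterministic, fixed-length buckets aligned to epoch. calendar_interval: 1d is timezone-aware and boundary-aligned: buckets start at local midnight, so a "day" can be 23 or 25 hours long across a DST change. Mixing the two across dashboards is a classic source of "my totals don't match."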
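The mismatch is easy to reproduce outside Elasticsearch. The sketch below only simulates the bucketing arithmetic: it floors the same three made-up events to epoch-aligned 24-hour windows and to local calendar days. A fixed UTC-5 offset stands in for a real named time zone, which is an assumption made to keep the example dependency-free.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAY = 86_400  # seconds in a fixed 24h bucket

def epoch_aligned_day(ts: datetime) -> datetime:
    """Floor to a 24h bucket aligned to the Unix epoch (UTC)."""
    seconds = int(ts.timestamp())
    return datetime.fromtimestamp(seconds - seconds % DAY, tz=timezone.utc)

def local_calendar_day(ts: datetime, tz: timezone) -> datetime:
    """Floor to midnight in the given zone."""
    local = ts.astimezone(tz)
    return local.replace(hour=0, minute=0, second=0, microsecond=0)

# Assumption: a fixed UTC-5 offset stands in for a named time zone.
LOCAL = timezone(timedelta(hours=-5))

# Made-up event timestamps.
events = [
    datetime(2025, 10, 18, 1, 30, tzinfo=timezone.utc),  # 20:30 on Oct 17 local
    datetime(2025, 10, 18, 12, 0, tzinfo=timezone.utc),  # 07:00 on Oct 18 local
    datetime(2025, 10, 18, 23, 0, tzinfo=timezone.utc),  # 18:00 on Oct 18 local
]

fixed_counts = Counter(epoch_aligned_day(e).date().isoformat() for e in events)
calendar_counts = Counter(
    local_calendar_day(e, LOCAL).date().isoformat() for e in events
)

print("epoch-aligned:", dict(fixed_counts))
print("local days:   ", dict(calendar_counts))
```

Same events, same grand total, different per-day numbers: one dashboard shows three events on the 18th, the other shows one on the 17th and two on the 18th.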

Common performance mistakes

Running a terms agg on a text field. Always use the .keyword sub-field; otherwise you aggregate on analysed tokens.
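Requesting too many buckets. A 1-second histogram over 30 days asks for about 2.6M buckets (30 × 86,400 = 2,592,000). Depending on version and settings, the request is either rejected by the search.max_buckets limit or puts enough memory pressure on the coordinator that it can OOM before it responds.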
Forgetting "size": 0. If you only want the aggregation, fetching documents is wasted work.
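The bucket-count mistake is cheap to catch before the request is sent. Here is a minimal sketch of a pre-flight check; the budget of 10,000 buckets and the candidate intervals are arbitrary assumptions, not Elasticsearch defaults.

```python
from datetime import timedelta

def estimated_buckets(time_range: timedelta, interval: timedelta) -> int:
    """Upper bound on date_histogram buckets for a range and interval."""
    return -(-time_range // interval)  # ceiling division on timedeltas

def pick_interval(time_range, max_buckets, candidates):
    """Return the finest candidate interval that stays within the budget."""
    for interval in sorted(candidates):
        if estimated_buckets(time_range, interval) <= max_buckets:
            return interval
    raise ValueError("no candidate interval fits the bucket budget")

MAX_BUCKETS = 10_000  # hypothetical budget for one dashboard panel
CANDIDATES = [
    timedelta(seconds=1),
    timedelta(minutes=1),
    timedelta(hours=1),
    timedelta(days=1),
]

window = timedelta(days=30)
print(estimated_buckets(window, timedelta(seconds=1)))
print(pick_interval(window, MAX_BUCKETS, CANDIDATES))
```

For a 30-day window, the 1-second and 1-minute candidates both blow the budget, so the check falls through to 1 hour (720 buckets).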

Transforms: pre-aggregated indices

For dashboards that rerun the same aggregation every 10 seconds, create a transform job. It materialises a compact summary index on a schedule; the dashboard queries the summary, not the raw logs. Cost drops by orders of magnitude.
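As an illustration, a continuous pivot transform that rolls raw logs up into hourly per-host summaries might look roughly like this. Every name here (the transform id, index patterns, field names) is hypothetical, and the body follows my understanding of the transform API; verify the fields against the documentation for your version.

```python
import json

# All ids, index names, and field names below are hypothetical.
transform_id = "logs-hourly-latency"
transform_body = {
    "source": {"index": "logs-*"},
    "dest": {"index": "logs-hourly-summary"},
    "frequency": "1m",  # how often the transform checks for new data
    "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
    "pivot": {
        "group_by": {
            "hour": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"}
            },
            "host": {"terms": {"field": "host.keyword"}},
        },
        "aggregations": {
            "avg_latency": {"avg": {"field": "latency_ms"}},
            "requests": {"value_count": {"field": "latency_ms"}},
        },
    },
}

print(f"PUT _transform/{transform_id}")
print(json.dumps(transform_body, indent=2))
```

After creating it, the transform still has to be started (POST _transform/logs-hourly-latency/_start). The dashboard then queries logs-hourly-summary, which holds one document per host per hour.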
