
Mapping Explosion: Why Your Cluster Is Eating RAM for Breakfast

Dynamic mapping is convenient — and the fastest way to fill your heap with fields you will never query.

September 08, 2025 · 7 min read · Elasticsearch, Performance

Elasticsearch's dynamic mapping is a thoughtful feature and a very common foot-gun. Ship a nested JSON blob with unpredictable keys into a log index and you'll grow a mapped field for every unique key, permanently: a field can't be removed from an existing mapping without reindexing.

What a field costs

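Every field is part of the index mapping, and mappings live in the cluster state, which the elected master maintains and every node keeps in memory. At around 10,000 fields in one index (mappings are per index, not per shard), master nodes and JVM heap start groaning. The default safety rail is index.mapping.total_fields.limit=1000. Raising it is the common first response to "Limit of total fields [1000] has been exceeded", but that only moves the wall; the real fix is changing the shape of the data.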
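To see how close an index is to that limit, you can count the fields in its mapping. Below is a minimal Python sketch, assuming you already have the mapping's `properties` object as a dict (for example, copied out of a get-mapping response). The function name `count_mapping_fields` is hypothetical, and the count is an approximation: it counts every entry, object fields and multi-fields included, but Elasticsearch's own count may differ (this sketch does not look at the mapping's `runtime` section, for instance).

```python
from __future__ import annotations

from typing import Any


def count_mapping_fields(properties: dict[str, Any]) -> int:
    """Approximate how many fields a mapping's `properties` tree holds.

    Every entry counts once (object fields included); the function recurses
    into object `properties` and into multi-fields declared under `fields`.
    """
    total = 0
    for spec in properties.values():
        total += 1  # the field itself, leaf or object
        total += count_mapping_fields(spec.get("properties", {}))  # object children
        total += count_mapping_fields(spec.get("fields", {}))  # multi-fields, e.g. message.keyword
    return total


if __name__ == "__main__":
    # Hypothetical mapping: one text field with a keyword multi-field, plus
    # per-host metrics stored as metrics.cpu.usage.p95.host-NN.
    p95 = {"properties": {f"host-{i:02d}": {"type": "float"} for i in range(1, 4)}}
    usage = {"properties": {"p95": p95}}
    cpu = {"properties": {"usage": usage}}
    properties = {
        "message": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
        "metrics": {"properties": {"cpu": cpu}},
    }
    print(count_mapping_fields(properties))  # 9
```

Each additional host in that hypothetical mapping adds one more field, which is how per-host keys reach the 1,000 default faster than anyone expects.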

The usual culprit: key-as-field

{
  "metrics": {
    "cpu.usage.p95.host-01": 0.42,
    "cpu.usage.p95.host-02": 0.51,
    "cpu.usage.p95.host-03": 0.40
  }
}

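Elasticsearch expands dotted keys into object paths, so each key becomes its own leaf field (metrics.cpu.usage.p95.host-01 and so on), and every new host adds another one. A year of host churn can add up to tens of thousands of fields. Rewrite the data as an array of key-value objects: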

{
  "metrics": [
    { "name": "cpu.usage.p95", "host": "host-01", "value": 0.42 },
    { "name": "cpu.usage.p95", "host": "host-02", "value": 0.51 }
  ]
}

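Now the mapping has a fixed three leaf fields (metrics.name, metrics.host, metrics.value), however many hosts report. One caveat: Elasticsearch flattens arrays of plain objects, so a query for host-01 with value > 0.5 would match this document even though host-01 reported 0.42. If queries must match name, host, and value within the same entry, map metrics as "type": "nested".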
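The key-as-field shape and the array rewrite can be compared offline, before anything reaches the cluster. This Python sketch uses two hypothetical helpers: `leaf_paths` lists the leaf field paths a document would produce under dynamic mapping (dotted keys read as object paths, array elements sharing one path), and `to_key_value` rewrites the keyed object into rows. It assumes the host is always the last dot-separated segment of each key, which holds for the example keys but may not for yours.

```python
from __future__ import annotations

from typing import Any


def leaf_paths(value: Any, prefix: str = "") -> set[str]:
    """Leaf field paths a JSON value would produce under dynamic mapping.

    Dotted keys read as object paths, so joining with "." gives the same path
    either way. Array elements add no path segment: they share one field.
    """
    if isinstance(value, dict):
        found: set[str] = set()
        for key, child in value.items():
            found |= leaf_paths(child, f"{prefix}.{key}" if prefix else key)
        return found
    if isinstance(value, list):
        merged: set[str] = set()
        for item in value:
            merged |= leaf_paths(item, prefix)
        return merged
    return {prefix}


def to_key_value(metrics: dict[str, float]) -> list[dict[str, Any]]:
    """Rewrite {"<name>.<host>": value} into name/host/value rows.

    Assumption: the host is always the last dot-separated segment of the key.
    """
    rows = []
    for key, val in metrics.items():
        name, _, host = key.rpartition(".")
        rows.append({"name": name, "host": host, "value": val})
    return rows


if __name__ == "__main__":
    keyed = {"metrics": {f"cpu.usage.p95.host-{i:02d}": 0.5 for i in range(1, 101)}}
    flat = {"metrics": to_key_value(keyed["metrics"])}
    print(len(leaf_paths(keyed)))  # 100
    print(sorted(leaf_paths(flat)))  # ['metrics.host', 'metrics.name', 'metrics.value']
```

With 100 hosts the keyed shape yields 100 leaf paths while the array shape stays at three. Dynamic mapping also creates the object fields along each path (metrics, cpu, and so on), which this sketch does not count.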

Defensive mapping

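Turn off blind dynamic mapping. Use "dynamic": "strict" for core entities, so a document carrying an unknown field is rejected instead of silently growing the mapping. Where you genuinely need open-ended keys, confine them to one object that opts back in with "dynamic": true, and use dynamic_templates to decide how fields matching a pattern under it are mapped:

{
  "mappings": {
    "dynamic": "strict",
    "dynamic_templates": [
      { "labels_as_keyword": {
          "path_match": "labels.*",
          "match_mapping_type": "string",
          "mapping": { "type": "keyword", "ignore_above": 256 }
      } }
    ],
    "properties": {
      "@timestamp": { "type": "date" },
      "labels": { "type": "object", "dynamic": true }
    }
  }
}

Dynamic templates control how a new field is mapped, not whether it gets created, so index.mapping.total_fields.limit remains the backstop for anything under labels.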
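You can also catch strict-mapping rejections before Elasticsearch does. With a strict mapping, Elasticsearch rejects the whole document with a `strict_dynamic_mapping_exception` when it meets an unknown field. The Python sketch below imitates that rule locally: `unmapped_paths` is a hypothetical helper, and the example mapping assumes a strict root with a `labels` object that sets `"dynamic": true`. It only walks plain objects; dotted keys and arrays of objects are deliberately left out to keep it short.

```python
from __future__ import annotations

from typing import Any


def unmapped_paths(doc: dict[str, Any], properties: dict[str, Any],
                   dynamic: str = "strict", prefix: str = "") -> list[str]:
    """List fields in `doc` that a strict mapping would refuse.

    A local pre-ingest check, not Elasticsearch itself. `dynamic` is the setting
    inherited from the parent object; an object can override it with its own
    "dynamic" key. Dotted keys and arrays of objects are not handled.
    """
    rejected: list[str] = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        spec = properties.get(key)
        if spec is None:
            # Unknown field: strict rejects it; true/false/runtime would accept the document.
            if dynamic == "strict":
                rejected.append(path)
            continue
        if isinstance(value, dict):
            # JSON true parses to Python True, so normalise to a lowercase string.
            child = str(spec.get("dynamic", dynamic)).lower()
            rejected += unmapped_paths(value, spec.get("properties", {}), child, path)
    return rejected


if __name__ == "__main__":
    # Hypothetical mapping: strict root, one "labels" object open to new keys.
    properties = {
        "@timestamp": {"type": "date"},
        "labels": {"type": "object", "dynamic": True},
    }
    doc = {
        "@timestamp": "2025-09-08T00:00:00Z",
        "labels": {"team": "search"},  # allowed: labels opts back in
        "debug": {"trace_id": "abc"},  # rejected: not in the mapping
    }
    print(unmapped_paths(doc, properties))  # ['debug']
```

Running a check like this in the ingest pipeline turns a rejected bulk item into a clear log line naming the offending path.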

Runtime fields for the "I need it once" case

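Need to query a value you never indexed, or one derived from fields you already have? Use a runtime field, computed at query time by a Painless script. Each query is slower, but a runtime field declared in the search request adds nothing to the mapping and needs no reindex. The script below reads doc values, so url.path must be mapped with doc values (a keyword, for instance); the size() check skips documents that lack the field instead of failing the query.

{
  "runtime_mappings": {
    "path_depth": {
      "type": "long",
      "script": { "source": "if (doc['url.path'].size() > 0) { emit(doc['url.path'].value.splitOnToken('/').length); }" }
    }
  },
  "query": { "range": { "path_depth": { "gte": 3 } } }
}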

Rules of thumb

  • Never ingest untrusted JSON into a schema-less index.
  • Reserve dynamic mapping for prototyping, not production.
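  • Every field you never search or filter on can be "index": false; aggregations and sorting read doc_values, so they keep working. That saves index space but not a mapping entry: to keep a whole object out of the mapping, set "enabled": false on it.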

Reader Discussion

7 replies

  1. Raghav Sharma · Search Engineer · Flipkart · From experience · Highlighted by author

    rescore vs raw function_score is the single biggest relevance win I've shipped this year. +0.08 NDCG@10 on product search, zero recall regression, two weeks of work. If you have a serious search funnel and you're not rescoring, you're leaving money on the table.

    Sep 10, 2025·2 days later
  2. Rashida Hassan · ML Eng · Asks

    would love a follow-up on hybrid bm25 + dense vector search in 8.x. we A/B'd it last quarter and the BM25 head still wins on long-tail queries by a surprising margin.

    Sep 15, 2025·1 week later
  3. Olivia Bennett · Data Engineer · Story

    Mapping explosion off untrusted JSON cost us a 2-billion-doc reindex last year. Six engineers, two weekends, one director apology email. "Never ingest untrusted JSON" should be on the office wall in 72pt.

    Sep 12, 2025·4 days later
  4. Michiel de Vries · Observability Lead · Agrees

    ILM + rollover turned our cluster from a 4-times-a-week pager generator into something on-call literally forgets exists for months at a time. If a post wants one takeaway it should be this.

    Sep 11, 2025·3 days later
  5. Thành Võ · Backend · Pushback

    tiny correction: bm25 k1 is term frequency saturation, not a boost weight as a lot of ES blog posts get wrong. b is the length normalization. spelling that out saves readers from tuning the wrong knob.

    Sep 16, 2025·1 week later·edited
  6. Rachel Gold · Staff SRE · Agrees

    the on-call framing throughout this piece is what makes it land. too many infra articles assume you never get paged. those are written by people who never got paged.

    Sep 11, 2025·3 days later
  7. Omar Khalil · Senior SWE · Kind words

    this is the third article from this blog I've sent to my team this month. you're cooking. don't switch to crypto.

    Sep 13, 2025·5 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.

Comments seeded · live discussion via email