Mapping Explosion: Why Your Cluster Is Eating RAM for Breakfast
Dynamic mapping is convenient — and the fastest way to fill your heap with fields you will never query.
Elasticsearch's dynamic mapping is a thoughtful feature and a very common foot-gun. Ship a nested JSON blob with unpredictable keys into a log index and you'll grow a field for every unique key — forever.
What a field costs
Every field lives in the index mapping, which is part of the cluster state held on every node. Mappings are per index, not per shard: at 10,000 fields in a busy index, master nodes and JVM heap start groaning. The default safety rail is index.mapping.total_fields.limit=1000; bumping it is a common but wrong first response to "Limit of total fields [1000] has been exceeded".
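To see how close an index is to that limit, you can count the fields in its mapping. The sketch below is a rough approximation, assuming the mapping has already been fetched (for example via the get-mapping API) and parsed into a Python dict. count_fields and the example mapping are hypothetical, and Elasticsearch's own accounting may differ slightly (it also counts mapped runtime fields, which this helper ignores).

```python
def count_fields(mapping: dict) -> int:
    """Roughly count mapped fields: every entry under "properties"
    (object fields included) plus every multi-field under "fields"."""
    total = 0
    for key in ("properties", "fields"):
        for sub in mapping.get(key, {}).values():
            total += 1 + count_fields(sub)
    return total


# Hypothetical mapping, shaped like the "mappings" section of an index.
example = {
    "properties": {
        "message": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
        "metrics": {
            "properties": {
                "cpu_p95_host_01": {"type": "float"},
                "cpu_p95_host_02": {"type": "float"},
            }
        },
    }
}

# message + message.raw + metrics + two host fields
print(count_fields(example))  # prints 5
```

Run against a real mapping on a schedule, a count that keeps climbing is an early sign of key-as-field ingestion.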
The usual culprit: key-as-field
{
  "metrics": {
    "cpu.usage.p95.host-01": 0.42,
    "cpu.usage.p95.host-02": 0.51,
    "cpu.usage.p95.host-03": 0.40
  }
}
This creates a new mapped field for every host. A year of host churn = tens of thousands of fields. Rewrite it as an array of key-value objects:
{
  "metrics": [
    { "name": "cpu.usage.p95", "host": "host-01", "value": 0.42 },
    { "name": "cpu.usage.p95", "host": "host-02", "value": 0.51 }
  ]
}
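Now the mapping is a fixed schema of three fields (name, host, value), no matter how many hosts report. Map metrics as nested if queries need to match name, host, and value within the same entry; a plain object array is flattened and loses that pairing.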
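The reshaping can happen in the ingest client before the document is sent. A minimal sketch, assuming every key has the form "<metric name>.<host>" with the host after the last dot; to_key_value is a hypothetical helper, not part of any Elasticsearch client.

```python
def to_key_value(metrics: dict) -> list:
    """Turn {"<name>.<host>": value} into a list of
    {"name": ..., "host": ..., "value": ...} objects."""
    rows = []
    for key, value in metrics.items():
        # Assumption: the host is everything after the last dot.
        # A key without a dot ends up with an empty name.
        name, _, host = key.rpartition(".")
        rows.append({"name": name, "host": host, "value": value})
    return rows


event = {
    "metrics": {
        "cpu.usage.p95.host-01": 0.42,
        "cpu.usage.p95.host-02": 0.51,
    }
}
event["metrics"] = to_key_value(event["metrics"])
print(event["metrics"][0])
# prints {'name': 'cpu.usage.p95', 'host': 'host-01', 'value': 0.42}
```

If host names can themselves contain dots, the split rule needs to change; the point is only that the key set in the output stays constant.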
Defensive mapping
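Turn off blind dynamic mapping. Set dynamic: "strict" on core entities so unknown fields are rejected at index time, and use dynamic_templates to control how optional fields that match a pattern get mapped. Templates only fire where dynamic mapping is still enabled, so leave it on only for the objects that need it: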
"dynamic_templates": [
  {
    "string_as_keyword": {
      "match_mapping_type": "string",
      "mapping": { "type": "keyword", "ignore_above": 256 }
    }
  }
]
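Put together, one possible layout is a strict root with a single object that is allowed to grow. The sketch below builds such an index body as a Python dict; the field names (service, labels) and the template name are illustrative assumptions, and the body would be sent with whatever client you use to create indices.

```python
import json

# Hypothetical index body: "service" and "labels" are illustrative names.
index_body = {
    "mappings": {
        # Unknown top-level fields are rejected instead of mapped.
        "dynamic": "strict",
        "dynamic_templates": [
            {
                "labels_as_keyword": {
                    "path_match": "labels.*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword", "ignore_above": 256},
                }
            }
        ],
        "properties": {
            "@timestamp": {"type": "date"},
            "service": {"type": "keyword"},
            # Only this object accepts new keys. New string keys are mapped
            # by the template; other types fall back to default dynamic rules.
            "labels": {"type": "object", "dynamic": True},
        },
    }
}

print(json.dumps(index_body, indent=2))
```

The labels object can still grow without bound, so it remains subject to index.mapping.total_fields.limit; the strict root just confines the growth to one place.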
Runtime fields for the "I need it once" case
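Need to query a field that isn't indexed? Use a runtime field: its value is computed at query time by a Painless script. Queries run slower, but a runtime field defined in the search request adds nothing to the mapping and needs no reindex. The script here reads doc['url.path'], so it assumes url.path is already mapped with doc values (for example as a keyword).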
{
  "runtime_mappings": {
    "path_depth": {
      "type": "long",
      "script": "emit(doc['url.path'].value.splitOnToken('/').length)"
    }
  }
}
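A runtime field defined this way can be used in the same request like any other field. The sketch below builds a search body that filters on path_depth and returns it; the threshold of 5 is an arbitrary assumption, and url.path is again assumed to be a keyword field with doc values.

```python
import json

# Search body that defines path_depth at query time and filters on it.
search_body = {
    "runtime_mappings": {
        "path_depth": {
            "type": "long",
            "script": {
                "source": "emit(doc['url.path'].value.splitOnToken('/').length)"
            },
        }
    },
    # Arbitrary threshold, chosen only for illustration.
    "query": {"range": {"path_depth": {"gte": 5}}},
    # Ask for the computed value back alongside each hit.
    "fields": ["path_depth"],
}

print(json.dumps(search_body, indent=2))
```

Because the script runs for every candidate document, pairing it with a cheap indexed filter (a time range, for instance) keeps the cost down.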
Rules of thumb
- Never ingest untrusted JSON into a schema-less index.
- Reserve dynamic mapping for prototyping, not production.
- A field you never search or filter on can be "index": false; if you never sort or aggregate on it either, set "doc_values": false as well.