When Elastic Refused to Search: The Mapping Bomb Diary
A vendor pushed 47 million docs of untyped JSON into our index. Field count exploded past 12,000. Here's how we crawled out, day by day.
This one didn't happen overnight. It happened over nine days, like a slow-motion car crash you watch through binoculars while sipping coffee. By the end I had a postmortem doc, an internal training, and a permanent allergy to the words "schemaless" and "just ingest the JSON."
Day 0: "Just point it at the new vendor feed"
The product team had inked a deal with a third-party data vendor. They'd send us product enrichment data — reviews, sentiment, related-product graphs — as JSON over an SQS queue. We'd stuff it into Elastic, surface it in search, and call it a feature.
I asked twice for a schema. I got back: "It's just JSON, you can index it as-is." Famous last words.
Day 1: First 200k docs in
Looked fine. Search worked. Cluster green. Two people on the analytics team did a happy dance.
What I didn't notice: the field count on the new index had hit 870 in the first day. Default index.mapping.total_fields.limit in our cluster was 1,000. We had budgeted for 200, max.
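A rough way to see that number coming is to count what the mapping already holds. The sketch below is an approximation under stated assumptions: it takes a dict shaped like the "properties" section that GET /vendor_enrichment/_mapping returns, and counts every field, every object, and every multi-field, since all of those count toward index.mapping.total_fields.limit. The sample mapping and the helper names are hypothetical.

```python
def count_mapped_fields(properties: dict) -> int:
    """Approximate how many entries a "properties" block contributes toward
    the total-fields limit: fields, objects, and multi-fields all count."""
    total = 0
    for field_def in properties.values():
        total += 1  # the field or object itself
        total += len(field_def.get("fields", {}))  # multi-fields, e.g. text + keyword
        total += count_mapped_fields(field_def.get("properties", {}))  # children
    return total


def limit_usage(field_count: int, limit: int = 1000) -> float:
    """Fraction of the field limit already consumed."""
    return field_count / limit


# Hypothetical slice of a mapping, for illustration only.
sample_properties = {
    "review_id": {"type": "keyword"},
    "metadata": {
        "properties": {
            "scrape_2025_11_18_14_22_run_4": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
            "internal_tag_xj3p9": {"type": "keyword"},
        }
    },
}

print(count_mapped_fields(sample_properties))  # 5
print(f"{limit_usage(870):.0%}")               # 87%
```

With 870 fields against a limit of 1,000, the index was already at 87% on day one.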
Day 2: "Why is everyone's search slow?"
Latency on the main product index (a different index) was up. Not catastrophic, but clearly worse. CPU had spiked across all data nodes. The new ingest had pushed our hot tier into memory pressure.
New fields in Elastic are not free. The field data structure, the term dictionary, the doc-values columns: they all cost RAM. With 870 fields, heap usage on the hot nodes had jumped by 4 GB. Garbage collection was working overtime.
Day 3: The mapping limit hits
The vendor had been adding fields organically. Day 3, an ingest batch added field number 1,001. The cluster started rejecting docs:
  "type": "illegal_argument_exception",
  "reason": "Limit of total fields [1000] in index [vendor_enrichment] has been exceeded"
Two paths: bump the limit, or fix the ingest. I bumped the limit to 5,000 "as a temporary measure." Reader, it was not temporary.
Day 4-5: The weekend of horror
By Sunday afternoon the field count was 7,200. It kept climbing because some review documents carried nested JSON like:
{
  "review_id": "abc",
  "metadata": {
    "scrape_2025_11_18_14_22_run_4": "ok",
    "internal_tag_xj3p9": "true",
    "vendor_session_a8f0e21": "expired"
  }
}
The vendor was, charmingly, embedding their internal scrape session IDs as JSON keys. Each new scrape run created roughly 30 new fields. Multiply by their batch size and we were looking at thousands of fields per day, forever.
This is the textbook mapping explosion. The fix: do not let the source dictate your schema.
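To make the mechanism concrete, here is a small sketch of why run IDs used as keys never stop producing fields. It assumes that dynamic mapping creates one field per distinct dotted path, which is the behaviour described above. The second document and its key names are invented for illustration.

```python
def field_paths(doc: dict, prefix: str = "") -> set:
    """Collect the dotted path of every leaf value in a JSON-like dict."""
    paths = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths |= field_paths(value, prefix=f"{path}.")
        else:
            paths.add(path)
    return paths


# Two reviews from different scrape runs (doc_b is hypothetical).
doc_a = {
    "review_id": "abc",
    "metadata": {"scrape_2025_11_18_14_22_run_4": "ok", "vendor_session_a8f0e21": "expired"},
}
doc_b = {
    "review_id": "def",
    "metadata": {"scrape_2025_11_18_15_02_run_5": "ok", "vendor_session_77c1d09": "active"},
}

seen = set()
for doc in (doc_a, doc_b):
    new_paths = field_paths(doc) - seen
    seen |= new_paths
    print(len(new_paths), "new field paths")
```

The stable key (review_id) is only new once. Every key that embeds a run or session ID is new every time, so the set of paths grows with the number of runs rather than with the shape of the data.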
Day 6: The decision
I had two real options.
Option A: Reindex with a strict schema. Take only the fields we cared about (review text, sentiment score, product ID, timestamp). Discard the rest, or stuff them into a flattened field type that doesn't blow up the mapping. Reindex 47M docs. ETA: 30 hours.
Option B: Use dynamic templates to coerce. Cleverer. Set up a dynamic template that mapped any field starting with scrape_ or vendor_session_ as type: keyword, doc_values: false, index: false. Costs disk, saves heap. ETA: 2 hours.
I went with B for the immediate bleeding, and lined up A as the proper fix.
"dynamic_templates": [
  {
    "scrape_metadata": {
      "path_match": "metadata.scrape_*",
      "mapping": { "type": "keyword", "index": false, "doc_values": false }
    }
  },
  {
    "no_index_internal": {
      "path_match": "metadata.internal_*",
      "mapping": { "type": "object", "enabled": false }
    }
  }
]
Heap usage dropped within an hour. Field count growth slowed but didn't stop — there were other classes of garbage I hadn't templated for.
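A quick way to reason about which keys slip past a set of templates is to replay the patterns offline. The sketch below is a simulation, not Elasticsearch code. It assumes the patterns are compared against the full dotted path (in Elasticsearch that is what path_match does, while plain match tests only the final field name) and that the first matching template wins. Python's fnmatchcase stands in for the wildcard matching, and the third field path is a hypothetical example of an untemplated key.

```python
from fnmatch import fnmatchcase

# (template name, pattern) in declaration order, mirroring the two templates above.
TEMPLATES = [
    ("scrape_metadata", "metadata.scrape_*"),
    ("no_index_internal", "metadata.internal_*"),
]


def matching_template(field_path: str):
    """Return the name of the first template whose pattern matches, else None."""
    for name, pattern in TEMPLATES:
        if fnmatchcase(field_path, pattern):
            return name
    return None


for path in (
    "metadata.scrape_2025_11_18_14_22_run_4",
    "metadata.internal_tag_xj3p9",
    "metadata.crawl_batch_0042",  # hypothetical key with no template
):
    print(path, "->", matching_template(path))
```

Anything that comes back as None falls through to default dynamic mapping. Matched fields are still added to the mapping too, so templates like these make each field cheaper but do not stop the count from growing.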
Day 7-8: The reindex
I built a new index with a strict mapping: 73 hand-picked fields, with all the noise routed into a single flattened field called raw_metadata. flattened is a beautiful escape hatch. It indexes a whole JSON object as one field, you can still query individual sub-keys (their values are treated as keywords), and Elastic doesn't track those sub-keys in the mapping. It's the right tool when the source schema is enemy territory.
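For a sense of what such a mapping can look like, here is a minimal sketch of a create-index body, written as a Python dict so it can be serialized and sent with any HTTP client. The field names are hypothetical stand-ins for the 73 real ones, and "dynamic": "strict" is an assumption about what "strict" meant here. Only the raw_metadata name and its flattened type come from the text above.

```python
import json

# Hypothetical field names; the real index had 73 hand-picked fields.
strict_index_body = {
    "mappings": {
        "dynamic": "strict",  # reject documents that carry unmapped top-level fields
        "properties": {
            "review_id": {"type": "keyword"},
            "product_id": {"type": "keyword"},
            "review_text": {"type": "text"},
            "sentiment_score": {"type": "float"},
            "timestamp": {"type": "date"},
            # Everything of unknown shape lands here as one mapped field.
            "raw_metadata": {"type": "flattened"},
        },
    }
}

# Sub-keys of a flattened field are addressed with dot notation at query time.
sub_key_query = {
    "query": {"term": {"raw_metadata.internal_tag_xj3p9": "true"}}
}

# Body for a create-index request such as PUT /vendor_enrichment_v2 (name assumed).
print(json.dumps(strict_index_body, indent=2))
```

However many distinct keys show up inside raw_metadata, the mapping stays at six entries.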
Reindex took 33 hours. Cutover at 04:30 ICT on day 8. Aliases swung. Cluster CPU dropped to baseline. We were back.
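The two requests behind a move like this can be sketched as plain request bodies. This is an illustration under assumptions, not the exact calls that were run: the new index name, the alias name, and the kept-field list are hypothetical, and the script assumes the noise lives under a top-level metadata object that should be renamed to raw_metadata.

```python
import json

OLD_INDEX = "vendor_enrichment"
NEW_INDEX = "vendor_enrichment_v2"   # hypothetical name
ALIAS = "vendor_enrichment_search"   # hypothetical alias that clients query

# Hypothetical allowlist standing in for the hand-picked fields.
KEPT_FIELDS = ["review_id", "product_id", "review_text", "sentiment_score", "timestamp"]

# Body for POST /_reindex: copy only the allowlisted fields plus metadata,
# then rename metadata to raw_metadata on the way through.
reindex_body = {
    "source": {"index": OLD_INDEX, "_source": KEPT_FIELDS + ["metadata"]},
    "dest": {"index": NEW_INDEX},
    "script": {
        "lang": "painless",
        "source": "ctx._source.raw_metadata = ctx._source.remove('metadata')",
    },
}

# Body for POST /_aliases: both actions apply atomically, so searches
# never see a moment where the alias points at nothing.
alias_swap_body = {
    "actions": [
        {"remove": {"index": OLD_INDEX, "alias": ALIAS}},
        {"add": {"index": NEW_INDEX, "alias": ALIAS}},
    ]
}

print(json.dumps(reindex_body, indent=2))
print(json.dumps(alias_swap_body, indent=2))
```

Keeping the old index around until the alias has swung leaves a rollback path: reverse the two actions and the alias points back.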
Day 9: The rules I added to the team handbook
- Never ingest untrusted JSON without an explicit schema. If the vendor refuses, transform at the edge. The cost of the transformation is always less than the cost of a mapping explosion.
- Set a strict total_fields.limit. Default 1,000 is generous. We run 500 in production now. Hitting the limit fails fast, which is what we want.
- Use flattened for variable-shape blobs. If you genuinely don't know what's in the data, flattened is the type you want. You give up some search granularity. You keep your cluster.
- Watch field count as a first-class metric. Right alongside heap, CPU, and indexing latency. We page when an index crosses 80% of its mapping limit.
- The vendor relationship is a technical contract. The data they send is the contract. We now require a schema doc before we'll ingest. Sales hates this. Engineering loves it.
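Two of these rules are easy to sketch in code: transforming at the edge, and paging on field count. The example below is a minimal illustration, not the production pipeline. The allowlist and function names are hypothetical, and the transform assumes that anything outside the allowlist should be parked under a single raw_metadata key bound for a flattened field.

```python
# Hypothetical allowlist of fields the index is willing to map.
ALLOWED_FIELDS = {"review_id", "product_id", "review_text", "sentiment_score", "timestamp"}


def to_strict_doc(vendor_doc: dict) -> dict:
    """Keep allowlisted top-level fields; park everything else under raw_metadata."""
    kept = {k: v for k, v in vendor_doc.items() if k in ALLOWED_FIELDS}
    leftovers = {k: v for k, v in vendor_doc.items() if k not in ALLOWED_FIELDS}
    if leftovers:
        kept["raw_metadata"] = leftovers
    return kept


def should_page(field_count: int, limit: int, threshold_pct: int = 80) -> bool:
    """True once an index has used at least threshold_pct percent of its field limit."""
    return field_count * 100 >= threshold_pct * limit


vendor_doc = {
    "review_id": "abc",
    "metadata": {"scrape_2025_11_18_14_22_run_4": "ok", "internal_tag_xj3p9": "true"},
}

print(to_strict_doc(vendor_doc))
print(should_page(field_count=400, limit=500))
```

The transform runs before the document reaches the cluster, so an unexpected key can only ever end up inside raw_metadata. The integer arithmetic in should_page avoids floating-point surprises right at the threshold.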
The thing I keep coming back to
Elastic is permissive by design. That's the feature. You can throw any document at it and it'll figure out the mapping. That permissiveness is also what makes it dangerous — it lets a poorly-controlled upstream wreak havoc on a perfectly-controlled cluster. The discipline isn't on Elastic; it's on you.
The single sentence I now write at the top of every Elastic onboarding doc: your schema is whatever your noisiest producer wants it to be, unless you take it back. Take it back.