Inverted Index 101: How Elasticsearch Actually Finds Things

From document to token to posting list — the data structure behind sub-millisecond full-text search.

July 30, 2025 · 7 min read · Elasticsearch · Fundamentals

Everyone says Elasticsearch is fast. The reason is not magic — it's a data structure called the inverted index, built incrementally by Lucene under every index you create.

From text to tokens

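When you index a document, the analyzer breaks each text field into tokens. With the built-in english analyzer, "The quick brown foxes!" becomes [quick, brown, fox]: lowercased, punctuation stripped, the stop word "the" removed, and "foxes" stemmed to "fox". The default standard analyzer only tokenizes and lowercases, so it would produce [the, quick, brown, foxes].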
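To make the pipeline concrete, here is a minimal Python sketch of those analysis steps. It is a toy: the stop-word list is a tiny illustrative subset, and `toy_stem` is a crude plural stripper standing in for the real stemmer, so treat `analyze`, `toy_stem`, and `STOP_WORDS` as hypothetical names rather than anything Lucene exposes.

```python
import re

# Tiny illustrative stop-word list (assumption); real analyzers ship a longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def toy_stem(token: str) -> str:
    """Crude plural stripping, standing in for a real stemming algorithm."""
    if token.endswith(("xes", "ses", "ches", "shes")):
        return token[:-2]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text: str) -> list[str]:
    # 1. Tokenize on runs of letters and digits, which also drops punctuation.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # 2. Lowercase every token.
    tokens = [t.lower() for t in tokens]
    # 3. Remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 4. Stem what is left.
    return [toy_stem(t) for t in tokens]


print(analyze("The quick brown foxes!"))  # ['quick', 'brown', 'fox']
```

The same function has to run at query time too, otherwise a search for "Foxes" would never line up with the indexed token "fox".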

The posting list

Lucene then flips the map around. Instead of doc → words, it keeps word → docs:

fox    -> [doc1, doc7, doc42]
brown  -> [doc1, doc4, doc42, doc99]
quick  -> [doc1, doc7, doc12, doc42]

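A query that requires all three terms (for example, a match query for quick brown fox with operator set to and) intersects three sorted lists and returns doc1 and doc42. The cost scales with the length of those lists, not with the total number of documents in the corpus.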
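Here is a rough Python sketch of both halves: flipping doc → words into word → docs, then intersecting sorted posting lists with a two-pointer walk. The function names are made up for illustration, and Lucene's real postings are compressed and carry skip data, so this only shows the shape of the idea.

```python
from collections import defaultdict


def build_index(docs: dict[int, list[str]]) -> dict[str, list[int]]:
    """Map each token to a sorted list of the doc IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


def intersect(a: list[int], b: list[int]) -> list[int]:
    """Two-pointer walk over two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_all(index: dict[str, list[int]], terms: list[str]) -> list[int]:
    """Doc IDs containing every term; starting with the shortest list keeps work small."""
    lists = sorted((index.get(t, []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


# Documents chosen so the index reproduces the posting lists shown above.
docs = {
    1: ["quick", "brown", "fox"],
    4: ["brown"],
    7: ["quick", "fox"],
    12: ["quick"],
    42: ["quick", "brown", "fox"],
    99: ["brown"],
}
index = build_index(docs)
print(search_all(index, ["quick", "brown", "fox"]))  # [1, 42]
```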

Segments are immutable

Every refresh writes the docs indexed since the previous refresh into a new immutable segment, which is what makes them searchable. A shard is a bag of segments. Lucene never edits a segment in place: deletes are tombstoned, and updates are delete + re-insert. Periodic merges rewrite smaller segments into larger ones and drop the tombstoned docs along the way.
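A toy model helps here. The sketch below is not how Lucene stores anything on disk; `Segment` and `Shard` are hypothetical classes that only mimic the lifecycle: buffer, refresh into a new segment, tombstone on delete, and merge to reclaim space.

```python
class Segment:
    """A batch of docs whose data is never edited after creation.

    Deletes are recorded on the side in a tombstone set.
    """

    def __init__(self, docs: dict[str, str]):
        self.docs = dict(docs)            # doc_id -> source, written once
        self.deleted: set[str] = set()    # tombstones

    def live_docs(self) -> dict[str, str]:
        return {k: v for k, v in self.docs.items() if k not in self.deleted}


class Shard:
    def __init__(self):
        self.buffer: dict[str, str] = {}     # indexed but not yet searchable
        self.segments: list[Segment] = []

    def index(self, doc_id: str, source: str) -> None:
        # An update is a delete of the old copy plus a fresh insert.
        self.delete(doc_id)
        self.buffer[doc_id] = source

    def delete(self, doc_id: str) -> None:
        self.buffer.pop(doc_id, None)
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)

    def refresh(self) -> None:
        # Turn the buffer into a new searchable segment.
        if self.buffer:
            self.segments.append(Segment(self.buffer))
            self.buffer = {}

    def merge(self) -> None:
        # Rewrite all segments into one, dropping tombstoned docs.
        merged: dict[str, str] = {}
        for seg in self.segments:
            merged.update(seg.live_docs())
        self.segments = [Segment(merged)] if merged else []

    def visible(self) -> dict[str, str]:
        out: dict[str, str] = {}
        for seg in self.segments:
            out.update(seg.live_docs())
        return out


shard = Shard()
shard.index("1", "v1")
shard.refresh()
shard.index("1", "v2")   # tombstones v1 in the first segment
shard.refresh()
print(len(shard.segments), shard.visible())   # 2 {'1': 'v2'}
shard.merge()
print(len(shard.segments), shard.visible())   # 1 {'1': 'v2'}
```

Note that after the update the old copy still occupies space in the first segment until a merge rewrites it, which is why heavy update workloads cost more than they look.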

Consequences for you

Refresh is not free. Each refresh opens a new searchable segment, and lots of small segments mean more merge work later. Bulk loading? Set index.refresh_interval to -1, load, then set it back.
Updates are expensive. Prefer immutable event-stream indexing patterns over frequent POST _update.
Force-merge with care. Calling _forcemerge with max_num_segments=1 only makes sense for indices that no longer receive writes. On live indices, background merges keep the segment count in check on their own.
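The bulk-loading advice above can be sketched as a sequence of REST calls. The Python below only builds the requests (method, path, body) and leaves sending them to whatever HTTP client you use. The index name `events`, the helper names, and the `"1s"` restore value are assumptions for illustration, so restore whatever interval your index actually had.

```python
import json


def bulk_body(docs: list[dict]) -> str:
    """NDJSON body for POST /<index>/_bulk: an action line, then a source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"   # the bulk API expects a trailing newline


def bulk_load_plan(index: str, docs: list[dict], restore_interval: str = "1s"):
    """Return (method, path, body) steps for a bulk load with refresh paused."""
    pause = {"index": {"refresh_interval": "-1"}}
    resume = {"index": {"refresh_interval": restore_interval}}
    return [
        ("PUT", f"/{index}/_settings", json.dumps(pause)),
        ("POST", f"/{index}/_bulk", bulk_body(docs)),
        ("PUT", f"/{index}/_settings", json.dumps(resume)),
        ("POST", f"/{index}/_refresh", ""),
    ]


def force_merge_step(index: str):
    """Only for indices that will not be written to again."""
    return ("POST", f"/{index}/_forcemerge?max_num_segments=1", "")


plan = bulk_load_plan("events", [{"msg": "login"}, {"msg": "logout"}])
for method, path, _body in plan:
    print(method, path)
```

Building the plan as data keeps the order of operations explicit: pause refresh, load, restore, then refresh once so the new docs become searchable.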

Mapping matters more than you think

A text field is analyzed (tokenized). A keyword field is not: the whole value is indexed verbatim as a single term and used for exact match, aggregations, and sorting. You almost always want both via a multi-field:

{
  "mappings": {
    "properties": {
      "status": {
        "type": "text",
        "fields": { "raw": { "type": "keyword" } }
      }
    }
  }
}
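With a mapping like this, which sub-field you target depends on the job. A hedged sketch of the request bodies, written as Python dicts so they can be dumped to JSON; the search values ("payment failed", "Payment Failed") and the aggregation name `by_status` are invented for illustration:

```python
import json

# Full-text search goes against the analyzed parent field.
full_text = {"query": {"match": {"status": "payment failed"}}}

# Exact match uses the keyword sub-field and is case-sensitive.
exact = {"query": {"term": {"status.raw": "Payment Failed"}}}

# Aggregations and sorting also use the keyword sub-field.
counts = {"size": 0, "aggs": {"by_status": {"terms": {"field": "status.raw"}}}}
ordered = {"query": {"match_all": {}}, "sort": [{"status.raw": "asc"}]}

for body in (full_text, exact, counts, ordered):
    print(json.dumps(body))
```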

Get the mapping wrong on day one and you'll be reindexing on day 90.
