Scoring & Relevance: BM25, Boosting, and Rescoring That Works

Default BM25 is a strong baseline. Here's how to tune it, combine it with business signals, and not destroy recall.

September 30, 2025 · 9 min read · Elasticsearch, Search

Elasticsearch switched its default similarity from TF-IDF to BM25 in 5.0. BM25 is a better default because term frequency saturates: a document that mentions a word 50 times isn't 50× more relevant than one that mentions it once.

The BM25 formula in one line

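score = IDF · tf · (k1 + 1) / (tf + k1 · (1 − b + b · dl / avgdl))

Here tf is the term's frequency in the field, dl is the field's length in terms, and avgdl is the average field length across documents. This is the score for one term; a multi-term query sums it over its terms.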

Two knobs you can actually tune:

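k1 (default 1.2): how quickly TF saturates. Higher → repeated occurrences keep adding score for longer; lower → the score flattens after a few occurrences.
b (default 0.75): length normalisation, from 0 (off) to 1 (full). Lower (e.g. 0.3) is better for short fields like titles, where length varies little and says little about relevance.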
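To make the two knobs concrete, here is a minimal Python sketch of the per-term formula above. The function name bm25_term_score and the sample lengths are illustrative assumptions, and idf is fixed at 1.0 so that only the TF and length parts vary. It mirrors the formula as written in this article, not Lucene's internal implementation.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf=1.0, k1=1.2, b=0.75):
    """Per-term BM25 score, following the one-line formula in the article."""
    # Length normalisation: equals 1.0 for an average-length field,
    # grows for longer fields, shrinks for shorter ones.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


# Saturation: at average length, going from 1 to 50 occurrences roughly
# doubles the score; it can never exceed idf * (k1 + 1).
once = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100)
fifty = bm25_term_score(tf=50, doc_len=100, avg_doc_len=100)

# Length normalisation: a field twice the average length is penalised
# less with b=0.3 than with the default b=0.75.
long_default = bm25_term_score(tf=1, doc_len=200, avg_doc_len=100)
long_low_b = bm25_term_score(tf=1, doc_len=200, avg_doc_len=100, b=0.3)

print(once, fifty, long_default, long_low_b)
```

Plugging in your own field lengths and trying a few k1 and b values is a cheap way to get a feel for a change before reindexing.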

Per-field similarity

{
  "settings": {
    "index": {
      "similarity": {
        "title_bm25": { "type": "BM25", "k1": 0.9, "b": 0.3 }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text", "similarity": "title_bm25" }
    }
  }
}

Combining signals without breaking relevance

Business wants "boost recent," "boost popular," "boost our paying tenants." The worst way is to sum everything into a single function_score with hand-tuned multipliers — relevance collapses as the weights drift.

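The right pattern is rescore: let BM25 find a candidate set, then re-rank the top N (window_size, applied per shard) with expensive signals. Rescoring only reorders documents the first pass already matched, so recall is unchanged.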

{
  "query": { "match": { "body": "observability stack" } },
  "rescore": {
    "window_size": 100,
    "query": {
      "rescore_query": {
        "function_score": {
          "functions": [
            { "gauss": { "published_at": { "origin": "now", "scale": "30d" } } },
            { "field_value_factor": { "field": "clicks", "modifier": "log1p", "factor": 0.2, "missing": 0 } }
          ],
          "score_mode": "sum",
          "boost_mode": "sum"
        }
      },
      "query_weight": 0.7,
      "rescore_query_weight": 0.3
    }
  }
}
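The arithmetic behind that request can be sketched in Python. This is a simplified model under stated assumptions, not Elasticsearch's code: the decay uses the default decay value of 0.5 at one scale from the origin, log1p is taken as log10(1 + factor · value), the constant contributed by function_score's implicit match_all query is left out because it shifts every rescored hit equally, and hits beyond the window are simply returned in first-pass order. The hit fields (id, bm25, age_days, clicks) are hypothetical.

```python
import math


def gauss_decay(age_days, scale_days=30.0, decay=0.5):
    """Gaussian decay: 1.0 at the origin, `decay` at one scale away."""
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    return math.exp(-(age_days ** 2) / (2.0 * sigma_sq))


def popularity(clicks, factor=0.2):
    """Assumed reading of the log1p modifier: log10(1 + factor * clicks)."""
    return math.log10(1.0 + factor * clicks)


def rescore(hits, window_size=100, query_weight=0.7, rescore_weight=0.3):
    """Re-rank the first `window_size` hits; leave the tail untouched.

    `hits` must already be sorted by BM25 score, best first.
    """
    window, tail = hits[:window_size], hits[window_size:]
    rescored = []
    for h in window:
        secondary = gauss_decay(h["age_days"]) + popularity(h["clicks"])
        combined = query_weight * h["bm25"] + rescore_weight * secondary
        rescored.append((h["id"], combined))
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored, [h["id"] for h in tail]


hits = [
    {"id": "a", "bm25": 10.0, "age_days": 365, "clicks": 0},
    {"id": "b", "bm25": 9.5, "age_days": 0, "clicks": 45},
    {"id": "c", "bm25": 2.0, "age_days": 0, "clicks": 1_000_000},
]
top, untouched = rescore(hits, window_size=2)
print(top, untouched)
```

With a window of 2, the fresh and moderately popular "b" overtakes the stale "a", while "c" never gets a chance to jump the queue despite its click count: a weak text match cannot be rescued by business signals alone.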

Measuring, not guessing

Use the Ranking Evaluation API (_rank_eval) with a set of judged queries. Track NDCG@10 (the dcg metric with normalize enabled and k set to 10) across deploys. Anything you ship without a scoreboard will drift.
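For intuition about what that number means, here is a small Python sketch of NDCG@k. It uses the exponential-gain variant, (2^rating − 1) / log2(rank + 1), which is one common definition. The function names and the sample ratings are illustrative assumptions, and the optional judged argument stands in for the full set of ratings you hold for a query.

```python
import math


def dcg_at_k(ratings, k=10):
    """Discounted cumulative gain over the first k ratings (rank 1 first)."""
    return sum(
        (2 ** rating - 1) / math.log2(rank + 1)
        for rank, rating in enumerate(ratings[:k], start=1)
    )


def ndcg_at_k(ratings, k=10, judged=None):
    """DCG divided by the DCG of the best possible ordering.

    `ratings` are the judgements of the returned hits, in result order.
    `judged` is every rating known for the query; if omitted, the ideal
    ordering is built from the returned hits only.
    """
    pool = judged if judged is not None else ratings
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0


def mean_ndcg(runs, k=10):
    """Average NDCG@k over a list of per-query rating lists."""
    return sum(ndcg_at_k(r, k) for r in runs) / len(runs)


before = [[3, 2, 1, 0], [0, 1, 2, 3]]   # hypothetical judged result lists
print(mean_ndcg(before))
```

Recording mean_ndcg for the same judged queries before and after each relevance change gives you the scoreboard: a drop means the change hurt ranking on queries you care about.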

When BM25 isn't enough

For semantic recall (synonyms, paraphrase, multilingual), add a dense vector field and use hybrid search: BM25 + kNN, combined with Reciprocal Rank Fusion. That's a separate article — but the point is: don't try to encode meaning with keyword boosts alone.
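As a taste of the fusion step, Reciprocal Rank Fusion can be sketched in a few lines of Python. Each document earns 1 / (k + rank) from every result list it appears in, and the sums decide the final order. The function name, the document ids, and k = 60 (a commonly used constant) are assumptions for illustration; the two input lists stand in for a BM25 result list and a kNN result list.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists into one list of (id, score) pairs."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


bm25_hits = ["a", "b", "c"]   # hypothetical keyword results, best first
knn_hits = ["c", "a", "d"]    # hypothetical vector results, best first
fused = reciprocal_rank_fusion([bm25_hits, knn_hits])
print([doc_id for doc_id, _ in fused])
```

Because only ranks are used, the BM25 and vector scores never need to be put on a common scale, which is the main reason this fusion method is popular for hybrid search.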
