
Scoring & Relevance: BM25, Boosting, and Rescoring That Works

Default BM25 is a strong baseline. Here's how to tune it, combine it with business signals, and not destroy recall.

September 30, 2025 · 9 min read · Elasticsearch, Search

Elasticsearch switched its default similarity from TF-IDF to BM25 in 5.0. BM25 is a better default because term frequency saturates: a document that mentions a word 50 times isn't 50× more relevant than one that mentions it once.

The BM25 formula in one line

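score(t, d) = IDF(t) · tf · (k1 + 1) / (tf + k1 · (1 − b + b · dl / avgdl))

where tf is how often term t occurs in the field, dl is the field's length in terms, and avgdl is the average length of that field (computed per shard by default). A document's score is the sum of this over the query's terms.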
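To see the saturation concretely, here is a minimal Python sketch of the per-term formula. The IDF uses the form Lucene's BM25Similarity uses, log(1 + (N − n + 0.5) / (n + 0.5)); the corpus numbers are made up for illustration. Recent Lucene versions, as we understand it, also drop the constant (k1 + 1) from the numerator, which rescales every score by the same factor without changing the ranking.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF in the form Lucene's BM25Similarity uses (never negative)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf: float, dl: float, avgdl: float, idf: float,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """One query term's contribution in one field, per the BM25 formula."""
    length_norm = k1 * (1 - b + b * dl / avgdl)
    return idf * tf * (k1 + 1) / (tf + length_norm)


# Hypothetical stats: 1,000 docs, the term appears in 10 of them,
# and the field is exactly average length.
idf = bm25_idf(doc_count=1000, doc_freq=10)
once = bm25_term_score(tf=1, dl=100, avgdl=100, idf=idf)
fifty = bm25_term_score(tf=50, dl=100, avgdl=100, idf=idf)
print(f"tf=1: {once:.2f}  tf=50: {fifty:.2f}  ratio: {fifty / once:.2f}")  # ratio: 2.15
```

With the field at average length, the ratio can never exceed k1 + 1 = 2.2, however many times the term repeats.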

Two knobs you can actually tune:

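  • k1 (default 1.2): how quickly term frequency saturates. Higher → repeated terms keep adding score for longer; lower → the first few occurrences count for almost everything. It is not a length setting.
  • b (default 0.75): length normalisation, from 0 (off) to 1 (full). Lower (e.g. 0.3) often suits short fields like titles, where length differences carry little signal.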
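A quick way to build intuition for the two knobs is to hold IDF fixed and vary one parameter at a time. This is a minimal sketch of the BM25 formula with made-up field lengths; it is plain arithmetic, not Elasticsearch code.

```python
def bm25_tf_part(tf, dl, avgdl, k1=1.2, b=0.75):
    """BM25 for one term with IDF factored out: only tf, k1, b and length matter."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))


# k1: how much a term repeated 10 times beats a single occurrence
# (field at average length, so b plays no role here).
repeat_gain = {
    k1: bm25_tf_part(10, 100, 100, k1=k1) / bm25_tf_part(1, 100, 100, k1=k1)
    for k1 in (0.5, 1.2, 2.0)
}

# b: share of the score a field 4x the average length keeps (tf=1, default k1).
long_field_keeps = {
    b: bm25_tf_part(1, 400, 100, b=b) / bm25_tf_part(1, 100, 100, b=b)
    for b in (0.0, 0.3, 0.75, 1.0)
}

for k1, gain in repeat_gain.items():
    print(f"k1={k1}: tf=10 scores {gain:.2f}x tf=1")
for b, keeps in long_field_keeps.items():
    print(f"b={b}: 4x-length field keeps {keeps:.2f} of the score")
```

Running it shows a ten-times-repeated term scoring about 1.43×, 1.96× and 2.50× a single occurrence for k1 = 0.5, 1.2 and 2.0, while a field four times the average length keeps about 1.00, 0.67, 0.45 and 0.38 of its score as b goes from 0 to 1.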

Per-field similarity

{
  "settings": {
    "index": {
      "similarity": {
        "title_bm25": { "type": "BM25", "k1": 0.9, "b": 0.3 }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text", "similarity": "title_bm25" }
    }
  }
}
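What does title_bm25 buy you? Below is a minimal sketch (plain Python, not an Elasticsearch call) comparing the default parameters with k1 = 0.9, b = 0.3 for a single matching term in a short and a long title. The title lengths and the average of 6 terms are assumptions picked for illustration.

```python
def bm25_tf_part(tf, dl, avgdl, k1, b):
    """BM25 for one term with IDF factored out."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))


AVG_TITLE_LEN = 6  # hypothetical average title length, in terms
SETTINGS = {"default": (1.2, 0.75), "title_bm25": (0.9, 0.3)}

ratios = {}
for name, (k1, b) in SETTINGS.items():
    short = bm25_tf_part(1, 3, AVG_TITLE_LEN, k1, b)    # a 3-term title
    long_ = bm25_tf_part(1, 12, AVG_TITLE_LEN, k1, b)   # a 12-term title
    ratios[name] = long_ / short
    print(f"{name}: 12-term title scores {ratios[name]:.2f} of the 3-term title")
```

With these numbers the 12-term title keeps about 0.56 of the short title's score under the defaults and about 0.81 under title_bm25, so length matters much less for titles. As far as we know, a field's similarity can't be changed on an existing field, so settle it before indexing.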

Combining signals without breaking relevance

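Business wants "boost recent," "boost popular," "boost our paying tenants." The worst way is to sum everything into a single function_score with hand-tuned multipliers: BM25 scores are unbounded and their scale changes from query to query, so a weight that looks right on one query swamps relevance or vanishes on the next, and relevance collapses as the weights drift.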

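The right pattern is rescore: let BM25 find a candidate set, then re-rank only the top N hits (window_size, counted per shard) with the expensive signals. Documents outside the window are never promoted by the business boosts, so recall stays whatever BM25 retrieved.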

{
  "query": { "match": { "body": "observability stack" } },
  "rescore": {
    "window_size": 100,
    "query": {
      "rescore_query": {
        "function_score": {
          "functions": [
            { "gauss": { "published_at": { "origin": "now", "scale": "30d" } } },
            { "field_value_factor": { "field": "clicks", "modifier": "log1p", "factor": 0.2, "missing": 0 } }
          ],
          "score_mode": "sum",
          "boost_mode": "sum"
        }
      },
      "query_weight": 0.7,
      "rescore_query_weight": 0.3
    }
  }
}
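To see how the numbers in that request combine, here is a simplified Python model of it. It follows the documented gauss decay and field_value_factor (log1p is base-10) formulas and the default linear rescore combination, query_weight × original + rescore_query_weight × rescore score. It assumes a single shard, treats age in days as the distance from "now", and does not model how Elasticsearch scales scores outside the window. The hits are made up.

```python
import math


def gauss_decay(distance, scale, decay=0.5, offset=0.0):
    """function_score gauss decay: 1.0 at the origin, `decay` at offset + scale."""
    d = max(0.0, abs(distance) - offset)
    return decay ** ((d / scale) ** 2)


def clicks_boost(clicks, factor=0.2):
    """field_value_factor with modifier log1p: log10(1 + factor * value)."""
    return math.log10(1 + factor * clicks)


def rescore(hits, window_size=100, query_weight=0.7, rescore_query_weight=0.3):
    """Simplified, single-shard model of the rescore request."""
    ranked = sorted(hits, key=lambda h: h["bm25"], reverse=True)
    window, rest = ranked[:window_size], ranked[window_size:]
    rescored = []
    for h in window:
        # No inner query means match_all (score 1.0); boost_mode "sum" adds the
        # function scores, which score_mode "sum" has already added together.
        fs = 1.0 + gauss_decay(h["age_days"], scale=30) + clicks_boost(h["clicks"])
        final = query_weight * h["bm25"] + rescore_query_weight * fs
        rescored.append({**h, "final": final})
    rescored.sort(key=lambda h: h["final"], reverse=True)
    # Hits outside the window are never re-ranked; they stay behind it in BM25 order.
    return rescored + rest


hits = [  # hypothetical BM25 results
    {"id": "a", "bm25": 8.0, "age_days": 400, "clicks": 0},
    {"id": "b", "bm25": 7.5, "age_days": 3, "clicks": 500},
    {"id": "c", "bm25": 2.0, "age_days": 1, "clicks": 10_000},
]
print([h["id"] for h in rescore(hits, window_size=2)])  # ['b', 'a', 'c']
```

Document b overtakes a thanks to recency and clicks, while c, despite 10,000 clicks, is never promoted because it sits outside the window: the business signals re-rank relevant candidates instead of dragging weak matches in.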

Measuring, not guessing

Use the Ranking Evaluation API (the _rank_eval endpoint) with a set of judged queries; its dcg metric with normalize: true and k: 10 gives NDCG@10. Track that number across deploys. Anything you ship without a scoreboard will drift.
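NDCG@10 is simple enough to sanity-check by hand. Below is a minimal sketch using the common graded-gain form, (2^rel − 1) / log2(rank + 1), normalised by the best possible ordering of the judged documents. As far as we know this is what the dcg metric computes with normalize enabled, but treat it as an approximation. Unjudged hits count as 0 here, which is an assumption, and the judgments are hypothetical.

```python
import math


def dcg(ratings):
    """Discounted cumulative gain; ranks start at 1."""
    return sum((2 ** r - 1) / math.log2(rank + 1)
               for rank, r in enumerate(ratings, start=1))


def ndcg_at_k(ranked_ids, judgments, k=10):
    """ranked_ids: result order for one query; judgments: {doc_id: graded relevance}."""
    actual = dcg([judgments.get(doc_id, 0) for doc_id in ranked_ids[:k]])  # unjudged -> 0
    ideal = dcg(sorted(judgments.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


judgments = {"d1": 3, "d2": 2, "d3": 0, "d4": 1}  # hypothetical judged query
print(round(ndcg_at_k(["d2", "d1", "d5", "d4"], judgments), 3))  # 0.835
```

Averaging this over the judged query set on every deploy gives the scoreboard number to track.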

When BM25 isn't enough

For semantic recall (synonyms, paraphrase, multilingual), add a dense vector field and use hybrid search: BM25 + kNN, combined with Reciprocal Rank Fusion. That's a separate article — but the point is: don't try to encode meaning with keyword boosts alone.
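Reciprocal Rank Fusion itself is only a few lines. This sketch fuses ranked lists of document ids; each list contributes 1 / (rank_constant + rank). The constant of 60 mirrors the default we understand Elasticsearch's RRF to use, and the two result lists are hypothetical stand-ins for BM25 and kNN hits.

```python
from collections import defaultdict


def rrf(rankings, rank_constant=60):
    """Fuse ranked lists of doc ids; ranks start at 1 in each list."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]  # hypothetical keyword results
knn_hits = ["c", "a", "d"]   # hypothetical vector results
print(rrf([bm25_hits, knn_hits]))  # ['a', 'c', 'b', 'd']
```

Because only ranks are used, raw BM25 scores and vector similarities never need to share a scale, which is exactly the problem that summing keyword boosts runs into.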


Reader Discussion

6 replies

  1. Highlighted by author
    Raghav Sharma · Search Engineer · Flipkart · From experience

    rescore vs raw function_score is the single biggest relevance win I've shipped this year. +0.08 NDCG@10 on product search, zero recall regression, two weeks of work. If you have a serious search funnel and you're not rescoring, you're leaving money on the table.

    Oct 02, 2025 · 2 days later
  2. Olivia Bennett · Data Engineer · Story

    Mapping explosion off untrusted JSON cost us a 2-billion-doc reindex last year. Six engineers, two weekends, one director apology email. "Never ingest untrusted JSON" should be on the office wall in 72pt.

    Oct 04, 2025 · 4 days later
  3. Michiel de Vries · Observability Lead · Agrees

    ILM + rollover turned our cluster from a 4-times-a-week pager generator into something on-call literally forgets exists for months at a time. If a post wants one takeaway it should be this.

    Oct 03, 2025 · 3 days later
  4. Thành Võ · Backend · Pushback

    tiny correction: bm25 k1 is term frequency saturation, not a boost weight as many ES blog posts mistakenly say. b is what handles length normalization. spelling that out means readers are less likely to tune it wrong.

    Oct 08, 2025 · 1 week later · edited
  5. Kenji Itō · Staff Engineer · From experience

    Transform jobs are the single cheapest dashboard win in Elastic. We had a Kibana panel taking 4.2s to load on cold-cache; materialised the same 15-min summary into a transform index, dropped to 140ms. Three lines of YAML.

    Oct 05, 2025 · 5 days later
  6. Léa Dubois · SRE · Asks

    any chance you'd publish these as a PDF collection? would love to print and read offline on flights. screen-fatigue is real.

    Oct 06, 2025 · 6 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.

Comments seeded · live discussion via email