Kibana Alerting: Rules, Connectors, and the Noise Problem

Alerts are only useful if they wake people up for real problems. Here's how to build that with Kibana's rules engine.

September 16, 2025 · 8 min read · Kibana, Observability

Kibana Alerting (the rules engine built into Kibana, separate from Elasticsearch Watcher) is two pieces: rules that evaluate on a schedule and connectors that deliver actions. Every post-incident "why did we miss that?" meeting comes down to one of the two.

Rule types you will actually use

  • Elasticsearch query — threshold over a KQL/Lucene search. 90% of alerts.
  • Index threshold — an aggregation-shaped version: "avg(latency) > 500 over 5m."
  • Metric threshold / Inventory — for Metrics/Infrastructure apps.
  • Log threshold — pattern matching on log fields with group-by.
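To make the "avg(latency) > 500 over 5m" shape concrete, here is a minimal sketch of what a threshold check computes on each run. It is plain Python rather than Kibana's implementation, and the `breaches` function and the sample data are made up for illustration.

```python
from statistics import mean

def breaches(samples, now, window_s=300, threshold_ms=500.0):
    """Return True if average latency over the look-back window exceeds the threshold.

    samples: iterable of (unix_seconds, latency_ms) pairs.
    """
    in_window = [ms for ts, ms in samples if now - window_s < ts <= now]
    # An empty window has nothing to average, so it cannot breach.
    return bool(in_window) and mean(in_window) > threshold_ms

# Two healthy samples followed by three slow ones, one per minute.
samples = [(0, 120.0), (60, 140.0), (120, 900.0), (180, 950.0), (240, 880.0)]
print(breaches(samples, now=240))
```

The average over the window is what gets compared, so a couple of healthy samples do not hide a sustained regression, but a single slow request in a busy window will not trip it either.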

The three knobs that kill noise

  1. Look-back vs check-every. Check every minute over a 5-minute window. Don't check every 5 minutes: a 1-minute spike can then sit unnoticed for up to 5 minutes, or get averaged away before anyone sees it.
  2. Group by high-cardinality dimensions. An ungrouped "API latency > 500ms" alert fires whenever any endpoint regresses and doesn't say which one. Group by service.name and endpoint so each page points at one root cause.
  3. Throttle and recovery. Set an action frequency (notify on status change) and a recovery action. Otherwise a flapping metric spams chat.
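The three knobs map onto a rule definition roughly as follows. This is a sketch of a request body for the create rule API using the index threshold rule type; the field names follow my reading of that API and can differ between Kibana versions (older releases set `notify_when` on the rule rather than per action), so check it against your version. The index pattern, the `latency_ms` field, and the connector ID are hypothetical, and the sketch groups by `service.name` only.

```python
import json

def latency_rule(connector_id: str) -> dict:
    """Build a create-rule request body that sets all three knobs."""
    on_change = {"summary": False, "notify_when": "onActionGroupChange", "throttle": None}
    return {
        "name": "API latency > 500ms",
        "rule_type_id": ".index-threshold",
        "consumer": "alerts",
        # Knob 1: evaluate every minute over a 5-minute look-back window.
        "schedule": {"interval": "1m"},
        "params": {
            "index": ["traces-*"],       # hypothetical index pattern
            "timeField": "@timestamp",
            "aggType": "avg",
            "aggField": "latency_ms",    # hypothetical field name
            "timeWindowSize": 5,
            "timeWindowUnit": "m",
            "thresholdComparator": ">",
            "threshold": [500],
            # Knob 2: one alert per service instead of one global alert.
            "groupBy": "top",
            "termField": "service.name",
            "termSize": 20,
        },
        "actions": [
            {   # Knob 3: notify when the alert changes status, not on every run.
                "id": connector_id,
                "group": "threshold met",
                "frequency": dict(on_change),
                "params": {"message": "Latency high for {{context.group}}"},
            },
            {   # Knob 3: tell people when it is over.
                "id": connector_id,
                "group": "recovered",
                "frequency": dict(on_change),
                "params": {"message": "Latency recovered"},
            },
        ],
    }

body = latency_rule("my-connector-id")  # hypothetical connector ID
print(json.dumps(body, indent=2))
```

Building the body in a function rather than pasting JSON keeps the interval, window, and notification policy in one reviewable place.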

Connector discipline

Keep two classes of connector: on-call (PagerDuty, Opsgenie) and fyi (a Slack channel). Route severity-1 rules to on-call and everything else to fyi. A single connector for all alerts is how alert fatigue gets entrenched.
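The routing rule is small enough to write down. A minimal sketch, assuming each rule carries a numeric severity; the connector IDs and the `connector_for` helper are hypothetical and not part of Kibana.

```python
# Hypothetical connector IDs; use the IDs of the connectors in your own Kibana.
ON_CALL_CONNECTOR = "pagerduty-oncall"
FYI_CONNECTOR = "slack-alerts-fyi"

def connector_for(severity: int) -> str:
    """Severity-1 rules page a human; everything else goes to the fyi channel."""
    return ON_CALL_CONNECTOR if severity == 1 else FYI_CONNECTOR

for sev in (1, 2, 3):
    print(sev, connector_for(sev))
```

If rules are generated from code, calling one helper like this from every rule definition keeps the routing policy from drifting rule by rule.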

Rules as code

The UI is fine for the first dozen rules. Past that, export and commit:

GET /api/alerting/rules/_find?per_page=1000
# save to repo and diff on PR
# note: _find returns JSON for review; the saved objects import API expects an NDJSON export instead

Rules are saved objects. Treat them like code.
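One way to make the export diff-friendly is to write one file per rule with sorted keys and drop fields that change on every run. A sketch using only the standard library: the list of volatile fields is my assumption about what the find response contains, and the Kibana URL and API key are placeholders you supply.

```python
import json
import urllib.request
from pathlib import Path

# Assumed to change on every run or save without changing what the rule does.
VOLATILE_FIELDS = {
    "execution_status", "last_run", "next_run",
    "monitoring", "updated_at", "scheduled_task_id",
}

def fetch_rules(kibana_url: str, api_key: str, per_page: int = 100) -> list:
    """Page through the find rules API and return every rule."""
    rules, page = [], 1
    while True:
        url = f"{kibana_url}/api/alerting/rules/_find?per_page={per_page}&page={page}"
        req = urllib.request.Request(url, headers={"Authorization": f"ApiKey {api_key}"})
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        rules.extend(body["data"])
        if not body["data"] or len(rules) >= body["total"]:
            return rules
        page += 1

def normalize(rule: dict) -> dict:
    """Drop top-level fields that would create noise in a diff."""
    return {k: v for k, v in rule.items() if k not in VOLATILE_FIELDS}

def serialize(rule: dict) -> str:
    """Stable text form: sorted keys, fixed indent, trailing newline."""
    return json.dumps(normalize(rule), indent=2, sort_keys=True, ensure_ascii=False) + "\n"

def write_rules(rules: list, out_dir: str) -> None:
    """Write one <rule id>.json file per rule so each PR diff is per rule."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for rule in rules:
        (out / f"{rule['id']}.json").write_text(serialize(rule), encoding="utf-8")
```

A nightly job could call `write_rules(fetch_rules(url, key), "alerting/rules")` and commit the result, which gives you history and blame for rules that were edited in the UI.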

Checklist before you ship an alert

  • Is there a runbook link in the alert body?
  • Does it include the query that triggered it?
  • Does a test document create one alert, not 50?
  • Is there a recovery notification?
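Two of these checks can run in CI against exported rule JSON. A rough sketch, assuming each action keeps its text under `params.message` and that the recovery action group is named `recovered`; it treats any URL in a message as the runbook link, which is a heuristic and not a guarantee. The `lint_rule` helper is hypothetical.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def lint_rule(rule: dict) -> list[str]:
    """Return a list of checklist problems found in one exported rule."""
    problems = []
    actions = rule.get("actions", [])
    messages = [str(a.get("params", {}).get("message", "")) for a in actions]
    # Heuristic: any URL in an action message counts as a runbook link.
    if not any(URL_PATTERN.search(m) for m in messages):
        problems.append("no runbook link in any action message")
    if not any(a.get("group") == "recovered" for a in actions):
        problems.append("no recovery action")
    return problems

rule = {
    "name": "API latency > 500ms",
    "actions": [
        {"group": "threshold met",
         "params": {"message": "Latency high. Runbook: https://wiki.example.com/runbooks/latency"}},
    ],
}
print(lint_rule(rule))
```

The "one alert, not 50" check still needs a real test document sent through a staging rule; a static lint cannot answer it.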

Reader Discussion

7 replies

  1. Takeshi Mori · Platform · Agrees

    the 'space membership + role + ES privilege' three-way alignment is the source of every "why does this dashboard return zero hits" ticket I've ever closed. should be a chart in onboarding.

    Sep 21, 2025·5 days later
  2. Evi Papadopoulou · Tech Lead · From experience

    canvas is great until someone makes a 47-element dashboard and the page hangs for 8s. happy mediums exist somewhere between "4 KPI tiles" and "art project". still love it for exec readouts though.

    Sep 22, 2025·6 days later
  3. Clara Jensen · SRE · Agrees

    splitting on-call vs fyi connectors is one of those changes that pays for itself in two weeks and you spend the next two years wondering how you ever lived without it. people stopped muting #alerts. that's the metric.

    Sep 18, 2025·2 days later
  4. Minh Vũ 🇻🇳 Đà Nẵng · DevOps · From experience

    rules-as-code is right. our team once had 247 rules, nobody knew who created or edited them, half of them duplicates. export to git, code review, and blame tells you who broke what. fixed overnight, nobody complained.

    Sep 19, 2025·3 days later
  5. Hannah Kaur · Product Eng · Agrees

    tbh TSVB is powerful but the learning curve filters out non-SREs. Lens is the right default for cross-team dashboards in 2026. Hot take: TSVB should have been deprecated by now.

    Sep 22, 2025·6 days later·edited
  6. Rachel Gold · Staff SRE · Agrees

    the on-call framing throughout this piece is what makes it land. too many infra articles assume you never get paged. those are written by people who never got paged.

    Sep 19, 2025·3 days later
  7. Omar Khalil · Senior SWE · Kind words

    this is the third article from this blog I've sent to my team this month. you're cooking. don't switch to crypto.

    Sep 21, 2025·5 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.

Comments seeded · live discussion via email