ML
System Design

Rate Limiting: Token Bucket, Leaky Bucket, and Sliding Window

Every public API needs a rate limiter, but the algorithm you pick changes the burst behaviour your users feel. A practical comparison with Redis implementations.

May 14, 20269 min readSystem DesignScaling

Rate limiting sounds like one feature, but "limit to 100 requests per minute" can mean four different things depending on the algorithm. The differences show up at the edges — bursts, boundaries, and fairness — which is exactly where production traffic lives.

1. Fixed window

Count requests per calendar window (e.g. each minute). Simple, but it allows a 2× burst at the boundary: 100 requests at 11:00:59 and 100 more at 11:01:00 — 200 requests in two seconds, both "within limit".

INCR rate:user123:1715000460   // key includes the window timestamp
EXPIRE rate:user123:1715000460 60
// reject if value > 100

2. Sliding window log / counter

The log variant stores a timestamp per request in a sorted set and counts entries in the trailing 60s. Exact, but memory grows with traffic. The counter variant approximates it by weighting the previous window — the standard production compromise.

// sliding window log in Redis
ZREMRANGEBYSCORE rate:user123 0 (now-60000)
ZADD rate:user123 now now
ZCARD rate:user123        // reject if > 100
EXPIRE rate:user123 60

3. Token bucket

A bucket holds up to N tokens and refills at a steady rate. Each request takes a token; an empty bucket means reject. This is the algorithm that allows controlled bursts (up to bucket size) while bounding the sustained rate — usually what you actually want for an API.

capacity = 100, refill = 100/60 tokens per second
on request:
  refill based on elapsed time
  if tokens >= 1: tokens -= 1 → allow
  else: reject

4. Leaky bucket

Requests enter a queue that drains at a fixed rate. It smooths output to a constant rate — no bursts pass through at all. Great for protecting a fragile downstream (a payment provider), worse for interactive APIs where a small burst is fine.

5. Doing it right in a distributed system

With more than one API node, an in-memory counter per node multiplies your real limit by the node count. Centralise the counter in Redis and make the check-and-decrement atomic with a Lua script, or you'll lose increments under concurrency.

-- atomic token-bucket check in one round trip
local tokens = tonumber(redis.call('GET', KEYS[1]) or capacity)
-- refill, compare, decrement, set — all server-side

Rules of thumb

  • Default to token bucket for public APIs — bounded sustained rate, controlled bursts.
  • Use leaky bucket when the thing you're protecting can't handle bursts at all.
  • Never rate-limit per-node in a fleet. Centralise the state and make the update atomic.
  • Return 429 with a Retry-After header. Clients that can back off correctly are clients that stop hammering you.
SharePostLinkedIn

Reader Discussion

2 replies// weighed in

TopNewestAuthor
Add to the thread
Disagree, agree harder, or share your own experience…
Email instead →markdown okbe kind
  1. Rachel Gold· Staff SREAgrees

    the on-call framing throughout this piece is what makes it land. too many infra articles assume you never get paged. those are written by people who never got paged.

    May 17, 2026·3 days later
  2. Omar Khalil· Senior SWEKind words

    this is the third article from this blog I've sent to my team this month. you're cooking. don't switch to crypto.

    May 19, 2026·5 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.

Comments seeded · live discussion via email