Rate Limiting: Token Bucket, Leaky Bucket, and Sliding Window

Every public API needs a rate limiter, but the algorithm you pick changes the burst behaviour your users feel. A practical comparison with Redis implementations.

May 14, 20269 min readSystem DesignScaling

Rate limiting sounds like one feature, but "limit to 100 requests per minute" can mean four different things depending on the algorithm. The differences show up at the edges — bursts, boundaries, and fairness — which is exactly where production traffic lives.

1. Fixed window

Count requests per calendar window (e.g. each minute). Simple, but it allows a 2× burst at the boundary: 100 requests at 11:00:59 and 100 more at 11:01:00 — 200 requests in two seconds, both "within limit".

INCR rate:user123:1715000460   // key includes the window timestamp
EXPIRE rate:user123:1715000460 60
// reject if value > 100

2. Sliding window log / counter

The log variant stores a timestamp per request in a sorted set and counts entries in the trailing 60s. Exact, but memory grows with traffic. The counter variant approximates it by weighting the previous window — the standard production compromise.

// sliding window log in Redis
ZREMRANGEBYSCORE rate:user123 0 (now-60000)
ZADD rate:user123 now now
ZCARD rate:user123        // reject if > 100
EXPIRE rate:user123 60

3. Token bucket

A bucket holds up to N tokens and refills at a steady rate. Each request takes a token; an empty bucket means reject. This is the algorithm that allows controlled bursts (up to bucket size) while bounding the sustained rate — usually what you actually want for an API.

capacity = 100, refill = 100/60 tokens per second
on request:
  refill based on elapsed time
  if tokens >= 1: tokens -= 1 → allow
  else: reject

4. Leaky bucket

Requests enter a queue that drains at a fixed rate. It smooths output to a constant rate — no bursts pass through at all. Great for protecting a fragile downstream (a payment provider), worse for interactive APIs where a small burst is fine.

5. Doing it right in a distributed system

With more than one API node, an in-memory counter per node multiplies your real limit by the node count. Centralise the counter in Redis and make the check-and-decrement atomic with a Lua script, or you'll lose increments under concurrency.

-- atomic token-bucket check in one round trip
local tokens = tonumber(redis.call('GET', KEYS[1]) or capacity)
-- refill, compare, decrement, set — all server-side

Rules of thumb

Default to token bucket for public APIs — bounded sustained rate, controlled bursts.
Use leaky bucket when the thing you're protecting can't handle bursts at all.
Never rate-limit per-node in a fleet. Centralise the state and make the update atomic.
Return 429 with a Retry-After header. Clients that can back off correctly are clients that stop hammering you.

Rate Limiting: Token Bucket, Leaky Bucket, and Sliding Window

1. Fixed window

2. Sliding window log / counter

3. Token bucket

4. Leaky bucket

5. Doing it right in a distributed system

Rules of thumb

2 replies// weighed in

More from this topic

Idempotency Keys: Making APIs Safe to Retry

The Outbox Pattern: Atomic DB and Queue Writes

The Saga Pattern: Distributed Transactions Without Two-Phase Commit