Rate Limiting: Token Bucket, Leaky Bucket, and Sliding Window
Every public API needs a rate limiter, but the algorithm you pick changes the burst behaviour your users feel. A practical comparison with Redis implementations.
Rate limiting sounds like one feature, but "limit to 100 requests per minute" can mean four different things depending on the algorithm. The differences show up at the edges — bursts, boundaries, and fairness — which is exactly where production traffic lives.
1. Fixed window
Count requests per calendar window (e.g. each minute). Simple, but it allows a 2× burst at the boundary: 100 requests at 11:00:59 and 100 more at 11:01:00 — 200 requests in two seconds, both "within limit".
INCR rate:user123:1715000460 // key includes the window timestamp
EXPIRE rate:user123:1715000460 60
// reject if value > 100
2. Sliding window log / counter
The log variant stores a timestamp per request in a sorted set and counts entries in the trailing 60s. Exact, but memory grows with traffic. The counter variant approximates it by weighting the previous window — the standard production compromise.
// sliding window log in Redis
ZREMRANGEBYSCORE rate:user123 0 (now-60000)
ZADD rate:user123 now now
ZCARD rate:user123 // reject if > 100
EXPIRE rate:user123 60
3. Token bucket
A bucket holds up to N tokens and refills at a steady rate. Each request takes a token; an empty bucket means reject. This is the algorithm that allows controlled bursts (up to bucket size) while bounding the sustained rate — usually what you actually want for an API.
capacity = 100, refill = 100/60 tokens per second
on request:
refill based on elapsed time
if tokens >= 1: tokens -= 1 → allow
else: reject
4. Leaky bucket
Requests enter a queue that drains at a fixed rate. It smooths output to a constant rate — no bursts pass through at all. Great for protecting a fragile downstream (a payment provider), worse for interactive APIs where a small burst is fine.
5. Doing it right in a distributed system
With more than one API node, an in-memory counter per node multiplies your real limit by the node count. Centralise the counter in Redis and make the check-and-decrement atomic with a Lua script, or you'll lose increments under concurrency.
-- atomic token-bucket check in one round trip
local tokens = tonumber(redis.call('GET', KEYS[1]) or capacity)
-- refill, compare, decrement, set — all server-side
Rules of thumb
- Default to token bucket for public APIs — bounded sustained rate, controlled bursts.
- Use leaky bucket when the thing you're protecting can't handle bursts at all.
- Never rate-limit per-node in a fleet. Centralise the state and make the update atomic.
- Return
429with aRetry-Afterheader. Clients that can back off correctly are clients that stop hammering you.