
Idempotency in APIs: The Cheap Fix for Half Your Retry Bugs

Networks drop packets. Clients retry. Without idempotency keys you charge the card twice. It's a 20-line fix for a 2 AM class of bug.

May 08, 2025 · 7 min read · Scaling, Reliability

An operation is idempotent if executing it N times has the same effect as executing it once. In HTTP, GET, PUT, and DELETE are idempotent by definition. POST is not, and that's where the trouble starts.

Why it matters

Every retry is a potential duplicate. Client timeouts, proxy timeouts, mobile-network flakes — the request may have reached the server and the response may have been lost. The client cannot tell. Without idempotency, "retry on timeout" is a bug waiting to happen.
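To make the failure concrete, here is a small Python simulation. It assumes a hypothetical in-memory `create_charge` handler and a network that drops the first response after the server has already done the work. A naive retry-on-timeout then charges the card twice:

```python
# Hypothetical in-memory "payment server": every call creates a new charge.
charges: list[int] = []


def create_charge(amount: int) -> dict:
    """Non-idempotent handler: each call appends another charge."""
    charges.append(amount)
    return {"id": len(charges), "amount": amount}


def call_server(amount: int, lose_response: bool) -> dict:
    """The server does the work; the network may then drop the reply."""
    result = create_charge(amount)
    if lose_response:
        raise TimeoutError("response lost in transit")
    return result


def charge_with_naive_retry(amount: int) -> dict:
    try:
        # First attempt: the charge succeeds, but the client never hears back.
        return call_server(amount, lose_response=True)
    except TimeoutError:
        # The client can't tell "never arrived" from "reply lost",
        # so it retries, and the server charges again.
        return call_server(amount, lose_response=False)


charge_with_naive_retry(1999)
print(charges)  # [1999, 1999]: one purchase, two charges
```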

The pattern: idempotency keys

The client generates a UUID once per logical operation and sends that same key on every retry:

POST /v1/charges
Idempotency-Key: 7d3f1e4b-...
Content-Type: application/json

{ "amount": 1999, "currency": "usd", "source": "tok_xxx" }
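On the client side, the detail that matters is that the key is generated once, outside the retry loop. A minimal Python sketch, assuming a hypothetical `send(headers, body)` transport that raises `TimeoutError` when no response arrives (a real client would also back off between attempts):

```python
import uuid
from typing import Callable

# Hypothetical transport: (headers, body) -> response; raises TimeoutError
# when no response arrives.
Transport = Callable[[dict, dict], dict]


def post_charge_with_retries(send: Transport, body: dict, max_attempts: int = 3) -> dict:
    # One key per logical operation: generated once, *outside* the retry loop.
    headers = {
        "Idempotency-Key": str(uuid.uuid4()),
        "Content-Type": "application/json",
    }
    attempt = 1
    while True:
        try:
            # Every attempt sends the same key and the same body.
            return send(headers, body)
        except TimeoutError:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the timeout to the caller
            attempt += 1


# Usage with a fake transport that loses the first two responses.
seen_keys: list[str] = []


def flaky_send(headers: dict, body: dict) -> dict:
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) < 3:
        raise TimeoutError("no response")
    return {"status": 200}


response = post_charge_with_retries(
    flaky_send, {"amount": 1999, "currency": "usd", "source": "tok_xxx"}
)
print(response, len(set(seen_keys)))  # {'status': 200} 1
```

All three attempts carry the same key, which is what lets the server recognise them as one operation.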

The server stores a hash of the request alongside the response, keyed by the idempotency key. On a retry with the same key, it returns the stored response instead of executing the operation again.

Storage

Two common patterns:

  • Table row. idempotency_keys(key PK, request_hash, response_body, status, created_at). Expire old rows with a nightly cleanup job; an index on created_at keeps that delete cheap.
  • Redis with a TTL. SET key response NX EX 86400 writes the value only if the key doesn't already exist and expires it after 24 hours. Cheaper, but ephemeral: fine when clients are expected to retry within hours.
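The table-row option, sketched with Python's built-in `sqlite3` as a stand-in for a production database. The columns follow the schema in the bullet above; the `'in_progress'`/`'done'` status values and the helper names `claim_key` and `cleanup_expired` are assumptions for illustration:

```python
import sqlite3
import time

TTL_SECONDS = 24 * 3600  # 24h, the typical window

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE idempotency_keys (
        key           TEXT PRIMARY KEY,
        request_hash  TEXT NOT NULL,
        response_body TEXT,          -- NULL until the original request finishes
        status        TEXT NOT NULL, -- 'in_progress' or 'done' (assumed values)
        created_at    REAL NOT NULL  -- Unix timestamp in seconds
    )
    """
)
# Lets the cleanup job find expired rows without scanning the whole table.
conn.execute("CREATE INDEX idx_idempotency_created_at ON idempotency_keys (created_at)")


def claim_key(key: str, req_hash: str, now: float) -> bool:
    """Insert the key; True if this request claimed it, False if it already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO idempotency_keys (key, request_hash, status, created_at) "
                "VALUES (?, ?, 'in_progress', ?)",
                (key, req_hash, now),
            )
        return True
    except sqlite3.IntegrityError:  # primary-key violation: key already claimed
        return False


def cleanup_expired(now: float) -> int:
    """Periodic job: delete keys older than the TTL; returns rows removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM idempotency_keys WHERE created_at < ?", (now - TTL_SECONDS,)
        )
    return cur.rowcount


now = time.time()
print(claim_key("old", "h1", now - 2 * TTL_SECONDS))  # True
print(claim_key("new", "h2", now))                    # True
print(claim_key("new", "h2", now))                    # False: already claimed
print(cleanup_expired(now))                           # 1: only "old" expired
```

Letting the primary key reject the duplicate insert means the claim is atomic in the database itself, so it also works when retries land on different application servers.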

The gotchas

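  1. In-flight retry. If a retry arrives while the original is still processing, don't start a second execution: wait for the original to finish and return its response, or reply 409 Conflict so the client retries later. A lock on the key solves this, but it has to live in the shared store (for example, a row inserted with status 'in_progress' before the work starts), not in process memory, because the retry may land on a different server.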
  2. Key scope. Idempotency keys are scoped per endpoint + per tenant. Never reuse across unrelated operations.
  3. Same key, different body. Reject with 422 "Idempotency key reused with different parameters." Don't silently return the old response — that masks client bugs.
  4. TTL window. Long enough for any reasonable retry (24h is typical). Longer eats storage; shorter means a retry that arrives after expiry executes the operation again.
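Putting the four gotchas together, here is one possible in-memory handler in Python. It is a sketch, not a drop-in implementation: the `IdempotencyStore` class, replying 409 to an in-flight retry instead of blocking until the original finishes, and forgetting the key when execution fails are all illustrative policy choices. A `threading.Lock` only protects a single process; across servers the claim has to happen in the shared store.

```python
import hashlib
import json
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional

TTL_SECONDS = 24 * 3600  # gotcha 4: typical 24h window


def request_hash(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class Entry:
    request_hash: str
    created_at: float
    response: Optional[dict] = None  # None while the original is still running


class IdempotencyStore:
    """Hypothetical in-memory store; production would use a shared DB or Redis."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # guards _entries (single process only)
        self._entries: dict[tuple[str, str, str], Entry] = {}

    def handle(
        self,
        tenant: str,
        endpoint: str,
        key: str,
        body: dict,
        execute: Callable[[dict], dict],
        now: Optional[float] = None,
    ) -> tuple[int, dict]:
        now = time.time() if now is None else now
        scoped = (tenant, endpoint, key)  # gotcha 2: scope per tenant + endpoint
        req_hash = request_hash(body)
        with self._lock:
            entry = self._entries.get(scoped)
            if entry is not None and now - entry.created_at > TTL_SECONDS:
                entry = None  # gotcha 4: expired keys are treated as new
            if entry is not None:
                if entry.request_hash != req_hash:  # gotcha 3: same key, new body
                    return 422, {"error": "Idempotency key reused with different parameters."}
                if entry.response is None:  # gotcha 1: original still in flight
                    return 409, {"error": "A request with this key is still in progress."}
                return 200, entry.response  # replay the stored response
            # Claim the key *before* doing the work so concurrent retries see it.
            self._entries[scoped] = Entry(req_hash, now)
        try:
            response = execute(body)  # run the side effect outside the lock
        except Exception:
            with self._lock:
                del self._entries[scoped]  # one policy: allow retrying a failure
            raise
        with self._lock:
            self._entries[scoped].response = response
        return 200, response


store = IdempotencyStore()
calls: list[dict] = []


def charge(body: dict) -> dict:
    calls.append(body)
    return {"id": f"ch_{len(calls)}", "amount": body["amount"]}


body = {"amount": 1999, "currency": "usd"}
print(store.handle("acme", "POST /v1/charges", "k1", body, charge))  # (200, {'id': 'ch_1', 'amount': 1999})
print(store.handle("acme", "POST /v1/charges", "k1", body, charge))  # replayed: same response
print(store.handle("acme", "POST /v1/charges", "k1", {**body, "amount": 5}, charge)[0])  # 422
print(len(calls))  # 1
```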

Server-side idempotency without a key

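Some operations are idempotent by shape. UPDATE users SET last_login = '2026-04-24' WHERE id = 42 is: it writes an absolute value, so running it twice does no harm, whereas SET login_count = login_count + 1 adds one on every run. INSERT … ON CONFLICT DO NOTHING is idempotent too. Design for this where you can; fall back to keys where you can't.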
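The difference between absolute and relative writes is easy to see with an in-memory SQLite database. The `users` and `memberships` tables are hypothetical, and SQLite (3.24 or newer) accepts the same `ON CONFLICT DO NOTHING` clause as PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, last_login TEXT, login_count INTEGER)")
conn.execute("INSERT INTO users VALUES (42, NULL, 0)")

for _ in range(2):  # simulate the same request being retried once
    # Idempotent: writes an absolute value, so the second run changes nothing.
    conn.execute("UPDATE users SET last_login = '2026-04-24' WHERE id = 42")
    # NOT idempotent: depends on the current value, so every run adds one.
    conn.execute("UPDATE users SET login_count = login_count + 1 WHERE id = 42")

print(conn.execute("SELECT last_login, login_count FROM users WHERE id = 42").fetchone())
# ('2026-04-24', 2)

conn.execute(
    "CREATE TABLE memberships (user_id INTEGER, team_id INTEGER, PRIMARY KEY (user_id, team_id))"
)
for _ in range(2):
    # The primary key turns the duplicate insert into a no-op.
    conn.execute("INSERT INTO memberships VALUES (42, 7) ON CONFLICT DO NOTHING")
print(conn.execute("SELECT COUNT(*) FROM memberships").fetchone())  # (1,)
```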

Rule

Any POST that changes money, state, or anything the user would notice as a duplicate must support an idempotency key. Everything else is a 2 AM incident waiting for a retry storm.


Reader Discussion

6 replies

  1. Evan Whitfield · Eng Director · Story (highlighted by author)

    "scale vertical until it becomes awkward" — yes. We sunk SIX MONTHS into a sharding project that a $4k/month bigger box would have made unnecessary for two more years. Premature distribution is the bourgeois cousin of premature optimisation.

    May 10, 2025 · 2 days later
  2. Kofi Mensah · Infra Engineer · Asks

    Q: idempotency-key table — keep on primary or shard it? we hit ~120M rows/month and the index is bigger than the data. partition by month + retention helps but I'm curious what others do

    May 13, 2025 · 5 days later
  3. Monique Laurent · Principal Engineer · From experience

    cell-based architecture is how you sleep at night past a certain scale. blast-radius math is so much friendlier when one cell going down hurts 1/N customers instead of all of them. it's also 2x ops cost — pick your poison.

    May 14, 2025 · 6 days later
  4. Quốc Anh 🇸🇬 SG · Cloud Architect · Agrees

    PACELC > CAP. Once you accept that partitions aren't the only trade-off (latency-vs-consistency happens 24/7, partitions are a side quest), the whole topology debate gets clearer. Wish PACELC had CAP's branding budget.

    May 12, 2025 · 4 days later
  5. Grace Liu · Senior Engineer · From experience

    idempotency keys are such a small API change for such a huge reliability win. ship them on every new POST endpoint as a default. the day you need them and don't have them is one of the worst days of your career.

    May 11, 2025 · 3 days later
  6. Isabella Costa · Junior Engineer · Kind words

    saved this. sharing at standup tomorrow — we've had exactly this problem for 2 sprints and nobody on the team had framed it this way 🙏

    May 10, 2025 · 2 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.

Comments seeded · live discussion via email