Idempotency Keys: Making APIs Safe to Retry

Networks fail, clients retry, and one charge becomes two. Idempotency keys turn that from a postmortem into a non-event.

December 05, 2025 · 8 min read · System Design, APIs

If your API takes money, sends email, or creates anything with side effects, you have an idempotency problem. The client times out, retries, and now you have two charges. The fix is older than any framework — an idempotency key.

1. The contract

The client sends a unique key with the request (usually a UUID). The server, on the first request with that key, performs the operation and stores the response. On any retry with the same key, the server returns the stored response without re-running the operation.

POST /v1/charges HTTP/1.1
Idempotency-Key: 7d54b3a1-2a89-4f1f-8e1b-2a6f0f4d3c10
Content-Type: application/json

{ "amount_cents": 2500, "customer_id": "cust_42" }
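A minimal in-memory sketch of that contract. It assumes a single process with no persistence and no concurrency control, and the `IdempotentHandler` class is hypothetical, not production code. The operation runs once per key, and every retry gets the stored response back.

```python
from typing import Any, Callable, Dict


class IdempotentHandler:
    """Hypothetical single-process sketch: run once per key, replay afterwards."""

    def __init__(self) -> None:
        # key -> response produced by the first request with that key
        self._responses: Dict[str, Any] = {}

    def handle(self, key: str, operation: Callable[[], Any]) -> Any:
        if key in self._responses:
            # Retry: return the stored response without re-running the operation.
            return self._responses[key]
        response = operation()  # first request with this key: do the work
        self._responses[key] = response
        return response


charges = []


def create_charge() -> dict:
    # Stand-in for the side effect we must not repeat.
    charges.append(2500)
    return {"id": f"ch_{len(charges)}", "amount_cents": 2500}


handler = IdempotentHandler()
key = "7d54b3a1-2a89-4f1f-8e1b-2a6f0f4d3c10"
first = handler.handle(key, create_charge)
retry = handler.handle(key, create_charge)
print(first == retry, len(charges))  # True 1
```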

2. The state machine

Every key transitions through three states. The implementation has to handle all of them:

  • New — first time we see the key. Run the operation, write the response.
  • In flight — a previous request with this key started and has not finished. Either wait or return a conflict.
  • Completed — we have the stored response. Return it.

Claiming a key is a single insert; the unique constraint on key decides which concurrent request gets to run:

INSERT INTO idempotency_keys (key, request_hash, status, expires_at)
VALUES ($1, $2, 'in_flight', now() + interval '24 hours')
ON CONFLICT (key) DO NOTHING
RETURNING *;

If RETURNING gives you no row, the key already exists: read the existing row and branch on its status.
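The same claim-then-branch logic can be sketched with Python's built-in `sqlite3` standing in for Postgres. `INSERT OR IGNORE` plus `cursor.rowcount` plays the role of `ON CONFLICT DO NOTHING ... RETURNING`. The `begin`/`finish` helpers and the `response` column are hypothetical, and the request-hash check and expiry cleanup are left out to keep the three states visible.

```python
import sqlite3
import time
from typing import Optional, Tuple

conn = sqlite3.connect(":memory:")
# Hypothetical schema: status is 'in_flight' or 'completed'; response is set on completion.
conn.execute(
    "CREATE TABLE idempotency_keys ("
    " key TEXT PRIMARY KEY, request_hash TEXT NOT NULL, status TEXT NOT NULL,"
    " response TEXT, expires_at REAL NOT NULL)"
)


def begin(key: str, request_hash: str) -> Tuple[str, Optional[str]]:
    """Claim the key, or report the state it is already in."""
    with conn:  # commit the claim right away so concurrent requests can see it
        cur = conn.execute(
            "INSERT OR IGNORE INTO idempotency_keys (key, request_hash, status, expires_at)"
            " VALUES (?, ?, 'in_flight', ?)",
            (key, request_hash, time.time() + 24 * 3600),
        )
    if cur.rowcount == 1:
        return ("new", None)  # we own the key: run the operation
    status, response = conn.execute(
        "SELECT status, response FROM idempotency_keys WHERE key = ?", (key,)
    ).fetchone()
    if status == "in_flight":
        return ("in_flight", None)  # still running elsewhere: wait or return 409
    return ("completed", response)  # replay the stored response


def finish(key: str, response: str) -> None:
    """Mark the key completed and store the response to replay."""
    with conn:
        conn.execute(
            "UPDATE idempotency_keys SET status = 'completed', response = ? WHERE key = ?",
            (response, key),
        )


print(begin("k1", "h1"))  # ('new', None)
print(begin("k1", "h1"))  # ('in_flight', None)
finish("k1", '{"id": "ch_1"}')
print(begin("k1", "h1"))  # ('completed', '{"id": "ch_1"}')
```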

3. Two subtleties everyone misses

3.1 The request must match

A client can reuse a key with a different request body (usually a client bug), and the server must detect that. Store a hash of the canonical request body alongside the key. On retry, if the hash differs, return 409 Conflict instead of replaying a response that belongs to a different request.
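A sketch of the hash check, assuming JSON request bodies. The canonical form used here (sorted keys, compact separators) is one reasonable choice rather than a standard; it does not, for example, treat `2500` and `2500.0` as equal. `request_hash` and `retry_decision` are hypothetical names.

```python
import hashlib
import json


def request_hash(body: dict) -> str:
    # Canonical form: sorted keys and no insignificant whitespace, so the same
    # JSON object hashes identically regardless of key order or formatting.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def retry_decision(stored_hash: str, body: dict) -> str:
    """Replay only if the retry carries the same request as the original."""
    return "replay" if request_hash(body) == stored_hash else "409 Conflict"


original = {"amount_cents": 2500, "customer_id": "cust_42"}
stored = request_hash(original)

print(retry_decision(stored, {"customer_id": "cust_42", "amount_cents": 2500}))  # replay
print(retry_decision(stored, {"amount_cents": 9900, "customer_id": "cust_42"}))  # 409 Conflict
```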

3.2 The work and the bookkeeping must be atomic

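If you run the operation and then write "completed" as a separate step, a crash in between leaves the key stuck in in_flight even though the work already happened, and no retry can tell whether it is safe to run again. The safe shape: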

  1. Insert key with status in_flight.
  2. Do the work and mark the key completed in the same database transaction (when the work is itself a database write).
  3. If the work is an external call (charge a card), use a state machine with persisted intermediate states — never an in-memory flag.
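A sketch of step 2 for the case where the work is a database write, again using Python's built-in `sqlite3` in place of Postgres. The `orders` table and `do_work_and_complete` are hypothetical, and the key row from step 1 is pre-seeded. A simulated crash between the work and the bookkeeping rolls both back, so the key is left in_flight with nothing done, rather than in_flight with the work already committed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, key TEXT NOT NULL, amount_cents INTEGER NOT NULL);
    INSERT INTO idempotency_keys VALUES ('k1', 'in_flight');
    """
)


def do_work_and_complete(key: str, amount_cents: int, crash: bool = False) -> None:
    # One transaction: the order row and the 'completed' mark commit together
    # or roll back together. `with conn` commits on success, rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO orders (key, amount_cents) VALUES (?, ?)", (key, amount_cents)
        )
        if crash:
            raise RuntimeError("simulated crash between work and bookkeeping")
        conn.execute("UPDATE idempotency_keys SET status = 'completed' WHERE key = ?", (key,))


def state() -> tuple:
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    status = conn.execute("SELECT status FROM idempotency_keys WHERE key = 'k1'").fetchone()[0]
    return orders, status


try:
    do_work_and_complete("k1", 2500, crash=True)
except RuntimeError:
    pass
print(state())  # (0, 'in_flight'): nothing half-done, safe to run again

do_work_and_complete("k1", 2500)
print(state())  # (1, 'completed')
```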

4. Scope and TTL

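  • Scope: by tenant, not globally. Use (tenant_id, key) as the primary key, so two tenants that happen to send the same key never collide or receive each other's stored responses.
  • TTL: 24 hours is the Stripe convention and a good default. Long enough to absorb client retry storms, short enough that the table does not grow forever.
  • Storage: Redis is fine if you can tolerate losing recently written keys on a crash or failover; Postgres is safer for money, because the key can commit in the same transaction as the work.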
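A sketch of tenant scoping and expiry, assuming a hypothetical in-memory `ScopedKeyStore` with an injectable clock so expiry can be exercised. It purges expired entries lazily on read; a database-backed store would more likely delete them with a periodic job.

```python
from typing import Any, Callable, Dict, Optional, Tuple

TTL_SECONDS = 24 * 60 * 60  # the 24-hour default


class ScopedKeyStore:
    """Hypothetical store keyed by (tenant_id, key) with a TTL."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock  # injectable so tests can move time forward
        self._rows: Dict[Tuple[str, str], Tuple[Any, float]] = {}

    def put(self, tenant_id: str, key: str, response: Any) -> None:
        self._rows[(tenant_id, key)] = (response, self._clock() + TTL_SECONDS)

    def get(self, tenant_id: str, key: str) -> Optional[Any]:
        row = self._rows.get((tenant_id, key))
        if row is None:
            return None
        response, expires_at = row
        if self._clock() >= expires_at:
            del self._rows[(tenant_id, key)]  # expired: treat the key as new
            return None
        return response


now = [0.0]
store = ScopedKeyStore(clock=lambda: now[0])
store.put("tenant_a", "k1", {"id": "ch_a"})
store.put("tenant_b", "k1", {"id": "ch_b"})  # same key, different tenant: no collision
print(store.get("tenant_a", "k1"), store.get("tenant_b", "k1"))
now[0] += TTL_SECONDS
print(store.get("tenant_a", "k1"))  # None: expired after 24 hours
```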

5. The client side

Generate the key before the first send, not on retry. Persist it across retries. A common bug: the client retries with a brand-new key each time, defeating the entire system. Some client SDKs generate and reuse the key for you; check whether yours does before rolling your own.
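A sketch of the client side, assuming a hypothetical `send` callable that raises `TimeoutError` on an ambiguous failure. Backoff and jitter are omitted. The point is only where the key is created: once, outside the retry loop.

```python
import uuid
from typing import Callable, Dict, List


def post_with_retries(send: Callable[[Dict[str, str]], dict], max_attempts: int = 3) -> dict:
    # Generate the key once, before the first send, and reuse it on every retry.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error: Exception = RuntimeError("no attempts made")
    for _ in range(max_attempts):
        try:
            return send(headers)
        except TimeoutError as err:
            last_error = err  # ambiguous: the server may already have done the work
    raise last_error


seen_keys: List[str] = []


def flaky_send(headers: Dict[str, str]) -> dict:
    # Hypothetical transport: times out twice, then succeeds.
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) < 3:
        raise TimeoutError("simulated timeout")
    return {"status": 200}


print(post_with_retries(flaky_send))  # {'status': 200}
print(len(set(seen_keys)))  # 1: every attempt carried the same key
```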

6. What this does NOT solve

Idempotency keys protect the API surface. They do not make your downstream calls idempotent. If the charge succeeds at the payment provider but the response is lost, you still need to reconcile against the provider before responding. Idempotency keys make the second request a no-op; reconciliation makes the first one observable.
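A minimal sketch of reconciliation, assuming a hypothetical provider that can look up a charge by a reference you supplied when creating it. `FakeProvider`, `create_charge`, and `find_charge` are illustrative, not a real payment API. When the create call fails ambiguously, the handler asks the provider what happened before retrying, so a lost response does not become a second charge.

```python
from typing import List, Optional, Tuple


class FakeProvider:
    """Hypothetical provider that can perform a charge and then lose the response."""

    def __init__(self) -> None:
        self.charges: List[Tuple[str, str, int]] = []  # (charge_id, our_reference, amount)
        self.lose_next_response = False

    def create_charge(self, reference: str, amount_cents: int) -> str:
        charge_id = f"ch_{len(self.charges) + 1}"
        self.charges.append((charge_id, reference, amount_cents))  # money has moved
        if self.lose_next_response:
            self.lose_next_response = False
            raise TimeoutError("charge succeeded, response lost")
        return charge_id

    def find_charge(self, reference: str) -> Optional[str]:
        for charge_id, ref, _ in self.charges:
            if ref == reference:
                return charge_id
        return None


def charge_with_reconciliation(provider: FakeProvider, reference: str, amount_cents: int) -> str:
    try:
        return provider.create_charge(reference, amount_cents)
    except TimeoutError:
        # Ambiguous outcome: look the charge up before retrying or responding.
        existing = provider.find_charge(reference)
        if existing is not None:
            return existing
        return provider.create_charge(reference, amount_cents)


provider = FakeProvider()
provider.lose_next_response = True
charge_id = charge_with_reconciliation(provider, "order_123", 2500)
print(charge_id, len(provider.charges))  # ch_1 1
```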

Rules of thumb

  • Every mutating endpoint that touches money, sends a message, or creates a resource should accept an idempotency key.
  • Read endpoints don't need it — GETs are idempotent by definition.
  • If you cannot make the operation and its bookkeeping atomic, you have not solved the problem; you have moved it.

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.
