Idempotency Keys: Making APIs Safe to Retry
Networks fail, clients retry, and one charge becomes two. Idempotency keys turn that from a postmortem into a non-event.
If your API takes money, sends email, or does anything else with side effects, you have an idempotency problem: the client times out, retries, and the operation runs twice. The fix is older than any framework: an idempotency key.
1. The contract
The client sends a unique key with the request (usually a UUID). The server, on the first request with that key, performs the operation and stores the response. On any retry with the same key, the server returns the stored response without re-running the operation.
POST /v1/charges HTTP/1.1
Idempotency-Key: 7d54b3a1-2a89-4f1f-8e1b-2a6f0f4d3c10
Content-Type: application/json

{ "amount_cents": 2500, "customer_id": "cust_42" }
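A minimal in-memory sketch of this contract may help make it concrete. The `handle_charge` function, the dict-backed store, and the `charge_id` format are all hypothetical; a real service would keep the stored responses in durable storage.

```python
import uuid

# Hypothetical in-memory store: idempotency key -> stored response.
_responses: dict[str, dict] = {}
# Records every time the side effect actually runs.
charges_made: list[dict] = []


def handle_charge(idempotency_key: str, body: dict) -> dict:
    """Run the charge once per key; replay the stored response on retries."""
    if idempotency_key in _responses:
        return _responses[idempotency_key]  # retry: no side effect
    charges_made.append(body)  # stand-in for the real side effect
    response = {"status": 201, "charge_id": f"ch_{len(charges_made)}"}
    _responses[idempotency_key] = response
    return response


key = str(uuid.uuid4())
body = {"amount_cents": 2500, "customer_id": "cust_42"}
first = handle_charge(key, body)
retry = handle_charge(key, body)  # same key: same response, one charge
```

This sketch ignores concurrency and crashes, which is what the state machine in the next section is for.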
2. The state machine
Every key is in one of three states, and the implementation has to handle all of them:
- New: this is the first time we have seen the key. Run the operation and write the response.
- In flight: a previous request with this key started and has not finished. Either wait for it or return a 409 Conflict.
- Completed: we have the stored response. Return it.
INSERT INTO idempotency_keys (key, request_hash, status, expires_at)
VALUES ($1, $2, 'in_flight', now() + interval '24 hours')
ON CONFLICT (key) DO NOTHING
RETURNING *;
If RETURNING gives you nothing, the key already exists — read the row, branch on its status.
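The claim-then-branch step can be sketched with Python's built-in `sqlite3` so that it runs standalone. This is an illustration under assumptions, not the Postgres statement above: `INSERT OR IGNORE` plus `rowcount` stands in for `ON CONFLICT DO NOTHING ... RETURNING`, the column is named `idem_key`, the `expires_at` column is left out, and `claim_key` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency_keys ("
    " idem_key TEXT PRIMARY KEY,"
    " request_hash TEXT NOT NULL,"
    " status TEXT NOT NULL,"
    " response TEXT)"
)


def claim_key(idem_key: str, request_hash: str) -> str:
    """Return 'new', 'in_flight', or 'completed' for this key."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO idempotency_keys (idem_key, request_hash, status)"
        " VALUES (?, ?, 'in_flight')",
        (idem_key, request_hash),
    )
    conn.commit()
    if cur.rowcount == 1:
        return "new"  # we inserted the row, so this request owns the work
    # The key already existed: read the row and branch on its status.
    row = conn.execute(
        "SELECT status FROM idempotency_keys WHERE idem_key = ?", (idem_key,)
    ).fetchone()
    return row[0]  # 'in_flight' or 'completed'
```

The caller runs the operation on `"new"`, waits or returns a conflict on `"in_flight"`, and replays the stored response on `"completed"`.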
3. Two subtleties everyone misses
3.1 The request must match
A client could re-use the same key for a different request body. The server must detect that. Store a hash of the canonical request body alongside the key. On retry, if the hash differs, return 409 Conflict.
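One way to compute such a hash, as a sketch: serialize the JSON body with sorted keys and fixed separators, then take a SHA-256 digest. The helper names `request_hash` and `check_retry` are hypothetical, and this particular canonical form is an assumption; any encoding works as long as it is deterministic.

```python
import hashlib
import json


def request_hash(body: dict) -> str:
    """Hash a canonical JSON encoding: sorted keys, no extra whitespace."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_retry(stored_hash: str, body: dict) -> int:
    """Return 200 if the retry matches the original request, else 409."""
    return 200 if request_hash(body) == stored_hash else 409
```

Sorting keys means a retry that serializes the same fields in a different order still matches, while a changed amount does not.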
3.2 The work and the bookkeeping must be atomic
If you run the operation and then write "completed" as a separate step, a crash in between leaves the key stuck in in_flight until it expires, even though the work may already have happened. The safe shape:
- Insert key with status in_flight.
- Do the work and mark the key completed in the same database transaction (when the work is itself a database write).
- If the work is an external call (charge a card), use a state machine with persisted intermediate states, never an in-memory flag.
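For the case where the work is itself a database write, the second step can be sketched as below. It uses `sqlite3` so it runs standalone; the `orders` table, the `complete` helper, and the `fail` flag are hypothetical. A raised exception stands in for a crash here: in both cases the uncommitted transaction is discarded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idempotency_keys (idem_key TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id TEXT)")
conn.execute("INSERT INTO idempotency_keys VALUES ('k1', 'in_flight')")
conn.commit()


def complete(idem_key: str, customer_id: str, fail: bool = False) -> None:
    """Write the order and mark the key completed in one transaction."""
    with conn:  # commits on success, rolls back if the body raises
        conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
        if fail:
            raise RuntimeError("simulated failure between work and bookkeeping")
        conn.execute(
            "UPDATE idempotency_keys SET status = 'completed' WHERE idem_key = ?",
            (idem_key,),
        )
```

If the failure hits, neither the order nor the status change is committed, so a retry sees in_flight with no work done and can safely run it again.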
4. Scope and TTL
- Scope: by tenant, not globally. Use (tenant_id, key) as the primary index.
- TTL: 24 hours is the Stripe convention and a good default. Long enough to absorb client retry storms, short enough that the table does not grow forever.
- Storage: Redis is fine if you can tolerate a tiny window of loss; Postgres is safer for money.
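A small sketch of tenant scoping and expiry, again using `sqlite3` as a stand-in so it runs standalone. The `remember` and `purge_expired` helpers are hypothetical, and storing `expires_at` as a Unix timestamp that the caller passes in is an assumption made to keep the example deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency_keys ("
    " tenant_id TEXT NOT NULL,"
    " idem_key TEXT NOT NULL,"
    " expires_at REAL NOT NULL,"
    " PRIMARY KEY (tenant_id, idem_key))"
)

TTL_SECONDS = 24 * 60 * 60  # the 24-hour default discussed above


def remember(tenant_id: str, idem_key: str, now: float) -> bool:
    """Insert the key for this tenant; False if that tenant already used it."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO idempotency_keys VALUES (?, ?, ?)",
        (tenant_id, idem_key, now + TTL_SECONDS),
    )
    return cur.rowcount == 1


def purge_expired(now: float) -> int:
    """Delete expired keys and return how many were removed."""
    cur = conn.execute("DELETE FROM idempotency_keys WHERE expires_at <= ?", (now,))
    return cur.rowcount
```

Because the primary key is composite, two tenants that happen to send the same key do not collide, and a periodic `purge_expired` call keeps the table bounded.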
5. The client side
Generate the key before the first send, not on retry, and persist it across retries. A common bug: the client retries with a brand-new key each time, which defeats the entire system. SDK helpers usually handle this for you.
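A sketch of the client-side rule, with the transport injected as a callable so the example stays self-contained. The `send_with_retries` helper and the `send(headers, body)` signature are hypothetical, not part of any particular SDK.

```python
import uuid
from typing import Callable


def send_with_retries(
    send: Callable[[dict, dict], int],
    body: dict,
    max_attempts: int = 3,
) -> int:
    """Call send(headers, body) until it succeeds, reusing one key throughout."""
    # Generated once, before the first attempt, and never regenerated.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers, body)
        except TimeoutError as exc:
            last_error = exc  # a real client would back off before retrying
    raise RuntimeError("all attempts timed out") from last_error
```

The key lives outside the loop, so every attempt carries the same header. A client that must survive a process restart would also need to persist the key alongside the pending request.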
6. What this does NOT solve
Idempotency keys protect the API surface. They do not make your downstream calls idempotent. If the charge succeeds at the payment provider but the response is lost, you still need to reconcile against the provider before responding. Idempotency keys make the second request a no-op; reconciliation makes the first one observable.
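One possible shape for that reconciliation step, as a sketch: before charging again for a key that is stuck in flight, ask the provider whether a charge with your reference already exists. The `Provider` interface below is entirely hypothetical and does not mirror any real payment SDK; it assumes the provider lets you look up a charge by a reference you supplied.

```python
from typing import Optional, Protocol


class Provider(Protocol):
    """Hypothetical payment-provider client; not a real SDK."""

    def find_charge(self, reference: str) -> Optional[dict]: ...

    def create_charge(self, reference: str, amount_cents: int) -> dict: ...


def resume_charge(provider: Provider, reference: str, amount_cents: int) -> dict:
    """Resume a key stuck in flight: look the charge up before charging again."""
    existing = provider.find_charge(reference)
    if existing is not None:
        return existing  # the first attempt succeeded; only its response was lost
    return provider.create_charge(reference, amount_cents)
```

Using the idempotency key itself as the reference is one option, since it is already unique per logical operation.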
Rules of thumb
- Every mutating endpoint that touches money, sends a message, or creates a resource should accept an idempotency key.
- Read endpoints don't need it — GETs are idempotent by definition.
- If you cannot give clients atomic "operation + bookkeeping," you have not solved the problem, you have moved it.