
Redis Persistence: RDB, AOF, and the Durability You Actually Get

Snapshots vs append-only: what each one guarantees, what each one loses, and how to combine them safely.

August 09, 2025 · 9 min read · Redis · Reliability

Redis persistence is two mechanisms, not one. You can run with both — and usually should.

RDB: point-in-time snapshots

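A forked child process serializes the whole keyspace to a compact binary file while the parent keeps serving requests. Snapshots are triggered either by rules or manually. A rule such as save 3600 1 means "snapshot once 3600 seconds have passed since the last save, if at least one write happened in that time." The manual trigger is BGSAVE. Properties: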

  • Compact on disk, which makes it good for backups and replica bootstrap.
  • Recovery is fast (a single sequential read).
  • On crash, you lose everything since the last snapshot.
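The Python model below makes the save rules concrete. It shows how a list of `save <seconds> <changes>` rules could decide whether to start a background snapshot. `SaveRule` and `should_bgsave` are hypothetical names. The sketch assumes a rule fires when two conditions hold: the change count meets the threshold, and more than the given number of seconds has passed since the last save. It ignores Redis's retry back-off after a failed BGSAVE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaveRule:
    seconds: int  # time window, e.g. 3600
    changes: int  # minimum number of writes since the last save, e.g. 1


def should_bgsave(rules, seconds_since_last_save, changes_since_last_save):
    """Return True if any `save <seconds> <changes>` rule is satisfied.

    Simplified model: a rule fires when at least `changes` writes have
    happened and more than `seconds` have elapsed since the last save.
    Failed-BGSAVE retry back-off is deliberately ignored.
    """
    return any(
        changes_since_last_save >= rule.changes
        and seconds_since_last_save > rule.seconds
        for rule in rules
    )


# The two rules used in the combined config in this post.
RULES = [SaveRule(3600, 1), SaveRule(300, 100)]

print(should_bgsave(RULES, 400, 150))  # True: the 300s/100-change rule fires
print(should_bgsave(RULES, 400, 5))    # False: too few changes for 300/100, too early for 3600/1
print(should_bgsave(RULES, 3700, 1))   # True: the 3600s/1-change rule fires
```

The model shows why several rules are listed together. The loose rule covers idle periods, and the tight rule snapshots sooner under heavy write load.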

AOF: append-only log

Every write command is appended to a text log, in the same RESP format clients use on the wire. The appendfsync policy decides how often that log is fsynced to disk:

  • always — fsync per command. Strong durability, big latency cost.
  • everysec — fsync once per second (default). You can lose up to ~1s on crash.
  • no — let the OS decide. Fast, weakest guarantee.
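A toy append-only log in Python can show what the three policies trade off. It encodes each command as a RESP array, the format AOF entries use. It then decides when to call `os.fsync` according to the policy. `AppendOnlyLog`, `encode_resp`, and `demo` are hypothetical names. The model is also simplified: real Redis runs the everysec fsync on a background thread, while here the check happens inline on each append.

```python
import os
import tempfile
import time


def encode_resp(*args):
    """Encode one command as a RESP array, e.g. SET k v -> *3 $3 SET $1 k $1 v."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = str(arg).encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


class AppendOnlyLog:
    """Toy append-only log illustrating the three appendfsync policies."""

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_fsync = time.monotonic()
        self.fsync_count = 0

    def append(self, *command):
        # The write lands in the OS page cache; it is not yet on stable storage.
        os.write(self.fd, encode_resp(*command))
        now = time.monotonic()
        if self.policy == "always":
            self._fsync(now)  # durable before we return, at a latency cost
        elif self.policy == "everysec" and now - self.last_fsync >= 1.0:
            self._fsync(now)  # at most ~1s of appends are unsynced at any time
        # "no": never fsync here; the kernel flushes on its own schedule.
        # Simplification: with inline checks, a quiet log waits for the next append.

    def _fsync(self, now):
        os.fsync(self.fd)
        self.last_fsync = now
        self.fsync_count += 1

    def close(self):
        os.close(self.fd)


def demo(policy):
    """Append two commands to a throwaway file; return (file bytes, fsync count)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.aof")
        log = AppendOnlyLog(path, policy)
        log.append("SET", "session:abc", "token")
        log.append("SET", "session:abc", "token2")
        log.close()
        with open(path, "rb") as f:
            return f.read(), log.fsync_count
```

Calling `demo("always")` performs one fsync per append. `demo("everysec")` and `demo("no")` perform none for two quick appends. In both of those cases the writes sit only in the page cache, which is exactly the window a crash can take away.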

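Left alone, the AOF would grow without bound. Redis therefore rewrites it in a forked child, which produces a compacted log from the current dataset. The rewrite runs either automatically (governed by auto-aof-rewrite-percentage and auto-aof-rewrite-min-size) or on demand with BGREWRITEAOF.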
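The rewrite can be sketched in Python: replay the log into a dictionary, then emit one SET per surviving key. Like Redis, the sketch builds the new log from the resulting dataset rather than editing the old file. It is only a model, though. `rewrite` is a hypothetical name, it handles just SET, DEL, and INCR on string keys, and it runs inline rather than in a forked child.

```python
def rewrite(commands):
    """Compact (op, key, *args) commands into the minimal list of SETs.

    Simplified AOF-rewrite model: replay everything into an in-memory
    dict, then regenerate commands from that final state.
    """
    state = {}
    for op, key, *args in commands:
        if op == "SET":
            state[key] = args[0]
        elif op == "DEL":
            state.pop(key, None)
        elif op == "INCR":
            # Like Redis, INCR on a missing key treats it as 0 first.
            state[key] = str(int(state.get(key, "0")) + 1)
        else:
            raise ValueError(f"unsupported command in this sketch: {op}")
    return [("SET", key, value) for key, value in sorted(state.items())]


history = [
    ("SET", "a", "1"),
    ("INCR", "a"),
    ("INCR", "a"),
    ("SET", "b", "x"),
    ("DEL", "b"),
    ("INCR", "hits"),
]
print(rewrite(history))  # [('SET', 'a', '3'), ('SET', 'hits', '1')]
```

Six logged commands collapse to two. The gap widens for hot keys that are overwritten many times.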

The combined setup

Run both. When AOF is enabled, Redis loads the AOF on restart and ignores the RDB file, because the AOF holds the most recent state. Use RDB snapshots for off-box backups, since the file is small enough to ship.

# RDB rules: save <seconds> <changes>
save 3600 1
save 300 100
# AOF, fsynced once per second
appendonly yes
appendfsync everysec
# hybrid: RDB prefix + AOF tail
aof-use-rdb-preamble yes
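With the preamble enabled, the rewritten AOF starts with an RDB payload. That payload's first bytes are the ASCII magic `REDIS` followed by a four-digit version. A plain AOF starts with a RESP array instead. The hypothetical Python helper below checks which layout a file begins with. It is a sketch for inspecting files, not a validator. On Redis 7 and later the AOF is split across several files, so point it at the base file.

```python
import re

# RDB payloads begin with b"REDIS" plus a 4-digit ASCII version, e.g. b"REDIS0011".
RDB_MAGIC = re.compile(rb"REDIS\d{4}")


def describe_aof_prefix(first_bytes):
    """Classify the start of an AOF file: 'rdb-preamble', 'resp', or 'unknown'."""
    if RDB_MAGIC.match(first_bytes):
        return "rdb-preamble"
    if first_bytes.startswith(b"*"):  # RESP arrays start with '*'
        return "resp"
    return "unknown"


def describe_aof_file(path):
    """Read the first 9 bytes (enough for the magic and version) and classify them."""
    with open(path, "rb") as f:
        return describe_aof_prefix(f.read(9))
```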

Durability in a cluster

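Replication is asynchronous. WAIT numreplicas timeout blocks until at least numreplicas replicas have acknowledged the client's earlier writes, or until timeout milliseconds pass. It then returns how many replicas acknowledged. Use it for writes that must not vanish in a failover, and check the number it returns: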

SET session:abc token
WAIT 2 500

Note: WAIT is best-effort. It reduces the window of data loss but does not make Redis a consensus system. If you need that, you're looking for a different database.
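WAIT reports how many replicas acknowledged rather than failing, so the caller has to check that number. The toy Python model below tracks each replica's acknowledged replication offset. `ReplicationAcks` is a hypothetical name. The model mimics WAIT's contract in three ways: it blocks until `numreplicas` replicas reach the write's offset or the timeout expires, it treats a timeout of 0 as "wait forever", and it returns the count either way. It models only the bookkeeping, not the replication protocol.

```python
import threading


class ReplicationAcks:
    """Toy model of WAIT: each replica reports the highest offset it has applied."""

    def __init__(self):
        self._cond = threading.Condition()
        self._acked = {}  # replica id -> highest acknowledged offset

    def ack(self, replica_id, offset):
        with self._cond:
            self._acked[replica_id] = max(offset, self._acked.get(replica_id, 0))
            self._cond.notify_all()

    def _count(self, target_offset):
        return sum(1 for off in self._acked.values() if off >= target_offset)

    def wait(self, numreplicas, timeout_ms, target_offset):
        # Mirrors WAIT: a timeout of 0 blocks until enough replicas acknowledge.
        timeout = None if timeout_ms == 0 else timeout_ms / 1000
        with self._cond:
            self._cond.wait_for(
                lambda: self._count(target_offset) >= numreplicas, timeout
            )
            # Returned whether or not numreplicas was reached.
            return self._count(target_offset)


acks = ReplicationAcks()
acks.ack("replica-1", 100)  # only one replica has caught up to offset 100
reached = acks.wait(numreplicas=2, timeout_ms=50, target_offset=100)
if reached < 2:
    print(f"only {reached} replica(s) confirmed; treat the write as at risk")
```

Even when the count is reached, an acknowledgement only means the replicas received the write. That is why the note above calls WAIT best-effort rather than consensus.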


Reader Discussion

  1. Highlighted by author
    Elena Ricci · Platform Eng · Booking infra · From experience

    XFetch quietly killed our daily cache stampede. 6h TTL on a product catalog, three-instance API, used to brown-out for 90 seconds every refresh. Shipped XFetch on a Friday afternoon and forgot it existed. That's the highest praise I can give a fix.

    Aug 11, 2025·2 days later
  2. Sơn Nguyễn 🇻🇳 Hà Nội · Senior Backend · Story

    +1 for UNLINK. FLUSHDB SYNC froze prod for 39s, with pager alerts blaring through the house. Afterwards we switched to UNLINK + chunked SCAN and never saw the spike again. Every junior dev on my team is required to read that incident.

    Aug 14, 2025·5 days later
  3. Luca Bianchi · Tech Lead · Pushback

    fwiw, overused hash tags are a real footgun. Our cluster once had a single slot eating 41% of the traffic because someone thought {tenant} as a key prefix was a good idea. Cluster slowlog went from 12ms to 800ms in one night. Rebalancing the cluster couldn't save us because it was all the same slot.

    Aug 15, 2025·6 days later·edited
  4. Huyền Lê · Software Engineer · Agrees

    wrote a postmortem titled 'WAIT did not wait', then a week later came across this section in the post. laughed until I cried. the part about WAIT not being a consensus primitive should be highlighted in red in the official docs.

    Aug 12, 2025·3 days later
  5. Amir Shah · Infra · Asks

    Q: pre-warm hot keys — internal cron inside the app vs external scheduler (k8s cronjob etc)? We've shipped both. Internal is simpler but you fight clock skew across replicas; external is reliable but adds a moving piece.

    Aug 13, 2025·4 days later
    • Minh Le · Author

      External, every time. The number of "why is the warmup not running" tickets I've seen with internal crons is not funny anymore. Make it boring infra.

      Aug 14, 2025
    • Carla Pérez · Backend

      we do external + a redis lock so only one instance actually runs the warmup. simple and observable.

      Aug 15, 2025
  6. Rachel Gold · Staff SRE · Agrees

    the on-call framing throughout this piece is what makes it land. too many infra articles assume you never get paged. those are written by people who never got paged.

    Aug 12, 2025·3 days later
  7. Omar Khalil · Senior SWE · Kind words

    this is the third article from this blog I've sent to my team this month. you're cooking. don't switch to crypto.

    Aug 14, 2025·5 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.

Comments seeded · live discussion via email