Apache Kafka

Exactly-Once Semantics in Kafka: Idempotence & Transactions

How producer IDs, sequence numbers, and the transaction coordinator combine to give you exactly-once — and when they don't.

August 28, 2025 · 11 min read · Kafka, Reliability

"Exactly-once" in Kafka isn't magic — it's a careful stack of three mechanisms: idempotent producers, transactions, and read-committed consumers. Miss any one of them and you fall back to at-least-once.

1. Idempotent producer

Enabled with enable.idempotence=true. The broker assigns a producer ID (PID), and the client attaches a monotonically increasing sequence number per partition. On retry, the broker recognises a (PID, partition, sequence) it has already appended and acknowledges it without writing it a second time.

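import java.util.Properties;

Properties p = new Properties();
p.put("bootstrap.servers", "broker:9092");
p.put("acks", "all");                                 // idempotence requires acks=all
p.put("enable.idempotence", "true");                  // broker dedups retries by PID + sequence
p.put("max.in.flight.requests.per.connection", "5");  // idempotence allows at most 5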

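Idempotence alone gives you exactly-once per partition per producer session. It does not cover a crash-and-restart, because a restarted producer is assigned a new PID and its resends no longer look like duplicates. It also does not cover writes that span partitions.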
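The duplicate check can be modelled in a few lines. This is a toy sketch, not broker code: DedupLog, append, and the in-memory map are hypothetical names, and it assumes one log per partition. A real broker also tracks a producer epoch and keeps only a window of recent batches per producer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of one partition's log with idempotent-producer dedup (hypothetical, not Kafka code). */
class DedupLog {
    private final List<String> records = new ArrayList<>();
    // Last sequence number appended for each producer ID on this partition.
    private final Map<Long, Integer> lastSeq = new HashMap<>();

    /** Returns true if the record was appended, false if it was recognised as a retry and skipped. */
    boolean append(long pid, int seq, String value) {
        int last = lastSeq.getOrDefault(pid, -1);
        if (seq <= last) {
            return false; // already in the log: acknowledge without appending
        }
        if (seq != last + 1) {
            throw new IllegalStateException("sequence gap: expected " + (last + 1) + ", got " + seq);
        }
        records.add(value);
        lastSeq.put(pid, seq);
        return true;
    }

    List<String> records() {
        return records;
    }
}
```

In this model a retry of (pid=7, seq=0) is skipped, but the same payload sent under a new PID (pid=8, seq=0) is appended again. That second case is what a restart without a transactional.id looks like to the broker.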

2. Transactions

To span partitions (or to survive a restart), wrap writes in a transaction. This requires a stable transactional.id — that ID is how the transaction coordinator "recognises" you after a crash and fences off zombies.

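producer.initTransactions();  // once per producer instance; fences older instances with the same transactional.id
try {
  producer.beginTransaction();
  producer.send(new ProducerRecord<>("orders", key, order));
  producer.send(new ProducerRecord<>("audit",  key, audit));
  // Assumes `consumer` is the KafkaConsumer that polled the input records.
  // The String groupId overload is deprecated; pass the group metadata instead.
  producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
  producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
  // Fatal (org.apache.kafka.common.errors): this producer cannot recover, so close it rather than abort.
  producer.close();
} catch (KafkaException e) {
  // Any other error: abort so neither the records nor the offsets become visible, then retry.
  producer.abortTransaction();
}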

The sendOffsetsToTransaction call is the piece that makes consume → transform → produce pipelines atomic — consumer offsets are committed inside the same transaction as the output records.
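Why the atomicity matters can be seen in a toy model. The sketch below does not use the Kafka client: AtomicPipelineSketch, processBatch, and the crashBeforeCommit flag are hypothetical names, and the "transaction" is just a staging list that is either applied together with the offset or discarded.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: output records and the consumer offset become visible together or not at all. */
class AtomicPipelineSketch {
    final List<String> output = new ArrayList<>(); // committed output records
    int committedOffset = 0;                       // next input offset to process

    /** Processes input[committedOffset, end). If crashBeforeCommit is true, nothing is applied. */
    void processBatch(List<String> input, int end, boolean crashBeforeCommit) {
        List<String> staged = new ArrayList<>();   // writes inside the open transaction
        for (int i = committedOffset; i < end; i++) {
            staged.add(input.get(i).toUpperCase()); // the "transform" step
        }
        if (crashBeforeCommit) {
            return; // abort: staged output and the offset advance are both discarded
        }
        output.addAll(staged); // commit: output and offset move together
        committedOffset = end;
    }
}
```

Because the offset only advances when the output is committed, a crashed batch is simply reprocessed from the same offset and each input appears in the output once. If the offset were committed separately, a crash between the two steps would either duplicate or drop records.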

3. Read-committed consumer

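The producer side alone is useless if consumers read aborted messages. The consumer default is isolation.level=read_uncommitted, so set isolation.level=read_committed: consumers then skip records from aborted transactions and never read beyond the Last Stable Offset (LSO), the offset of the first record that belongs to a still-open transaction.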
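A small model makes the read-committed rule concrete. This is a sketch under simplifying assumptions, not consumer code: ReadCommittedSketch, Rec, and TxnState are hypothetical names, offsets are assumed to run 0..n-1, and every record is assumed to be fully replicated.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of what a read_committed consumer may see (hypothetical types, not Kafka code). */
class ReadCommittedSketch {
    enum TxnState { NONE, OPEN, COMMITTED, ABORTED } // NONE = non-transactional record

    static final class Rec {
        final long offset;
        final String value;
        final TxnState state;

        Rec(long offset, String value, TxnState state) {
            this.offset = offset;
            this.value = value;
            this.state = state;
        }
    }

    /** LSO: offset of the first record in a still-open transaction, else the end of the log. */
    static long lastStableOffset(List<Rec> log) {
        for (Rec r : log) {
            if (r.state == TxnState.OPEN) {
                return r.offset;
            }
        }
        return log.size();
    }

    /** Records below the LSO, minus those from aborted transactions. */
    static List<String> visible(List<Rec> log) {
        long lso = lastStableOffset(log);
        List<String> out = new ArrayList<>();
        for (Rec r : log) {
            if (r.offset < lso && r.state != TxnState.ABORTED) {
                out.add(r.value);
            }
        }
        return out;
    }
}
```

One consequence worth noticing in this model: a non-transactional record written after an open transaction's first record is also held back until that transaction commits or aborts, so a long-running transaction delays every read_committed consumer of the partition.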

Where EOS silently breaks

  • Writing to an external system (a database, an HTTP API) — the transaction does not cover that. Use the outbox pattern.
  • Mixing transactional and non-transactional writes to the same topic — consumers will see an interleaving that depends on timing.
  • Using a fresh transactional.id on every restart. You lose zombie fencing and lose EOS.
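The last bullet is easier to see with a toy coordinator. The sketch below is an illustration only: FencingSketch, init, and accepts are hypothetical names, and the real transaction coordinator tracks far more state than one epoch counter per ID.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy transaction coordinator: one epoch per transactional.id (hypothetical, not Kafka code). */
class FencingSketch {
    private final Map<String, Integer> epochs = new HashMap<>();

    /** Models initTransactions(): registers the ID at epoch 0, or bumps the existing epoch. */
    int init(String transactionalId) {
        return epochs.merge(transactionalId, 0, (old, ignored) -> old + 1);
    }

    /** A write is accepted only from the newest epoch registered under the ID. */
    boolean accepts(String transactionalId, int epoch) {
        Integer current = epochs.get(transactionalId);
        return current != null && current == epoch;
    }
}
```

With a stable ID, the restarted instance bumps the epoch and the old instance's writes are rejected. With a fresh ID per restart, the coordinator has no way to link the two instances, so the zombie still holds a valid epoch under its own ID and keeps writing.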

Rule of thumb

If your pipeline is Kafka → Kafka, EOS is cheap and real. If it's Kafka → anything else, design for idempotent consumers and stop calling it exactly-once.
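One common shape for an idempotent consumer is to store the last applied offset next to the state it protects. The sketch below is a minimal in-memory illustration: IdempotentSinkSketch, apply, and the balance field are hypothetical, and in a real sink the state change and the offset would have to be written in the same database transaction.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy idempotent sink: remembers the highest offset applied per partition (hypothetical names). */
class IdempotentSinkSketch {
    private final Map<Integer, Long> lastApplied = new HashMap<>(); // partition -> offset
    private long balance = 0;                                        // stand-in for external state

    /** Applies the delta once; a redelivered (partition, offset) is ignored. Returns true if applied. */
    boolean apply(int partition, long offset, long delta) {
        long last = lastApplied.getOrDefault(partition, -1L);
        if (offset <= last) {
            return false; // already applied: an at-least-once redelivery
        }
        balance += delta;
        lastApplied.put(partition, offset);
        return true;
    }

    long balance() {
        return balance;
    }
}
```

Delivery is still at-least-once here; the sink just makes redelivery harmless, which is the honest description the rule of thumb asks for.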


Reader Discussion

8 replies

  1. Highlighted by author
    Tuấn Phạm 🇻🇳 HCMC · Staff Engineer · Tiki Data Platform · Story

    min.insync.replicas=2 with acks=all is the standard combo. We once left min.insync.replicas at its default of 1, and a single rolling broker restart lost 4 messages. As luck would have it, they were payment-confirm events. I sat writing the postmortem from 2 a.m. to 7 a.m.

    Aug 29, 2025·1 day later
    • Minh Le · Author

      By the moment you realize the default is 1, it's already too late. Thanks for sharing, bro. I'll add a warning box to the post.

      Aug 29, 2025
  2. Mai Tran · Full Stack · Asks

    real q: composite key = customerId + (ts % 8) sounds clean but what's the play when one customer goes whale-mode and one of the 8 buckets gets super hot retroactively? do you re-key in a side topic or just suffer until the next quarter?

    Aug 31, 2025·3 days later
    • Derek Okonkwo · Principal Engineer

      We do a side compaction topic keyed by (customer, partition). Cheap to scan, cheap to re-route. The sin is trying to live-migrate keys.

      Sep 01, 2025
  3. Derek Okonkwo · Principal Engineer · Fintech · Agrees

    The Kafka→anything-else caveat is the single most important paragraph in this post. I cannot count how many "exactly once" pipelines I've reviewed that write to Postgres without an outbox and the team still calls it EOS. It's at-least-once with extra steps and a worse mental model.

    Aug 29, 2025·1 day later
  4. Sven Bergström · Senior SWE · Story

    fresh transactional.id on each pod restart is THE silent killer. We had it for ~14 months. Realised when a zombie producer wrote 80k duplicates after a node went catatonic and came back.

    Sep 02, 2025·5 days later
  5. Quốc Anh · Backend Lead · Finhay · Agrees

    The segment files section is nicely concise. We keep forgetting that log.segment.bytes and log.retention.bytes are two different settings; retention doesn't kick in because the segment hasn't rolled yet, which is exactly the rule of thumb at the end of the post.

    Aug 30, 2025·2 days later
  6. Léa Dubois · SRE · Asks

    any chance you'd publish these as a PDF collection? would love to print and read offline on flights. screen-fatigue is real.

    Sep 03, 2025·6 days later

Worked on something similar? Email ducminhldm@gmail.com — I read every one. The good ones become future posts.
