Consumer Group Rebalancing: Eager, Cooperative, and Static

Why your consumer hangs for 30s on every deploy, and how cooperative rebalancing plus static membership fix it.

September 18, 2025 · 8 min read · Kafka, Operations

A rebalance is the process by which partitions are re-assigned among consumers in a group. If you've seen processing latency spike whenever a pod is recycled, you've seen the default eager protocol in action.

1. Eager rebalance (the old default)

When a member joins or leaves, every consumer revokes all of its partitions and rejoins the group. The group leader (one of the consumers) recomputes the assignment, the coordinator distributes it, and everyone picks up new work. During that window, which can last tens of seconds, the group processes zero records. It's called "stop-the-world" for a reason.

2. Cooperative rebalance

Enable it by setting partition.assignment.strategy to CooperativeStickyAssignor. Only the partitions that need to move are revoked; every other partition keeps being processed.

props.put("partition.assignment.strategy",
  "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");

The trade-off: the rebalance happens in two rounds (revoke-only, then assign). Total wall time is similar, but partitions that don't move are never paused.
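To make the difference concrete, here is a toy model of what happens when a third consumer joins a two-member group. It is a sketch only: the class RebalanceSketch and its methods are hypothetical, and the "keep what you have up to your quota" rule is a simplification, not the algorithm CooperativeStickyAssignor actually runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical, not Kafka code): members are indices 0..n-1,
// and currentOwner[p] is the member that owns partition p before the rebalance.
final class RebalanceSketch {

    // Sticky target: each member keeps its partitions up to an even quota;
    // whatever is left over goes to members that are still under quota.
    static int[] stickyTarget(int[] currentOwner, int newMemberCount) {
        int partitions = currentOwner.length;
        int base = partitions / newMemberCount;
        int extra = partitions % newMemberCount; // first `extra` members get one more
        int[] kept = new int[newMemberCount];
        int[] target = new int[partitions];
        List<Integer> unassigned = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            int owner = currentOwner[p];
            int quota = base + (owner < extra ? 1 : 0);
            if (owner < newMemberCount && kept[owner] < quota) {
                target[p] = owner; // partition stays where it is
                kept[owner]++;
            } else {
                unassigned.add(p); // owner is over quota or gone
            }
        }
        int m = 0;
        for (int p : unassigned) {
            while (kept[m] >= base + (m < extra ? 1 : 0)) {
                m++; // skip members that are already full
            }
            target[p] = m;
            kept[m]++;
        }
        return target;
    }

    // Eager: every partition is revoked, whether or not it ends up moving.
    static int eagerRevoked(int[] currentOwner) {
        return currentOwner.length;
    }

    // Cooperative: only partitions whose owner changes are revoked.
    static int cooperativeRevoked(int[] currentOwner, int[] target) {
        int moved = 0;
        for (int p = 0; p < currentOwner.length; p++) {
            if (currentOwner[p] != target[p]) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int[] before = {0, 0, 0, 1, 1, 1}; // 6 partitions, 2 members
        int[] after = stickyTarget(before, 3); // a third member joins
        System.out.println("target:      " + Arrays.toString(after));
        System.out.println("eager:       " + eagerRevoked(before) + " revoked");
        System.out.println("cooperative: " + cooperativeRevoked(before, after) + " revoked");
    }
}
```

With six partitions split across two members, a third member joining means the eager protocol pauses all six partitions, while in this model only the two that actually change hands are paused.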

3. Static membership

Give each consumer a stable group.instance.id. A short restart (within session.timeout.ms) no longer triggers a rebalance at all — the returning member reclaims its partitions.

props.put("group.instance.id", "order-worker-" + podName);
props.put("session.timeout.ms", "60000");

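This is the single highest-leverage change for deployments on Kubernetes. The ID has to be identical across restarts, so derive it from something stable such as a StatefulSet pod name; Deployment pod names change on every restart. Pair it with a session timeout that exceeds the pod's worst-case restart time (terminationGracePeriodSeconds plus startup), so a rolling update brings each member back before the coordinator evicts it and triggers a rebalance.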
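A minimal sketch of how the two static-membership settings could be derived together. The class StaticMemberConfig, the 2x headroom factor, and the 45-second floor (the client default for session.timeout.ms since Kafka 3.0) are assumptions chosen for illustration; it also assumes the name you pass in survives a restart, which holds for StatefulSet pod names but not for Deployment pod names.

```java
import java.time.Duration;
import java.util.Properties;

// Hypothetical helper: builds the static-membership part of a consumer config.
final class StaticMemberConfig {

    // Assumption: never go below 45 s, the client default since Kafka 3.0.
    static final long MIN_SESSION_TIMEOUT_MS = 45_000L;

    static Properties build(String stableName, Duration expectedRestart) {
        if (stableName == null || stableName.trim().isEmpty()) {
            throw new IllegalArgumentException("stableName must survive restarts and be non-empty");
        }
        // Assumption: allow twice the expected restart time as headroom.
        long sessionTimeoutMs =
            Math.max(MIN_SESSION_TIMEOUT_MS, 2 * expectedRestart.toMillis());

        Properties props = new Properties();
        props.put("group.instance.id", "order-worker-" + stableName);
        props.put("session.timeout.ms", Long.toString(sessionTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        // "orders-2" stands in for a StatefulSet pod name.
        Properties props = build("orders-2", Duration.ofSeconds(40));
        System.out.println(props.getProperty("group.instance.id"));
        System.out.println(props.getProperty("session.timeout.ms"));
    }
}
```

In a pod you would typically pass the pod name (available as the HOSTNAME environment variable by default) and a restart time measured from your own rollouts. A longer timeout also means a member that really is dead takes longer to be noticed, so don't pad it more than you need.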

Diagnosing a slow rebalance

kafka-consumer-groups.sh --describe --group <group> --members (plus the usual --bootstrap-server) lists the current members; swap --members for --state to see the group's state.
Watch the coordinator log for "Preparing to rebalance group X in state PreparingRebalance"; the same line also carries the old generation and the reason for the rebalance.
If you see frequent rebalances without obvious cause, the consumer is likely taking longer than max.poll.interval.ms between poll() calls, usually because processing one batch takes too long.
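A quick way to check the last point is to multiply it out: the worst case for one loop iteration is roughly max.poll.records times your per-record processing time. The helper below is a sketch; PollBudget and its methods are hypothetical, and the per-record time is a number you have to measure yourself. The defaults used in main (max.poll.records = 500, max.poll.interval.ms = 300000) are the client defaults.

```java
// Hypothetical helper for sanity-checking a poll loop against max.poll.interval.ms.
final class PollBudget {

    // Worst-case time to process one full batch, in milliseconds.
    static long worstCaseBatchMs(int maxPollRecords, long perRecordMs) {
        return (long) maxPollRecords * perRecordMs;
    }

    // True if a full batch finishes before the consumer is considered stuck.
    static boolean fits(int maxPollRecords, long perRecordMs, long maxPollIntervalMs) {
        return worstCaseBatchMs(maxPollRecords, perRecordMs) < maxPollIntervalMs;
    }

    // Largest max.poll.records that uses at most budgetPercent of the interval.
    static int largestSafeBatch(long perRecordMs, long maxPollIntervalMs, int budgetPercent) {
        if (perRecordMs <= 0) {
            throw new IllegalArgumentException("perRecordMs must be positive");
        }
        long budgetMs = maxPollIntervalMs * budgetPercent / 100;
        return (int) Math.min(Integer.MAX_VALUE, budgetMs / perRecordMs);
    }

    public static void main(String[] args) {
        long perRecordMs = 800; // measured slow-path time per record (example value)
        System.out.println(worstCaseBatchMs(500, perRecordMs));         // 400000
        System.out.println(fits(500, perRecordMs, 300_000));            // false
        System.out.println(largestSafeBatch(perRecordMs, 300_000, 50)); // 187
    }
}
```

At 800 ms per record, a default-sized batch of 500 needs 400 seconds, which overshoots the 300-second default and gets the consumer evicted. Capping max.poll.records at about 187 keeps a full batch within half the interval.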

Checklist

Switch to cooperative sticky.
Set group.instance.id for every consumer.
Keep each poll() iteration well under max.poll.interval.ms, or hand work off to a worker pool.
