February 19, 2026

kafkadistributed-systemsmessagingconsumers

Kafka Consumers & Consumer Groups — Pull Model, Partition Assignment, and Rebalancing

The Kafka producer story is about speed. The consumer story is about coordination — how multiple processes divide up a stream of data, keep track of where they are, and recover cleanly when something fails.

This post covers the consumer's design from the ground up: why Kafka chose a pull model, how consumer groups work, the partition assignment strategies available, what happens during a rebalance, and how offsets are managed.

The Pull Model

Most messaging systems are push-based: the broker delivers messages to consumers as soon as they arrive. Kafka is pull-based: consumers ask the broker for the next batch of records.

This is a deliberate choice with three main consequences:

1. Consumers control their own pace. A slow consumer does not build up a backlog at the broker's output buffer or get overwhelmed by a burst. It simply reads less frequently. The broker's job is purely storage.

2. Batching is natural. Because the consumer polls, it can ask for up to max.partition.fetch.bytes of data in one request. With a push model, the broker must decide how much to send per push.

3. The broker is stateless about consumers. The broker stores data in its log segments. The consumer tracks its own position (offset). If a consumer dies, no broker-side state is lost — another consumer can pick up from the same offset.

The downside is a latency floor: if there are no new records, the consumer must poll again after a timeout. Kafka mitigates this with long polling — if no records are available, the broker holds the fetch request open for up to fetch.max.wait.ms (default 500 ms) before responding with an empty response.

Consumer Groups

A consumer group is a set of consumers that collectively consume a topic. Kafka guarantees that each partition is assigned to exactly one consumer in the group at any given time.

Topic: orders (6 partitions)
Consumer Group: order-processor (3 consumers)

Consumer A → partition 0, partition 1
Consumer B → partition 2, partition 3
Consumer C → partition 4, partition 5

This model gives you horizontal scaling: add consumers to the group to increase parallelism, up to the number of partitions. Beyond that, extra consumers sit idle — there are no partitions left to assign.

If you have two independent applications that both need to consume the same topic, give them different group.id values. Each group maintains its own offset — they read the same data independently.

group.id=order-processor     ← reads all 6 partitions
group.id=analytics-pipeline  ← also reads all 6 partitions, independently

The Group Coordinator

Every consumer group is managed by a broker called the group coordinator — determined by hashing the group.id to a partition of the internal __consumer_offsets topic.

The coordinator tracks which consumers are alive (via heartbeats) and triggers rebalances when membership changes. Consumers send heartbeats on a background thread:

heartbeat.interval.ms=3000   ← how often to send a heartbeat (default 3s)
session.timeout.ms=45000     ← coordinator declares a consumer dead after this (default 45s)

If a consumer stops sending heartbeats (crash, GC pause, overloaded), the coordinator waits for session.timeout.ms and then triggers a rebalance.

Partition Assignment Strategies

When a rebalance happens, the group leader (the first consumer to join) runs a partition assignment algorithm and sends the result back to the coordinator.

Range (`RangeAssignor`)

Partitions are sorted and divided evenly across consumers. For topics that don't divide evenly, the first few consumers get an extra partition.

Topic A: [0,1,2,3,4,5]  3 consumers
Consumer 0 → [0, 1]
Consumer 1 → [2, 3]
Consumer 2 → [4, 5]

Advantage: partitions from related topics tend to end up on the same consumer (useful for joins). Disadvantage: uneven distribution when the number of partitions doesn't divide evenly.

Round Robin (`RoundRobinAssignor`)

All partitions across all subscribed topics are sorted and assigned in round-robin order.

Consumer 0 → [0, 3]
Consumer 1 → [1, 4]
Consumer 2 → [2, 5]

More even distribution, but partitions from related topics may land on different consumers.

Sticky (`StickyAssignor`)

Aims for maximum stability: on a rebalance, it keeps existing assignments intact and only reassigns the partitions that must move. This minimises the number of partitions that change hands.

Cooperative Sticky (`CooperativeStickyAssignor`)

The same goal as Sticky, but uses the cooperative rebalance protocol (more on this below). This is the current recommended default.

partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor

The Rebalance Protocol

When a new consumer joins or an existing one leaves, a rebalance is triggered. How rebalances work has changed significantly across Kafka versions.

Eager Rebalance (original)

All consumers stop consuming, revoke all their partitions, and wait for new assignments. This is the "stop the world" approach.

Phase 1: All consumers revoke all partitions (consumption pauses)
Phase 2: Leader runs assignment algorithm
Phase 3: Coordinator sends new assignments
Phase 4: All consumers resume

The problem: even consumers whose assignment doesn't change must stop. For a large group with frequent membership changes (e.g., rolling deployments), this creates continuous gaps in processing.

Cooperative (Incremental) Rebalance

Introduced in Kafka 2.4 via CooperativeStickyAssignor. Consumers only revoke the partitions that are being moved, not all of them.

Round 1:
  - Consumers revoke only the partitions that will be reassigned
  - Consumers that keep their partitions continue consuming

Round 2:
  - Freed partitions are assigned to their new owners
  - New owners start consuming

This means consumers continue processing during the rebalance for the partitions they keep. The disruption is proportional to the number of partitions that actually move, not the total partition count.

Static Group Membership

If you know your consumer set is stable, assign each consumer a group.instance.id. Static members do not trigger a rebalance on restart — they are given a grace period (session.timeout.ms) to reconnect and reclaim their partitions.

group.instance.id=consumer-0

This is useful for stateful consumers (stream processing with local state) where a rebalance would require expensive state migration.

Offset Management

Kafka tracks where each consumer group is in each partition via offsets stored in the __consumer_offsets internal topic. The offset for a partition is the offset of the next record to be fetched (not the last one consumed).

Committing Offsets

Offsets are committed by the consumer, not automatically by the broker. There are two modes:

Auto-commit: the consumer commits offsets periodically on a background thread.

enable.auto.commit=true
auto.commit.interval.ms=5000   ← commit every 5 seconds

The risk: if the consumer crashes after processing a record but before the 5-second auto-commit fires, it will re-process that record after recovery. This gives at-least-once delivery at best.

Manual commit: the application calls consumer.commitSync() or consumer.commitAsync() explicitly, after processing.

while (true) {
    ConsumerRecords<K, V> records = consumer.poll(Duration.ofMillis(100));
    process(records);
    consumer.commitSync();   // blocks until broker confirms
}

commitSync() blocks until the broker acknowledges the commit. commitAsync() does not block but takes a callback for error handling. For exactly-once semantics, you need to commit offsets and write results atomically — which requires either Kafka transactions or an external store that supports atomic writes.

`auto.offset.reset`

When a consumer group has no committed offset for a partition (new group, or offsets expired), auto.offset.reset controls where it starts:

Value	Behaviour
`latest` (default)	Start from the end — only new messages
`earliest`	Start from offset 0 — replay everything
`none`	Throw an exception

The `__consumer_offsets` Topic

Committed offsets are stored in a compacted internal topic named __consumer_offsets. It has 50 partitions by default (offsets.topic.num.partitions). Each record is a key-value pair:

Key:   group_id + topic + partition
Value: offset + metadata + timestamp

Because the topic is compacted, only the latest committed offset per (group, topic, partition) is retained. Old commits are cleaned up automatically.

Fetch Tuning

The consumer's fetch behaviour is controlled by a set of parameters that balance latency against throughput:

Parameter	Default	Effect
`fetch.min.bytes`	1	Broker waits until at least this many bytes are available
`fetch.max.wait.ms`	500	Maximum time broker waits if `fetch.min.bytes` not met
`max.partition.fetch.bytes`	1 048 576 (1 MB)	Maximum bytes per partition per fetch
`max.poll.records`	500	Maximum records returned per `poll()` call
`max.poll.interval.ms`	300 000 (5 min)	If `poll()` is not called within this time, the consumer is considered dead

max.poll.interval.ms is critical for consumers that do slow processing inside the poll loop. If your processing takes longer than this, the coordinator will kick the consumer out of the group. Either increase max.poll.interval.ms or reduce max.poll.records so each batch is smaller.

# Slow processing pipeline
max.poll.records=50
max.poll.interval.ms=600000    # 10 minutes

Lag Monitoring

Consumer lag — the difference between the latest offset on the broker and the consumer's committed offset — is the most important operational metric for Kafka consumers.

lag = log_end_offset - committed_offset

Lag growing over time means the consumer is not keeping up with the producer. The standard tool for checking lag is:

kafka-consumer-groups.sh \
  --bootstrap-server broker:9092 \
  --describe \
  --group my-group

For production monitoring, expose lag as a metric (JMX or a dedicated consumer lag exporter) and alert when it exceeds a threshold.

Summary

Kafka's consumer design reflects the same philosophy as the rest of the system: push complexity to the edges, keep the broker simple.

Pull model — consumers control pace; broker has no push-side state per consumer
Consumer groups — partition-level parallelism, independent groups for independent applications
Cooperative rebalance — incremental, partition-level; minimises processing gaps during group changes
Manual offset commit — explicit control over delivery semantics (at-least-once vs stronger)
__consumer_offsets — durable, compacted, broker-side storage for group progress

The main failure modes all trace back to the same root: the consumer not calling poll() fast enough, or not committing offsets at the right granularity. Get those two right and Kafka consumers are straightforward to operate.