January 8, 2026

kafkadistributed-systemsmessagingperformance

How the Kafka Producer Works — And Why It's So Fast

Kafka moves hundreds of millions of messages per second on commodity hardware. Most engineers interact with it through a high-level client and take the throughput for granted. But the producer's speed is the result of a set of deliberate, layered design decisions — and understanding them makes you a much better operator.

This post covers how the Kafka producer works from the inside out: the record lifecycle, batching, compression, filesystem tricks, acknowledgement semantics, and the tuning parameters worth knowing.

The Record Lifecycle

When you call producer.send(record), the record does not immediately go to the network. Instead, it passes through several stages:

send()
  │
  ▼
Serialiser  (key + value → bytes)
  │
  ▼
Partitioner  (choose partition)
  │
  ▼
RecordAccumulator  (batched per partition leader)
  │
  ▼
Sender thread  (I/O loop, runs in background)
  │
  ▼
Kafka broker

The RecordAccumulator is the core of the producer's performance story. It is a per-partition buffer of ProducerBatch objects. Records pile up there until the batch is full or enough time has passed, and only then does the Sender thread flush them over the wire — all in a single request.

Why Batching Is the Biggest Win

Every network round-trip carries fixed overhead: TCP framing, TLS handshake amortisation, broker-side parsing. If you send records one at a time, you pay that cost every single time.

Batching amortises the cost across N records. More importantly, it enables compression to work effectively — compressors achieve much higher ratios when they have more data to find patterns in.

Two parameters control when a batch is flushed:

Parameter	Default	Effect
`batch.size`	16 384 bytes (16 KB)	Flush when the batch reaches this size
`linger.ms`	0 ms	Wait this long before flushing an under-full batch

With the defaults (linger.ms=0) the producer flushes almost immediately, which maximises latency but minimises throughput. For high-throughput pipelines, setting linger.ms=5 or linger.ms=20 and batch.size=65536 (64 KB) or larger gives the compressor enough to work with and dramatically increases records-per-request.

# High-throughput tuning
batch.size=65536
linger.ms=20

The trade-off is a small increase in end-to-end latency. If you are building a real-time pipeline where every millisecond counts, keep linger.ms low. If you are ingesting telemetry, logs, or analytics events, the latency cost is invisible and the throughput gain is significant.

Compression

Kafka supports four compression codecs: none, gzip, snappy, and lz4, plus zstd (added in 2.1).

Compression happens at the batch level, not the record level. This matters because compressors work by finding repeated byte sequences — a single 100-byte record gives them almost nothing to exploit, while a 64 KB batch of similar log lines can compress to 10–15% of its original size.

Codec Comparison

Codec	CPU cost	Compression ratio	Best for
`none`	Zero	1×	Already-compressed data (images, video)
`gzip`	High	Excellent	Cold storage, infrequent writes
`snappy`	Low	Good	General-purpose, balanced
`lz4`	Very low	Good	High-throughput, latency-sensitive
`zstd`	Medium	Excellent	Modern default; best ratio-per-CPU

For most new deployments, zstd is the right default. For legacy systems or brokers older than 2.1, lz4 is the next best option.

compression.type=zstd

Compression is transparent to consumers — they decompress automatically. The broker stores messages in their compressed form and replicates the compressed bytes, so the CPU cost of compression is paid once (by the producer) not N times (once per replica or consumer).

Why Kafka Is Fast: Filesystem Optimisation

Kafka's speed is often attributed to its use of "the filesystem", which sounds counterintuitive — disk is slow, right? The answer is that Kafka exploits two OS-level mechanisms that make the filesystem behave almost like memory.

The Page Cache

Modern operating systems keep recently accessed disk pages in RAM (the page cache). When Kafka writes a message to a log segment, the OS writes it to the page cache first — the actual disk flush is deferred. When a consumer reads that message (likely moments after it was written, for a typical stream), it is served directly from RAM, not from disk at all.

Kafka intentionally avoids keeping messages in the JVM heap. Heap memory competes with the GC; the page cache does not. By letting the OS manage caching, Kafka gets fast reads without GC pauses.

Zero-Copy Transfers (`sendfile`)

The standard path for sending a file over a network is:

disk → kernel buffer → user buffer → socket buffer → NIC

That involves two copies through user space and four context switches. Kafka instead calls sendfile() (or transferTo() in Java):

disk → kernel buffer → NIC

The data never touches user space. This halves the number of copies and eliminates context switches, letting Kafka saturate a 10 Gbps NIC with a fraction of the CPU it would otherwise need.

Sequential I/O

Disk seeks are expensive. Random writes fragment the log and slow everything down. Kafka's log is append-only: each partition is a directory of segment files, and the broker only ever writes to the end of the active segment. Sequential writes are 100–1000× faster than random writes on spinning disks and allow SSDs to perform at their rated write speed without write amplification.

Acknowledgement Modes (`acks`)

The acks parameter controls durability: it tells the producer how many broker acknowledgements it must receive before considering a write successful.

`acks=0` — Fire and forget

The producer sends the record and immediately considers it done. No confirmation is awaited.

Throughput: Maximum possible
Durability: None — if the broker crashes before writing, the record is lost
Use case: Metrics, high-volume telemetry where occasional loss is acceptable

`acks=1` — Leader acknowledgement

The producer waits for the partition leader to write the record to its local log. The leader then replies.

Throughput: High
Durability: The record survives a follower crash but not a leader crash before replication
Use case: General-purpose; the old default

`acks=all` (`acks=-1`) — Full in-sync replica acknowledgement

The leader waits until all brokers in the in-sync replica set (ISR) have written the record before replying.

Throughput: Lower — limited by the slowest ISR member
Durability: Strongest — survives the loss of all but one replica
Use case: Anything financial, transactional, or where data loss is unacceptable

acks=all

Pair acks=all with min.insync.replicas=2 on the broker/topic to ensure the leader actually has at least one follower in the ISR before accepting the write. Without this, a topic with ISR size 1 (the leader alone) will still acknowledge with acks=all, giving you false durability.

# Broker / topic config
min.insync.replicas=2
replication.factor=3

Retries and Idempotency

Retries

The producer retries failed requests automatically. The key parameter is retries (default: MAX_INT since 2.1) and retry.backoff.ms (default: 100 ms). Without care, retries can cause out-of-order delivery — a failed batch from time T₁ may succeed after a batch from T₂ has already landed.

Idempotent Producer

Enable with enable.idempotence=true. The broker assigns each producer a producer ID (PID) and a sequence number to each batch. If a duplicate arrives (due to a retry), the broker deduplicates it.

Idempotence also implicitly sets acks=all, retries=MAX_INT, and max.in.flight.requests.per.connection=5, which gives you at-least-once delivery with deduplication — effectively exactly-once delivery for a single partition.

enable.idempotence=true

For cross-partition exactly-once, you need Kafka transactions, which go beyond the producer layer.

The `max.in.flight.requests.per.connection` Parameter

The producer can pipeline multiple batches to the same broker without waiting for each to be acknowledged. max.in.flight.requests.per.connection (default: 5) controls how many outstanding requests are allowed.

Higher values improve throughput by hiding network latency. But if a request fails and is retried, later requests that already succeeded can arrive at the broker first — violating ordering.

With enable.idempotence=true, Kafka guarantees per-partition ordering even with up to 5 in-flight requests, using sequence numbers to reorder on the broker side. Without idempotence, set max.in.flight.requests.per.connection=1 if ordering matters.

Tuning Cheat Sheet

Parameter	Conservative (low latency)	Aggressive (high throughput)
`batch.size`	16 384	131 072 (128 KB)
`linger.ms`	0–5	20–100
`compression.type`	`none` / `lz4`	`zstd`
`acks`	`1`	`all`
`enable.idempotence`	`false`	`true`
`max.in.flight.requests.per.connection`	`1` (if ordering)	`5`
`buffer.memory`	32 MB	128 MB+
`send.buffer.bytes`	128 KB	1 MB

buffer.memory is the total memory the producer allocates for buffering records waiting to be sent. If this fills up, send() will block (up to max.block.ms) and then throw a TimeoutException. Increase it if you see producers blocking under heavy load.

Putting It Together

A high-throughput, durable producer configuration for a data pipeline looks like this:

# Durability
acks=all
enable.idempotence=true
min.insync.replicas=2          # set on the topic/broker

# Throughput
batch.size=131072
linger.ms=20
compression.type=zstd

# Buffering
buffer.memory=67108864         # 64 MB
max.block.ms=5000

# Connection
max.in.flight.requests.per.connection=5

A low-latency producer for a synchronous request path looks like this:

acks=1
batch.size=16384
linger.ms=0
compression.type=none
max.in.flight.requests.per.connection=1

Summary

Kafka's producer speed comes from a stack of reinforcing design choices:

Batching — amortises network and per-request overhead
Compression at batch level — turns redundancy in similar records into wire savings
Page cache — avoids JVM heap pressure and GC pauses; reads often hit RAM
sendfile / zero-copy — removes CPU cost of copying data through user space
Sequential I/O — eliminates disk seeks; sustains maximum write throughput
Pipelining — multiple in-flight batches hide network latency

The durability story is controlled separately by acks, enable.idempotence, and broker-side min.insync.replicas. Throughput and durability sit on opposite ends of a dial — understanding the trade-offs is what lets you set that dial intentionally rather than by accident.