Project Structure and Architecture — Go Package Layout for a Distributed Log

Before writing any code, set up the Go project and understand how the packages are organized. This page walks through the directory layout, what each package is responsible for, and the layered architecture that ties everything together.

Step 1: Initialize the Go module

Create the project directory and initialize a Go module:

mkdir distributed-log && cd distributed-log
go mod init github.com/mohitkumar/mlog

Step 2: Create the project layout

Use these commands to create the directory structure:

mkdir -p cmd/{server,producer,consumer,topic}
mkdir -p segment log protocol transport rpc coordinator discovery topic client consumer config errs scripts infra

Each package owns one layer or concern of the system:

distributed-log/
├── cmd/
│   ├── server/main.go       # Broker server binary
│   ├── producer/main.go     # Producer CLI binary
│   ├── consumer/main.go     # Consumer CLI binary
│   └── topic/main.go        # Topic management CLI binary
├── segment/                   # Segment and index (byte encoding, mmap)
│   ├── segment.go
│   ├── index.go
│   └── test_util.go
├── log/                       # Log API (multiple segments) + LogManager (LEO/HW)
│   ├── log.go
│   └── log_manager.go
├── protocol/                  # Wire format: frames, message types, codec
│   ├── types.go
│   ├── frame.go
│   ├── codec.go
│   ├── metadata.go
│   ├── replication_batch.go
│   └── error.go
├── transport/                 # TCP transport (server and client)
│   └── transport.go
├── rpc/                       # RPC handlers (produce, consume, admin)
│   ├── server.go
│   ├── producer.go
│   ├── consumer.go
│   ├── leader.go
│   └── error.go
├── coordinator/               # Raft consensus (metadata coordination)
│   ├── coordinator.go
│   ├── coordinator_events.go
│   ├── fsm.go
│   ├── logstore.go
│   └── metadata.go
├── discovery/                 # Serf cluster membership
│   └── discovery.go
├── topic/                     # Topic management, replication, ISR
│   ├── topic.go
│   ├── topic_coordinator.go
│   └── replication.go
├── client/                    # Client libraries (producer, consumer, admin)
│   ├── bootstrap.go
│   ├── producer.go
│   ├── consumer.go
│   ├── rpc.go
│   └── reconnect.go
├── consumer/                  # Consumer offset management
│   ├── consumer.go
│   └── error.go
├── config/                    # Configuration structs
│   └── config.go
├── errs/                      # Centralized error definitions
│   └── errs.go
├── scripts/                   # Helper scripts
│   ├── start-local-cluster.sh
│   └── stop-local-cluster.sh
├── infra/                     # Docker files
│   ├── Dockerfile
│   └── docker-compose.yml
├── go.mod
├── go.sum
└── Makefile

Package responsibilities

Each package has a single, clear responsibility:

Package     | Layer       | Responsibility
------------|-------------|---------------
segment     | Storage     | One segment: append-only .log file + sparse .idx index. Byte-level encoding of records. Memory-mapped index with binary search.
log         | Storage     | Manages multiple segments. Append to active segment, roll when full, find segment for reads. LogManager adds LEO and high watermark tracking.
protocol    | Network     | All message types (Produce, Fetch, CreateTopic, etc.), frame format (length-prefixed), JSON codec, replication batch encoding, error codes.
transport   | Network     | TCP server (accept connections, dispatch to handlers) and TCP client (dial, send request, read response). Connection management and keepalive.
rpc         | Network     | Server-side RPC handlers. Maps message types to handler functions. Implements produce, fetch, topic CRUD, and leader discovery.
coordinator | Consensus   | Raft node lifecycle (setup, join, leave). FSM for metadata events. Adapts our log as a Raft LogStore.
discovery   | Cluster     | Serf-based cluster membership. Gossip, join/leave events, member list with tags (RPC/Raft addresses).
topic       | Application | TopicManager: tracks topics, leaders, replicas. Handles produce/consume at the application level. Runs replication threads. Computes ISR and high watermark.
client      | Client      | Producer, consumer, and admin client libraries. Leader discovery, automatic reconnection, bootstrap logic.
consumer    | Application | Consumer offset tracking. Persists committed offsets to a local log. Recovers offsets on restart.
config      | Infra       | Configuration structs for node, Raft, Serf, and replication settings.
errs        | Infra       | Centralized error definitions with errors.Is() support. Segment, log, topic, Raft, and protocol errors.
cmd         | CLI         | Cobra-based CLI binaries: server, producer, consumer, topic management.
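
To make the protocol row concrete, here is a minimal sketch of a length-prefixed frame carrying a JSON payload. The 4-byte big-endian length header, the ProduceRequest fields, and the function names WriteFrame, ReadFrame, and roundTrip are assumptions made for illustration; the actual frame layout (for example, a message-type field) is defined when the protocol package is built.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
	"io"
)

// ProduceRequest is a hypothetical message type used only in this sketch.
type ProduceRequest struct {
	Topic string `json:"topic"`
	Value []byte `json:"value"`
}

// WriteFrame JSON-encodes msg and writes it as
// [4-byte big-endian payload length][payload]. The header size is an assumption.
func WriteFrame(w io.Writer, msg any) error {
	payload, err := json.Marshal(msg)
	if err != nil {
		return err
	}
	var header [4]byte
	binary.BigEndian.PutUint32(header[:], uint32(len(payload)))
	if _, err := w.Write(header[:]); err != nil {
		return err
	}
	_, err = w.Write(payload)
	return err
}

// ReadFrame reads one length-prefixed frame and decodes its JSON payload into msg.
// A production reader would also cap the frame size before allocating.
func ReadFrame(r io.Reader, msg any) error {
	var header [4]byte
	if _, err := io.ReadFull(r, header[:]); err != nil {
		return err
	}
	payload := make([]byte, binary.BigEndian.Uint32(header[:]))
	if _, err := io.ReadFull(r, payload); err != nil {
		return err
	}
	return json.Unmarshal(payload, msg)
}

// roundTrip writes req into an in-memory buffer (standing in for a TCP
// connection) and reads it back.
func roundTrip(req ProduceRequest) (ProduceRequest, error) {
	var buf bytes.Buffer
	var got ProduceRequest
	if err := WriteFrame(&buf, req); err != nil {
		return got, err
	}
	err := ReadFrame(&buf, &got)
	return got, err
}

func main() {
	got, err := roundTrip(ProduceRequest{Topic: "orders", Value: []byte("hello")})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %s\n", got.Topic, got.Value)
}
```

Because the reader learns the payload size from the header, it can pull exactly one message off a TCP stream without scanning for delimiters.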

Three-layer architecture

The system is organized into three layers. Packages in a layer may depend on each other and on the layers below, but never on a layer above:

┌─────────────────────────────────────────────────────────┐
│                    Layer 3: Cluster                     │
│                                                         │
│   discovery/    topic/    rpc/    client/    cmd/       │
│                                                         │
│   Serf membership, topic management, replication,       │
│   RPC handlers, client libraries, CLI tools             │
├─────────────────────────────────────────────────────────┤
│                    Layer 2: Consensus                   │
│                                                         │
│   coordinator/                                          │
│                                                         │
│   Raft leader election, metadata replication,           │
│   FSM for applying committed events                     │
├─────────────────────────────────────────────────────────┤
│                    Layer 1: Storage + Network           │
│                                                         │
│   segment/    log/    protocol/    transport/           │
│                                                         │
│   On-disk segments, indexes, log management,            │
│   wire format, TCP transport                            │
└─────────────────────────────────────────────────────────┘

Layer 1 — Storage + Network: Pure local operations. Segments read and write bytes to disk. The protocol defines message formats. The transport sends and receives bytes over TCP. No cluster awareness.

Layer 2 — Consensus: Raft coordinates metadata across the cluster. The coordinator uses Layer 1 (log as a Raft LogStore, transport for Raft RPCs) and provides Layer 3 with a consistent view of cluster state.

Layer 3 — Cluster: Everything that makes the system distributed. Topics are managed through Raft. Replication threads pull data between nodes. Serf discovers peers. RPC handlers serve client requests. Client libraries find leaders and reconnect on failure.
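
The dependency rule can be expressed as a small check. The sketch below is illustrative only: the layer numbers come from the diagram above, but the import edges in main are hypothetical, and treating unlisted packages such as config and errs as shared code (layer 0) is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// layerOf maps each package in the diagram to its layer number.
// Packages missing from the map (for example config or errs) get layer 0,
// which this sketch treats as shared code that any layer may import.
var layerOf = map[string]int{
	"segment": 1, "log": 1, "protocol": 1, "transport": 1,
	"coordinator": 2,
	"discovery":   3, "topic": 3, "rpc": 3, "client": 3, "cmd": 3,
}

// CheckDeps returns one "pkg -> dep" string for every dependency that points
// from a lower layer up to a higher one, sorted for stable output.
func CheckDeps(deps map[string][]string) []string {
	var violations []string
	for pkg, imports := range deps {
		for _, dep := range imports {
			if layerOf[dep] > layerOf[pkg] {
				violations = append(violations, pkg+" -> "+dep)
			}
		}
	}
	sort.Strings(violations)
	return violations
}

func main() {
	// Hypothetical import edges; the last one deliberately breaks the rule.
	deps := map[string][]string{
		"log":         {"segment", "errs"},
		"coordinator": {"log", "transport"},
		"topic":       {"coordinator", "log"},
		"rpc":         {"topic", "protocol"},
		"transport":   {"topic"},
	}
	for _, v := range CheckDeps(deps) {
		fmt.Println("upward dependency:", v)
	}
}
```

In practice the Go compiler already rejects import cycles, so an upward dependency usually surfaces as a cycle error once the higher layer imports the lower one back.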

Step 3: Add key dependencies

Add the core dependencies to your go.mod:

# Raft consensus
go get github.com/hashicorp/raft
go get github.com/hashicorp/raft-boltdb

# Serf cluster membership
go get github.com/hashicorp/serf

# Memory-mapped files (for the sparse index)
go get github.com/tysonmote/gommap

# CLI framework
go get github.com/spf13/cobra
go get github.com/spf13/viper

# Structured logging
go get go.uber.org/zap

Dependency            | Purpose
----------------------|--------
hashicorp/raft        | Raft consensus: leader election, log replication, FSM interface
hashicorp/raft-boltdb | BoltDB-backed stable store for Raft's term and vote state
hashicorp/serf        | Gossip-based cluster membership and failure detection
tysonmote/gommap      | Memory-mapped file I/O for the sparse index
spf13/cobra           | CLI command framework for server, producer, consumer, topic binaries
spf13/viper           | Configuration management (flags, env vars)
go.uber.org/zap       | Structured, leveled logging

Data flow through the layers

Here is how a produce request flows through the system:

Producer CLI (cmd/producer)
    │
    ▼
ProducerClient (client/producer.go)
    │  Discovers leader via FindTopicLeader
    │  Sends ProduceRequest over TCP
    ▼
Transport (transport/transport.go)
    │  Decodes frame, routes to handler
    ▼
RPC Server (rpc/producer.go)
    │  Validates request, calls TopicManager
    ▼
TopicManager (topic/topic.go)
    │  Checks this node is leader
    │  Calls LogManager.Append()
    │  If AckAll: waits for replicas
    ▼
LogManager (log/log_manager.go)
    │  Tracks LEO, delegates to Log
    ▼
Log (log/log.go)
    │  Appends to active segment
    │  Rolls segment if full
    ▼
Segment (segment/segment.go)
    │  Encodes record: [Offset][Len][Value]
    │  Writes to .log file
    │  Updates sparse index
    ▼
Disk (.log and .idx files)
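
The segment step at the bottom of this flow encodes each record as [Offset][Len][Value]. A minimal sketch of that layout follows, assuming an 8-byte offset, a 4-byte length, and big-endian byte order; the field widths, byte order, and the names EncodeRecord, DecodeRecord, and ErrShortRecord are illustrative assumptions, not the project's actual definitions.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerSize assumes an 8-byte offset followed by a 4-byte value length.
const headerSize = 8 + 4

// ErrShortRecord is a hypothetical sentinel error for truncated input.
var ErrShortRecord = errors.New("record: buffer too short")

// EncodeRecord lays a record out as [Offset][Len][Value].
func EncodeRecord(offset uint64, value []byte) []byte {
	buf := make([]byte, headerSize+len(value))
	binary.BigEndian.PutUint64(buf[0:8], offset)
	binary.BigEndian.PutUint32(buf[8:12], uint32(len(value)))
	copy(buf[headerSize:], value)
	return buf
}

// DecodeRecord parses one record from the front of buf and returns its offset,
// its value, and the number of bytes consumed.
func DecodeRecord(buf []byte) (offset uint64, value []byte, n int, err error) {
	if len(buf) < headerSize {
		return 0, nil, 0, ErrShortRecord
	}
	offset = binary.BigEndian.Uint64(buf[0:8])
	length := binary.BigEndian.Uint32(buf[8:12])
	if uint64(len(buf)-headerSize) < uint64(length) {
		return 0, nil, 0, ErrShortRecord
	}
	end := headerSize + int(length)
	return offset, buf[headerSize:end], end, nil
}

func main() {
	// Two records back to back, as they would sit in a .log file.
	data := append(EncodeRecord(0, []byte("first")), EncodeRecord(1, []byte("second"))...)
	for len(data) > 0 {
		off, val, n, err := DecodeRecord(data)
		if err != nil {
			panic(err)
		}
		fmt.Printf("offset=%d value=%s\n", off, val)
		data = data[n:]
	}
}
```

Storing the length before the value lets a reader skip from one record to the next without inspecting the payload, which is what makes a sparse index workable: the index only needs to point near the target, and a short scan finds the exact record.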

And for replication:

Follower node
    │
    ▼
ReplicationThread (topic/replication.go)
    │  Wakes every second
    │  For each topic this node replicates:
    ▼
ConsumerClient → Leader node (client/consumer.go)
    │  FetchBatch(topic, startOffset) with ReplicaNodeID
    ▼
Leader's RPC Server (rpc/consumer.go)
    │  Reads using ReadUncommitted (up to LEO)
    │  Records follower's LEO for ISR computation
    ▼
Follower applies records to local log
    │  TopicManager.ApplyRecord()
    ▼
Leader computes ISR, advances HW
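
The last step can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it treats a follower as in sync when its LEO trails the leader's by at most maxLag records, and it takes the high watermark to be the smallest LEO among the leader and its in-sync followers. The type ReplicaState, the functions InSyncReplicas and HighWatermark, and the maxLag parameter are hypothetical names.

```go
package main

import "fmt"

// ReplicaState is a hypothetical view of one follower as seen by the leader.
type ReplicaState struct {
	NodeID string
	LEO    uint64 // log end offset: the next offset this replica will write
}

// InSyncReplicas returns the followers whose LEO trails the leader's LEO by
// at most maxLag records. The lag-based rule is an assumption of this sketch.
func InSyncReplicas(leaderLEO uint64, followers []ReplicaState, maxLag uint64) []ReplicaState {
	var isr []ReplicaState
	for _, f := range followers {
		if f.LEO >= leaderLEO || leaderLEO-f.LEO <= maxLag {
			isr = append(isr, f)
		}
	}
	return isr
}

// HighWatermark returns the smallest LEO among the leader and its in-sync
// followers: every record below it exists on all of those replicas.
func HighWatermark(leaderLEO uint64, isr []ReplicaState) uint64 {
	hw := leaderLEO
	for _, r := range isr {
		if r.LEO < hw {
			hw = r.LEO
		}
	}
	return hw
}

func main() {
	leaderLEO := uint64(100)
	followers := []ReplicaState{
		{NodeID: "node-b", LEO: 98}, // close behind the leader
		{NodeID: "node-c", LEO: 40}, // far behind, drops out of the ISR
	}
	isr := InSyncReplicas(leaderLEO, followers, 10)
	for _, r := range isr {
		fmt.Println("in sync:", r.NodeID)
	}
	fmt.Println("high watermark:", HighWatermark(leaderLEO, isr))
}
```

With these inputs, node-c falls out of the ISR, so the slow follower no longer holds the high watermark back: it advances to 98 rather than staying at 40.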

Build order

Build the system bottom-up in this order:

  1. Segment + Index — byte-level record encoding, sparse index with mmap, append and read.
  2. Log + LogManager — multiple segments, rotation, LEO and high watermark tracking.
  3. Protocol — frame format, message types, codec, replication batch format.
  4. Transport — TCP server and client, connection management.
  5. RPC — handler registration, produce/consume/admin handlers.
  6. Discovery — Serf setup, join/leave event handling.
  7. Coordinator — Raft setup, FSM, LogStore adapter, metadata events.
  8. Topic — TopicManager, topic lifecycle, replication thread, ISR/HW logic.
  9. Client — producer, consumer, admin clients with leader discovery.
  10. CLI — server, producer, consumer, topic management commands.

Each step builds on the previous one. You can test each layer independently before moving to the next.

Next steps

The next page builds the low-level log API: record encoding, segments, the sparse index, and the Log/LogManager types that wrap it all together.