Interview Model

FR → NFR → Estimates → API → Data Model → HLD → Deep Dive → Trade-offs → Failure Modes

Time split (45 min): FR+NFR (5) → Estimates (5) → API+Data Model (5) → HLD (10) → Deep Dive (15) → Trade-offs+Failures (5)

Tool: Excalidraw (or whiteboard)


Step 1 — Clarifying Questions

Never assume. Ask before drawing anything.

| Category | Questions |
|---|---|
| Scale | DAU? Peak QPS? Data volume? Growth rate? |
| Access pattern | Read-heavy or write-heavy? Read:write ratio? |
| Latency | p99 SLO? Real-time or batch acceptable? |
| Consistency | Strong or eventual? Read-your-writes needed? |
| Availability | 99.9% or 99.999%? Single-region or global? |
| Durability | Can we lose data? RPO/RTO requirements? |
| Data size | Avg object size? Retention period? |
| Scope | What to exclude? Mobile/web/both? |

Step 2 — Functional Requirements (FR)

  • Write only core features — not nice-to-haves
  • Prioritize: P0 (must-have) vs P1 (if time allows)
  • Example (URL shortener):
    • P0: shorten URL, redirect to original
    • P1: analytics, expiry, custom alias

Step 3 — Non-Functional Requirements (NFR)

| NFR | Typical Target | Notes |
|---|---|---|
| Availability | 99.99% | Multi-region for 99.999% |
| Read latency | < 100 ms p99 | Agree on specific SLO |
| Write latency | < 500 ms p99 | Or async (202 Accepted) |
| Throughput | X writes/s, Y reads/s | Derived from estimates |
| Consistency | Eventual / Strong | Per feature |
| Durability | No data loss | Or define RPO |
| Scalability | Handle 10× growth | State the assumption |
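
To make the availability targets concrete, it can help to convert them into a downtime budget. The sketch below is a simple arithmetic illustration; the function name is hypothetical and a 365-day year is assumed.

```python
# Downtime budget per year implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (assumes a non-leap year)


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the allowed downtime in minutes per year for an availability %."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.1f} min/year")
```

Each extra "nine" cuts the budget by 10×: roughly 8.8 hours, 53 minutes, and 5 minutes per year for the three targets above.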

Step 4 — Capacity Estimation (80:20 Rule)

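80% of traffic in 20% of the day implies a peak rate of 0.8 / 0.2 = 4× average. A 2–3× multiplier is a common, less conservative planning factor; the template below uses 3×.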

DAU:             ___M
Avg QPS:         DAU × actions_per_day / 86,400 = ___k
Peak QPS:        avg × 3 = ___k

Write QPS:       ___k     Read QPS: ___k
Record size:     ___ KB

Daily storage:   write_QPS × 86,400 × record_size = ___ GB/day
5yr total:       daily × 1,825 × 3 (replicas) = ___ TB

Ingress BW:      write_QPS × request_size = ___ MB/s
Egress BW:       read_QPS × response_size = ___ GB/s

→ Full latency numbers + power-of-2 table: capacity/capacity_planning.md
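
The template above can be filled in mechanically. The following is a minimal sketch with assumed inputs (100M DAU, 10 actions per user per day, 9 reads per write, 1 KB records, response size equal to record size); the function and parameter names are hypothetical, and decimal units (1 GB = 10^9 bytes) are used.

```python
SECONDS_PER_DAY = 86_400


def estimate_capacity(dau, actions_per_day, reads_per_write, record_bytes,
                      peak_factor=3, years=5, replicas=3):
    """Back-of-envelope capacity numbers following the template's formulas."""
    avg_qps = dau * actions_per_day / SECONDS_PER_DAY
    write_qps = avg_qps / (reads_per_write + 1)
    read_qps = avg_qps - write_qps
    # Storage uses average (not peak) write QPS, since it sums over the whole day.
    daily_storage_gb = write_qps * SECONDS_PER_DAY * record_bytes / 1e9
    total_storage_tb = daily_storage_gb * 365 * years * replicas / 1e3
    return {
        "avg_qps": avg_qps,
        "peak_qps": avg_qps * peak_factor,
        "write_qps": write_qps,
        "read_qps": read_qps,
        "daily_storage_gb": daily_storage_gb,
        "total_storage_tb": total_storage_tb,
        "ingress_mb_s": write_qps * record_bytes / 1e6,
        # Assumption: each read returns one record of the same size.
        "egress_mb_s": read_qps * record_bytes / 1e6,
    }


result = estimate_capacity(dau=100_000_000, actions_per_day=10,
                           reads_per_write=9, record_bytes=1_000)
for name, value in result.items():
    print(f"{name:>18}: {value:,.1f}")
```

With these inputs the average load is about 11.6k QPS (about 35k at peak), storage grows by about 100 GB/day, and the 5-year replicated total is about 547.5 TB.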


Step 5 — API Design

REST for external-facing; gRPC for internal (binary, lower latency, streaming). GraphQL when clients need flexible field selection (BFF pattern).

| Protocol | Use When |
|---|---|
| REST (HTTP/JSON) | Public API, browser clients, simple CRUD |
| gRPC | Internal microservices, streaming, mobile (protobuf smaller) |
| GraphQL | Flexible queries, mobile BFF, multiple client types |
| WebSocket | Bidirectional real-time (chat, live feed, gaming) |
| SSE | Server → client stream only (notifications, live dashboard) |

Design Rules

  • Idempotency: PUT/DELETE must be idempotent; POST needs Idempotency-Key header for retries
  • Pagination: cursor-based (stable, scalable) over offset (breaks on inserts)
  • Versioning: /v1/ prefix; never break existing clients
  • Return 202 Accepted + job ID for async operations
# Example — URL shortener
POST /v1/urls
  Body: { long_url, ttl_seconds?, alias? }
  Response: { short_code, short_url, expires_at }

GET /{code}                           → 301/302 redirect
GET /v1/urls/{code}/analytics         → { clicks, countries[], referrers[] }
DELETE /v1/urls/{code}                → 204 No Content
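
The cursor-based pagination rule can be sketched as keyset pagination with an opaque cursor. This is a minimal in-memory illustration, not a definitive implementation: the row shape (`id`, `created_at`), the function names, and the base64-JSON cursor encoding are all assumptions.

```python
import base64
import json


def encode_cursor(created_at: int, item_id: int) -> str:
    """Pack the last-seen sort key into an opaque, URL-safe token."""
    raw = json.dumps([created_at, item_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple:
    created_at, item_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return created_at, item_id


def list_page(rows, limit, cursor=None):
    """Return (page, next_cursor). rows must be sorted by (created_at, id).

    Roughly equivalent SQL:
      WHERE (created_at, id) > (:created_at, :id)
      ORDER BY created_at, id LIMIT :limit
    """
    after = decode_cursor(cursor) if cursor else None
    page = [r for r in rows
            if after is None or (r["created_at"], r["id"]) > after][:limit]
    # A full page may have more rows behind it; a short page is the last one.
    next_cursor = (encode_cursor(page[-1]["created_at"], page[-1]["id"])
                   if len(page) == limit else None)
    return page, next_cursor


rows = [{"id": i, "created_at": 100 + i // 2} for i in range(5)]
page1, cursor1 = list_page(rows, limit=2)
page2, cursor2 = list_page(rows, limit=2, cursor=cursor1)
page3, cursor3 = list_page(rows, limit=2, cursor=cursor2)
```

Because the cursor encodes a position in the sort order rather than a row count, rows inserted before the cursor between requests do not shift later pages, which is the failure mode of offset pagination.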

Step 6 — Data Model

Which Database?

| System | Use When |
|---|---|
| PostgreSQL | ACID, complex queries, joins, geospatial (PostGIS), general-purpose |
| MySQL | Web apps, read replicas via binlog, wide ecosystem |
| DynamoDB | Managed, predictable latency, simple KV/document, autoscale |
| Cassandra | Multi-region writes, high write throughput, time-series, known access patterns |
| MongoDB | Flexible schema, nested documents, moderate consistency OK |
| Redis | Sub-ms reads, sessions, leaderboards, rate limiting, pub/sub, counters |
| ClickHouse | OLAP, columnar, analytical queries on billions of rows |
| Elasticsearch | Full-text search, log analytics, faceted search |
| InfluxDB / TimescaleDB | Time-series metrics, IoT, retention policies |
| Neo4j | Graph traversal, fraud detection, recommendations, social graph |
| S3 / GCS | Blob storage, large files, cheap durable storage, archival |

Schema Design Rules

  • Design around access patterns, not entities
  • Denormalize for read performance — accept write complexity
  • Partition key determines data distribution — avoid sequential keys (hot partition)
  • Always include created_at, updated_at, soft-delete deleted_at
  • Store large blobs in S3 — only URL in DB

Step 7 — High-Level Design

Client → CDN → LB → API Gateway → Services → DB / Cache / Queue
                                           ↓
                                  Workers ← Message Queue
                                           ↓
                                  Object Store (S3)

Component Checklist

  • Client (web/mobile/API)
  • CDN (static assets, large files, geo reads)
  • Load balancer (L4 or L7)
  • API Gateway (auth, rate limit, routing)
  • App servers (stateless → horizontally scalable)
  • Primary DB + read replicas
  • Cache (Redis/Memcached)
  • Message queue (Kafka/SQS) for async decoupling
  • Background workers
  • Object store (S3) for blobs
  • Search index (Elasticsearch) if needed
  • Monitoring + alerting (Prometheus, Grafana, PagerDuty)

Step 8 — Deep Dive

Pick 2–3 hard components. Interviewer usually steers this.

| Common Deep Dives | Key Questions |
|---|---|
| Write path | How does data flow from client → durable storage? |
| Read path | How is data fetched, cached, served efficiently? |
| Fan-out | How does one write reach N users? |
| Hotspot | What happens when one key/user gets 10× traffic? |
| Consistency | How do we handle replication lag? |
| Search | How is data indexed and queried at scale? |
| Unique ID gen | Snowflake, UUID, hash-based: which and why? |
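
For the unique ID question, a Snowflake-style generator is a common talking point. The sketch below follows the commonly described layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence); the epoch value, class name, and injectable clock are assumptions made for illustration.

```python
import threading
import time

EPOCH_MS = 1_700_000_000_000  # hypothetical custom epoch (milliseconds)


class SnowflakeGenerator:
    """Roughly time-ordered 64-bit IDs: timestamp | worker_id | sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:  # worker ID must fit in 10 bits
            raise ValueError("worker_id must be in [0, 1023]")
        self.worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF  # 12-bit sequence
                if self._seq == 0:
                    # 4096 IDs already issued this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self._seq


gen = SnowflakeGenerator(worker_id=7)
ids = [gen.next_id() for _ in range(5000)]
```

Compared with a random UUID, these IDs sort by creation time, which helps range scans but can concentrate writes on the newest partition if the ID is used directly as a partition key.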

Patterns by Problem Type

Read-Heavy (feed, catalog, search)

  • Cache aggressively (Redis L1, CDN L2)
  • Read replicas — route 80%+ reads away from primary
  • Denormalize / precompute (materialized views)
  • Serve stale if tolerable (TTL + async background refresh)
  • Indexes on all query fields; covering indexes so queries are answered from the index without extra table lookups

Write-Heavy (logging, metrics, events, IoT)

  • LSM-tree storage (Cassandra, RocksDB, InfluxDB) — sequential writes
  • Message queue as write buffer (Kafka) → async batch flush to DB
  • Partition by hash to spread load evenly
  • Avoid sequential keys — use random prefix or hash to prevent hot partitions
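
Hash partitioning and key salting can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 as the hash and a `#<salt>` suffix convention; both function names are hypothetical.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition via a stable hash (not Python's per-process hash())."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def salted_key(key: str, event_id: int, num_salts: int) -> str:
    """Spread one hot logical key across num_salts physical keys."""
    return f"{key}#{event_id % num_salts}"


# Sequential keys land on different partitions once hashed.
counts = [0] * 8
for i in range(10_000):
    counts[partition_for(f"order-{i}", 8)] += 1
print(counts)

# Writes for one hot key fan out over 4 physical keys.
print(salted_key("celebrity:42", event_id=7, num_salts=4))
```

The cost of salting moves to the read side: a reader has to query every salt for the logical key and merge the results.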
  • Return 202 Accepted immediately; process async

Real-Time / Low-Latency (chat, live feed, gaming)

  • WebSockets for bidirectional persistent connections
  • SSE for server → client push only
  • Long polling as degraded fallback
  • Pub/sub (Redis pub/sub, Kafka) for fan-out to connected clients
  • In-memory state (Redis) for presence, typing indicators, online status

Fan-Out at Scale (Twitter timeline, notifications)

| Model | Write Cost | Read Cost | Use When |
|---|---|---|---|
| Fan-out on write (push) | O(N followers) | O(1) | Read-heavy, followers < 10k |
| Fan-out on read (pull) | O(1) | O(following) | Write-heavy, celebrity accounts |
| Hybrid | Async push for normal users | Pull for celebrities | Twitter/Instagram |
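
The hybrid model can be illustrated with an in-memory sketch: posts from normal authors are pushed into follower inboxes at write time, while posts from authors above a follower threshold are pulled at read time. The class name and data layout are assumptions, and a real system would do the push asynchronously via a queue and workers.

```python
from collections import defaultdict


class HybridTimeline:
    def __init__(self, celebrity_threshold=10_000):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(set)      # user -> precomputed (ts, author, text)
        self.posts = defaultdict(set)      # author -> their own (ts, author, text)

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def _is_celebrity(self, author):
        return len(self.followers[author]) >= self.celebrity_threshold

    def post(self, author, ts, text):
        item = (ts, author, text)
        self.posts[author].add(item)
        if not self._is_celebrity(author):
            # Fan-out on write: O(N followers) work now, O(1) reads later.
            for follower in self.followers[author]:
                self.inbox[follower].add(item)

    def read(self, user, limit=10):
        items = set(self.inbox[user])
        for author in self.following[user]:
            if self._is_celebrity(author):
                # Fan-out on read: merge celebrity posts at query time.
                items |= self.posts[author]
        # Sets deduplicate posts pushed before an author crossed the threshold.
        return sorted(items, reverse=True)[:limit]


timeline = HybridTimeline(celebrity_threshold=2)
timeline.follow("u1", "alice")
timeline.follow("u1", "star")
timeline.follow("u2", "star")
timeline.post("alice", 1, "hi")
timeline.post("star", 2, "hello")
print(timeline.read("u1"))
```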

Strong Consistency Required (payments, inventory, booking)

  • Single-region SQL with transactions
  • Serializable or SSI isolation
  • Optimistic locking (version column + CAS) or SELECT FOR UPDATE
  • Idempotency key on all write APIs (dedup window = 24h)
  • Prefer saga pattern over 2PC for cross-service atomicity
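
Optimistic locking with a version column can be sketched using SQLite from the standard library. The table, column, and function names are hypothetical; the key idea is the `WHERE version = ?` guard, which makes the update a compare-and-set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    "sku TEXT PRIMARY KEY, qty INTEGER NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO inventory VALUES ('sku-1', 5, 1)")
conn.commit()


def reserve(conn, sku, amount, max_retries=3):
    """Decrement stock only if nobody changed the row since we read it."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT qty, version FROM inventory WHERE sku = ?", (sku,)
        ).fetchone()
        if row is None or row[0] < amount:
            return False  # unknown SKU or insufficient stock
        qty, version = row
        cur = conn.execute(
            "UPDATE inventory SET qty = ?, version = version + 1 "
            "WHERE sku = ? AND version = ?",  # CAS: matches only the version we read
            (qty - amount, sku, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True  # our write won
        # rowcount == 0: a concurrent writer bumped the version; re-read and retry
    return False


print(reserve(conn, "sku-1", 3))
```

Optimistic locking suits low-contention rows; under heavy contention the retries pile up and `SELECT FOR UPDATE` tends to behave better.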

When to Use Which Tool

Message Queue

| System | Use When | Avoid When |
|---|---|---|
| Kafka | High throughput, ordered events, replay, event sourcing, audit log | Need <10 ms latency, simple task queue |
| SQS | Simple managed task queue, at-least-once OK, serverless | Strict ordering, replay |
| SQS FIFO | Strict ordering per message group, exactly-once processing | High throughput (default limit of 300 msg/s per queue without batching, 3,000 msg/s with batching) |
| Redis Streams | Low latency, small scale, in-memory OK | Durability critical, large volumes |
| RabbitMQ | Complex routing (fanout/topic/direct), legacy | High throughput at scale |

Cache

| System | Use When |
|---|---|
| Redis | Rich structures (sorted sets, hashes), pub/sub, Lua scripting, persistence option |
| Memcached | Simple KV only, max memory efficiency, multi-threaded, no persistence |
| In-process (Caffeine) | Sub-ms latency, JVM app, tolerate stale across instances |
| CDN | Static assets, large files, geo-distributed reads, staleness tolerated |

Load Balancer

| Type | Layer | Knows | Use For |
|---|---|---|---|
| L4 (NLB) | TCP/UDP | IP + port | Low latency, non-HTTP, raw throughput |
| L7 (ALB) | HTTP | URL, headers, cookies | HTTP routing, SSL termination, rate limiting, A/B |

RPC / API Protocol

| Protocol | Latency | Payload | Streaming | Use |
|---|---|---|---|---|
| REST/HTTP | Medium | JSON (verbose) | No | Public APIs, browsers |
| gRPC | Low | Protobuf (compact) | Yes (bidirectional) | Internal services, mobile |
| GraphQL | Medium | JSON (flexible) | Subscriptions | BFF, flexible client queries |
| WebSocket | Very low | Binary/text | Bidirectional | Chat, gaming, live data |
| SSE | Low | Text stream | Server → client | Notifications, live feeds |

Failure Modes & Resilience

Circuit Breaker

CLOSED → [threshold failures] → OPEN (fast-fail)
   ↑                                    ↓
HALF-OPEN ←────── [cooldown expires] ───┘
  (probe one request → success → CLOSED, fail → OPEN)

Prevents cascading failures when a downstream dependency is slow or down. Used in: Resilience4j, AWS SDK.
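
The state machine above can be sketched as a small wrapper around a callable. This is a minimal single-threaded illustration, not how any particular library implements it; the class name, default thresholds, and injectable clock are assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker fast-fails without calling the downstream."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if self._clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("downstream unavailable, failing fast")
            self.state = "HALF_OPEN"  # cooldown expired: let one probe through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed probe, or too many consecutive failures, (re)opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self._clock()

    def _on_success(self):
        self.failures = 0
        self.state = "CLOSED"
```

A caller would wrap each downstream request, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile`, and treat `CircuitOpenError` as the signal to use a fallback. A multi-threaded service would also need a lock around the state transitions.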

Rate Limiting Algorithms

| Algorithm | Behavior | Use |
|---|---|---|
| Token bucket | Allows bursts up to bucket capacity | API gateways, bursty traffic |
| Leaky bucket | Strict constant output rate | Traffic shaping |
| Fixed window | Simple counter per window | Easy, but allows spikes at window edges |
| Sliding window log | Exact, memory-heavy | High-precision |
| Sliding window counter | Approximate, memory-efficient | Most production systems |
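
As one example from the table, a token bucket can be sketched in a few lines. This is a single-process illustration with hypothetical names; a distributed limiter would keep the bucket state in a shared store such as Redis.

```python
import time


class TokenBucket:
    """Refills at rate_per_s tokens per second, holding at most capacity tokens."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Lazily refill based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate_per_s=5, capacity=10)
print([bucket.allow() for _ in range(12)])
```

The capacity sets the largest burst and the rate sets the sustained throughput, which is why this algorithm fits bursty API traffic.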

Graceful Degradation

  • Serve stale cache when DB is down
  • Return partial results instead of full error
  • Disable non-critical features (recommendations) when overloaded
  • Queue writes instead of rejecting when downstream is slow

SPOF Checklist

  • DB: replica set or managed failover (RDS Multi-AZ)
  • App servers: multiple instances behind LB
  • Cache: Redis Sentinel or Redis Cluster
  • Queue: Kafka cluster (3+ brokers) or managed SQS
  • Region: multi-AZ minimum; multi-region for 99.999%
  • External deps: timeout + fallback for every call

Idempotency Pattern

Client → POST /payments { amount: 100, idempotency_key: "uuid-123" }
Server → check dedup store for "uuid-123"
       → if exists: return cached response
       → if new: process + store (key → response) with TTL

Used in: Stripe, AWS SDK, any payment or critical write API.
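
The flow above can be sketched with an in-memory dedup store. The class and method names are hypothetical, and the 24-hour TTL mirrors the dedup window mentioned earlier; a production version would need an atomic check-and-set (for example a unique constraint on the key) so that two concurrent retries cannot both charge.

```python
import time


class IdempotentPayments:
    def __init__(self, ttl_s=24 * 3600, clock=time.time):
        self.ttl_s = ttl_s
        self._clock = clock
        self._store = {}   # idempotency_key -> (expires_at, cached response)
        self.charges = []  # stands in for the real side effect

    def _charge(self, amount):
        self.charges.append(amount)
        return {"payment_id": len(self.charges), "amount": amount, "status": "succeeded"}

    def post_payment(self, idempotency_key, amount):
        now = self._clock()
        hit = self._store.get(idempotency_key)
        if hit is not None and hit[0] > now:
            return hit[1]  # retry of a known request: replay the cached response
        response = self._charge(amount)
        self._store[idempotency_key] = (now + self.ttl_s, response)
        return response


service = IdempotentPayments()
first = service.post_payment("uuid-123", 100)
retry = service.post_payment("uuid-123", 100)
print(first == retry, len(service.charges))
```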


Async vs Sync

| | Sync | Async |
|---|---|---|
| Use when | Response needed immediately | Processing can be deferred |
| Client experience | Waits for result | Gets 202 + job ID immediately |
| Coupling | Tight | Loose |
| Failure handling | Caller handles instantly | Queue retries automatically |
| Examples | Auth, reads, payments | Email, notifications, analytics, media encoding |

Pull vs Push

| | Push | Pull |
|---|---|---|
| Latency | Lower (server initiates) | Higher (bounded by poll interval) |
| Scale | Hard (server tracks N subscribers) | Easy (consumers control their own rate) |
| Miss risk | Yes, if consumer is down | No (consumer fetches when ready) |
| Use | WebSocket, SSE, webhooks, notifications | Kafka consumer, cron jobs, RSS |

CDN

  • Use for: static assets (JS/CSS/images), video, large files, geo-distributed reads
  • Cache-Control + Expires headers control edge TTL
  • Invalidation: versioned filenames (/app.v4.js) preferred over purge API
  • Origin shield: client → CDN edge (POP) → shield (regional mid-tier cache) → origin; edge misses collapse at the shield, which reduces origin fan-in
  • Stale-while-revalidate: serve stale, refresh in background — no user-visible latency

Common Trade-offs to Discuss

| Trade-off | Option A | Option B |
|---|---|---|
| Consistency vs availability | Strong (CP): safer | Eventual (AP): faster |
| Latency vs durability | Async write (fast) | Sync replicate (safe) |
| Normalization vs denormalization | Clean schema, slower reads | Redundant data, fast reads |
| Fan-out on write vs read | Fast reads, slow writes | Fast writes, slow reads |
| SQL vs NoSQL | Flexible queries, harder to shard | Fixed patterns, easy to shard |
| Monolith vs microservices | Simple ops, hard to scale parts | Complex ops, independent scale |
| Push vs pull delivery | Low latency | Consumer controls pace |
| Cache-aside vs write-through | Simple, possible cold start | Always warm, write overhead |
| Random ID (UUID) vs sequential ID | No hotspot, no ordering | Ordered, hot partition risk |

Hot Key / Skew Problems

| Problem | Symptom | Fix |
|---|---|---|
| Hot partition | One shard at 100%, rest idle | Random key suffix, virtual shards |
| Celebrity / viral post | Fan-out to 100M followers spikes | Hybrid push/pull, async workers |
| Cache stampede | Hot key expires → DB hammered | Mutex on miss, probabilistic early expiry, background refresh |
| Write skew | Concurrent reads of shared resource → both write | SELECT FOR UPDATE, serializable isolation |
| Hot replica | All reads to one replica | Read replica pool + random routing |
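
The "mutex on miss" fix for cache stampedes can be sketched with a per-key lock, so that only one caller reloads a missing key while the rest wait and reuse the result. The class name and loader callback are hypothetical, and TTL handling is omitted to keep the sketch short.

```python
import threading


class SingleFlightCache:
    def __init__(self, loader):
        self._loader = loader            # function that fetches from the DB
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()   # protects the lock table itself

    def get(self, key):
        if key in self._data:            # fast path: cache hit, no locking
            return self._data[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Double-check: another thread may have loaded it while we waited.
            if key in self._data:
                return self._data[key]
            value = self._loader(key)    # only one thread per key reaches here
            self._data[key] = value
            return value


db_calls = []


def load_from_db(key):
    db_calls.append(key)
    return key.upper()


cache = SingleFlightCache(load_from_db)
threads = [threading.Thread(target=cache.get, args=("hot",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(db_calls))
```

Across multiple app servers the same idea needs a distributed lock or a request-coalescing layer, since an in-process mutex only protects one instance.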

Red Flags Interviewers Watch For

  • ❌ Single point of failure with no mitigation mentioned
  • ❌ Storing large blobs (images/video) directly in DB
  • ❌ Synchronous fan-out to millions of followers
  • ❌ No caching for read-heavy workloads
  • ❌ Sequential auto-increment keys on a sharded system
  • ❌ Polling DB at high frequency instead of event-driven
  • ❌ No idempotency on payment or critical write APIs
  • ❌ Strong consistency everywhere (over-engineering where eventual is fine)
  • ❌ Jumping to components without clarifying requirements first
  • ❌ No mention of monitoring, logging, or alerting

Green Flags

  • ✅ Ask clarifying questions before drawing anything
  • ✅ State assumptions explicitly ("I'll assume 100M DAU")
  • ✅ Present trade-offs: "X vs Y — I chose X because..."
  • ✅ Proactively mention what breaks at 10× and how to handle it
  • ✅ Bring up hot keys, skew, and celebrity problem unprompted
  • ✅ Design for failure: circuit breaker, graceful degradation, retry + idempotency
  • ✅ Capacity numbers that justify architectural choices
  • ✅ Know when NOT to distribute: "a single Postgres instance handles this fine at this scale"