Capacity Planning

Capacity Planning

Power of 2

Power Exact Value Approx Unit
7 128
8 256
10 1,024 1 thousand 1 KB
16 65,536 64 KB
20 1,048,576 1 million 1 MB
30 1,073,741,824 1 billion 1 GB
32 4,294,967,296 4 GB
40 1,099,511,627,776 1 trillion 1 TB

Latency Numbers (Every Engineer Should Know)

Operation Nanoseconds Microseconds Milliseconds Notes
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14× L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20× L2, 200× L1
Compress 1KB (Snappy) 10,000 ns 10 µs
Send 1KB over 1Gbps 10,000 ns 10 µs
Read 4KB randomly from SSD 150,000 ns 150 µs ~1 GB/s SSD
Read 1MB sequentially from RAM 250,000 ns 250 µs
Round trip within same DC 500,000 ns 500 µs
Read 1MB sequentially from SSD 1,000,000 ns 1,000 µs 1 ms 4× memory
HDD seek 10,000,000 ns 10,000 µs 10 ms 20× DC round trip
Read 1MB sequentially over 1Gbps 10,000,000 ns 10,000 µs 10 ms 10× SSD
Read 1MB sequentially from HDD 30,000,000 ns 30,000 µs 30 ms 30× SSD
Packet CA → Netherlands → CA 150,000,000 ns 150,000 µs 150 ms
1 ns = 10^-9 s
1 µs = 10^-6 s = 1,000 ns
1 ms = 10^-3 s = 1,000 µs = 1,000,000 ns

Key Takeaways

  • Memory is 20× faster than network within DC, 200× faster than inter-continental
  • SSD random read ~1000× slower than L1 cache
  • HDD seek ~20× slower than SSD random read
  • Avoid cross-continent calls in the hot path — 150ms is a full user-perceptible delay

Capacity Estimation Framework

Step 1 — Define Scale

Metric Question
DAU How many daily active users?
Actions/user/day How many reads/writes per user?
Data size per action How large is each request/record?
Retention How long is data stored?

Step 2 — Throughput (QPS)

Method A: DAU-based

avg QPS = (DAU × actions_per_day) / 86,400

Method B: Direct (IoT, sensors)

QPS = total_devices / reporting_interval_seconds

Common DAU → QPS estimates (500M DAU example):

Action Formula QPS
Homepage view 500M × 10 / 86,400 ~58k
Profile visit 500M × 2 / 86,400 ~12k
Content creation (10% create) 500M × 0.1 × 1 / 86,400 ~580

Step 3 — Peak QPS (80:20 Rule)

80% of daily traffic occurs in 20% of the day (~5 hours)

peak QPS = (daily_requests × 0.8) / (peak_hours × 3600)

Example: 4B daily requests → 4B × 0.8 / (4 × 3600)222k peak QPS

Rule of thumb: peak QPS ≈ 2–3× average QPS


Step 4 — Server Instances

workers_per_instance = 1 / response_time_seconds
QPS_per_instance     = workers × parallelism_factor
instances_needed     = peak_QPS / QPS_per_instance

Example (200ms response time, 32 workers):

workers_per_instance = 1 / 0.2 = 5 req/s per worker
QPS per instance     = 32 × 5  = 160 QPS
instances needed     = 200k / 160 ≈ 1,250 instances

Add 20–30% headroom for spikes.


Step 5 — Storage

daily_storage  = QPS × seconds_per_day × record_size
               = QPS × 86,400 × bytes_per_record

total_storage  = daily_storage × retention_days × replication_factor

Example (write QPS=1k, 1KB/record, 5yr retention, 3× replication):

daily  = 1,000 × 86,400 × 1KB  ≈ 86 GB/day
total  = 86 GB × 1,825 days × 3 ≈ 470 TB

Step 6 — Bandwidth

ingress = request_size × write_QPS
egress  = response_size × read_QPS

Egress optimizations (reduces naive bandwidth by ~67%):

  • Serve thumbnails (30KB) instead of full images (300KB); load full on demand
  • Text excerpts, not full content
  • No video autoplay; only load on explicit action
  • 80% of users don't scroll past 5 posts

Common Size Reference

Object Approx Size
Integer / float 4 bytes
UUID 16 bytes
Timestamp 8 bytes
Short string (50 chars) ~50 bytes
Tweet / short post (250 chars) ~1 KB
IoT sensor payload (JSON + HTTP) ~0.5 KB
Web page HTML ~50 KB
Compressed image (thumbnail) ~30 KB
Compressed image (full) ~300 KB
Video (1 min, compressed) ~1 MB

Quick Estimation Cheatsheet

Metric Rule of Thumb
Peak QPS 2–3× average QPS
Peak storage growth 2× baseline estimate
Read:Write ratio (social) 100:1
Read:Write ratio (messaging) 1:1
Cache hit rate target ≥ 99% for hot data
Replication factor 3× (standard)
1 server (commodity) ~1k–10k QPS depending on work
1 server RAM 64–256 GB
1 server disk (SSD) 1–10 TB
1 server NIC 1–10 Gbps

Estimation Template

1. Scale
   DAU:             ___M
   Avg QPS:         (DAU × actions) / 86,400 = ___k
   Peak QPS:        avg × 2–3 = ___k

2. Storage
   Record size:     ___ bytes/KB
   Daily write vol: write_QPS × 86,400 × size = ___ GB/day
   5yr total:       daily × 1,825 × 3 (replica) = ___ TB

3. Bandwidth
   Ingress:         write_QPS × request_size = ___ MB/s
   Egress:          read_QPS × response_size = ___ GB/s
   After optim.:    egress × 0.33 = ___ GB/s

4. Servers
   instances:       peak_QPS / QPS_per_server = ___
   + 30% headroom: ___ instances