Capacity Planning
Powers of 2
| Power | Exact Value | Approx | Unit |
|---|---|---|---|
| 7 | 128 | | |
| 8 | 256 | | |
| 10 | 1,024 | 1 thousand | 1 KB |
| 16 | 65,536 | 65 thousand | 64 KB |
| 20 | 1,048,576 | 1 million | 1 MB |
| 30 | 1,073,741,824 | 1 billion | 1 GB |
| 32 | 4,294,967,296 | 4 billion | 4 GB |
| 40 | 1,099,511,627,776 | 1 trillion | 1 TB |
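These approximations are easy to regenerate when in doubt; a minimal Python sketch (the helper name is illustrative):

```python
UNITS = ["B", "KB", "MB", "GB", "TB"]

def human(power: int) -> str:
    """Express 2**power in the nearest power-of-1024 unit."""
    unit = min(power // 10, len(UNITS) - 1)
    return f"{2 ** power >> (10 * unit)} {UNITS[unit]}"

for p in (7, 8, 10, 16, 20, 30, 32, 40):
    print(f"2^{p:>2} = {2 ** p:>17,}  ≈ {human(p)}")
```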
Latency Numbers (Every Engineer Should Know)
| Operation | Nanoseconds | Microseconds | Milliseconds | Notes |
|---|---|---|---|---|
| L1 cache reference | 0.5 ns | | | |
| Branch mispredict | 5 ns | | | |
| L2 cache reference | 7 ns | | | 14× L1 cache |
| Mutex lock/unlock | 25 ns | | | |
| Main memory reference | 100 ns | | | 20× L2, 200× L1 |
| Compress 1KB (Snappy) | 10,000 ns | 10 µs | | |
| Send 1KB over 1Gbps | 10,000 ns | 10 µs | | |
| Read 4KB randomly from SSD | 150,000 ns | 150 µs | | ~1 GB/s SSD |
| Read 1MB sequentially from RAM | 250,000 ns | 250 µs | | |
| Round trip within same DC | 500,000 ns | 500 µs | | |
| Read 1MB sequentially from SSD | 1,000,000 ns | 1,000 µs | 1 ms | 4× memory |
| HDD seek | 10,000,000 ns | 10,000 µs | 10 ms | 20× DC round trip |
| Read 1MB sequentially over 1Gbps | 10,000,000 ns | 10,000 µs | 10 ms | 10× SSD |
| Read 1MB sequentially from HDD | 30,000,000 ns | 30,000 µs | 30 ms | 30× SSD |
| Packet CA → Netherlands → CA | 150,000,000 ns | 150,000 µs | 150 ms | |
1 ns = 10^-9 s
1 µs = 10^-6 s = 1,000 ns
1 ms = 10^-3 s = 1,000 µs = 1,000,000 ns
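The ratios in the takeaways below fall straight out of these numbers; a minimal Python sketch using the table values (in nanoseconds):

```python
# Latency figures from the table above, in nanoseconds.
NS = {
    "main memory ref": 100,
    "SSD random read (4KB)": 150_000,
    "HDD seek": 10_000_000,
    "DC round trip": 500_000,
    "CA-Netherlands round trip": 150_000_000,
}

print(f"SSD random read vs memory ref: {NS['SSD random read (4KB)'] / NS['main memory ref']:,.0f}x")
print(f"HDD seek vs SSD random read:   {NS['HDD seek'] / NS['SSD random read (4KB)']:,.0f}x")
print(f"Cross-continent vs DC RTT:     {NS['CA-Netherlands round trip'] / NS['DC round trip']:,.0f}x")
```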
Key Takeaways
- Reading 1 MB from RAM (250 µs) is ~4× faster than from SSD (1 ms) and ~40× faster than over a 1 Gbps network (10 ms)
- An SSD random read (150 µs) is ~1,500× slower than a main memory reference (100 ns)
- An HDD seek (10 ms) is ~70× slower than an SSD random read (150 µs)
- Avoid cross-continent calls in the hot path: at 150 ms, a single round trip is already a user-perceptible delay
Capacity Estimation Framework
Step 1 — Define Scale
| Metric | Question |
|---|---|
| DAU | How many daily active users? |
| Actions/user/day | How many reads/writes per user? |
| Data size per action | How large is each request/record? |
| Retention | How long is data stored? |
Step 2 — Throughput (QPS)
Method A: DAU-based
avg QPS = (DAU × actions_per_day) / 86,400
Method B: Direct (IoT, sensors)
QPS = total_devices / reporting_interval_seconds
Common DAU → QPS estimates (500M DAU example):
| Action | Formula | QPS |
|---|---|---|
| Homepage view | 500M × 10 / 86,400 | ~58k |
| Profile visit | 500M × 2 / 86,400 | ~12k |
| Content creation (10% create) | 500M × 0.1 × 1 / 86,400 | ~580 |
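The same arithmetic as a runnable helper; a minimal sketch reproducing the table above:

```python
def avg_qps(dau: int, actions_per_user_per_day: float) -> float:
    """Method A: average QPS from DAU and per-user daily action count."""
    return dau * actions_per_user_per_day / 86_400

DAU = 500_000_000
print(f"homepage views:   {avg_qps(DAU, 10):>9,.0f} QPS")   # ~58k
print(f"profile visits:   {avg_qps(DAU, 2):>9,.0f} QPS")    # ~12k
print(f"content creation: {avg_qps(DAU, 0.1):>9,.0f} QPS")  # ~580 (10% create 1 item/day)
```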
Step 3 — Peak QPS (80:20 Rule)
80% of daily traffic occurs in 20% of the day (≈ 4.8 hours, often rounded to 4–5 hours)
peak QPS = (daily_requests × 0.8) / (peak_hours × 3600)
Example: 4B daily requests, 4-hour peak window → 4B × 0.8 / (4 × 3600) ≈ 222k peak QPS
Rule of thumb: peak QPS ≈ 2–3× average QPS
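A minimal sketch of the peak-QPS formula (the default 4.8-hour window is 20% of a day):

```python
def peak_qps(daily_requests: float, peak_fraction: float = 0.8,
             peak_hours: float = 4.8) -> float:
    """Peak QPS, assuming peak_fraction of traffic lands within peak_hours."""
    return daily_requests * peak_fraction / (peak_hours * 3600)

# 4B requests/day, peak window rounded down to 4 hours as in the example:
print(f"{peak_qps(4e9, peak_hours=4):,.0f} peak QPS")  # ~222,222
```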
Step 4 — Server Instances
QPS_per_worker = 1 / response_time_seconds
QPS_per_instance = workers_per_instance × QPS_per_worker
instances_needed = peak_QPS / QPS_per_instance
Example (200ms response time, 32 workers):
QPS_per_worker = 1 / 0.2 = 5 req/s per worker
QPS per instance = 32 × 5 = 160 QPS
instances needed = 200k / 160 ≈ 1,250 instances
Add 20–30% headroom for spikes.
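Putting Step 4 together, with the headroom folded in; a minimal sketch:

```python
import math

def instances_needed(peak_qps: float, workers_per_instance: int,
                     response_time_s: float, headroom: float = 0.3) -> int:
    """Instance count: each worker serves 1/response_time requests per second."""
    qps_per_worker = 1 / response_time_s
    qps_per_instance = workers_per_instance * qps_per_worker
    return math.ceil(peak_qps / qps_per_instance * (1 + headroom))

print(instances_needed(200_000, workers_per_instance=32, response_time_s=0.2))
# 1,250 base instances + 30% headroom = 1,625
```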
Step 5 — Storage
daily_storage = QPS × seconds_per_day × record_size
= QPS × 86,400 × bytes_per_record
total_storage = daily_storage × retention_days × replication_factor
Example (write QPS=1k, 1KB/record, 5yr retention, 3× replication):
daily = 1,000 × 86,400 × 1KB ≈ 86 GB/day
total = 86 GB × 1,825 days × 3 ≈ 470 TB
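The storage math as a helper; a minimal sketch (using decimal units, 1 KB = 1,000 bytes, as in the example above):

```python
def total_storage_tb(write_qps: float, record_bytes: float,
                     retention_days: int, replication: int = 3) -> float:
    """Total storage in TB: daily write volume × retention × replication."""
    daily_bytes = write_qps * 86_400 * record_bytes
    return daily_bytes * retention_days * replication / 1e12

print(f"{total_storage_tb(1_000, 1_000, 5 * 365):,.0f} TB")  # ~470 TB
```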
Step 6 — Bandwidth
ingress = request_size × write_QPS
egress = response_size × read_QPS
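A minimal sketch of both formulas, reusing the write QPS from Step 5 and the homepage read QPS and thumbnail size from the tables above (illustrative numbers):

```python
def bandwidth_mb_s(qps: float, payload_bytes: float) -> float:
    """Sustained bandwidth in MB/s for a given request rate and payload size."""
    return qps * payload_bytes / 1e6

ingress = bandwidth_mb_s(1_000, 1_000)    # 1k writes/s × 1 KB records
egress = bandwidth_mb_s(58_000, 30_000)   # 58k reads/s × 30 KB thumbnails
print(f"ingress: {ingress:.0f} MB/s, egress: {egress:,.0f} MB/s (~{egress / 1e3:.1f} GB/s)")
```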
Egress optimizations (can reduce naive bandwidth by ~67%):
- Serve thumbnails (30KB) instead of full images (300KB); load full on demand
- Text excerpts, not full content
- No video autoplay; only load on explicit action
- 80% of users don't scroll past 5 posts
Common Size Reference
| Object | Approx Size |
|---|---|
| Integer / float | 4 bytes |
| UUID | 16 bytes |
| Timestamp | 8 bytes |
| Short string (50 chars) | ~50 bytes |
| Tweet / short post (250 chars + metadata) | ~1 KB |
| IoT sensor payload (JSON + HTTP) | ~0.5 KB |
| Web page HTML | ~50 KB |
| Compressed image (thumbnail) | ~30 KB |
| Compressed image (full) | ~300 KB |
| Video (1 min, compressed) | ~1 MB |
Quick Estimation Cheatsheet
| Metric | Rule of Thumb |
|---|---|
| Peak QPS | 2–3× average QPS |
| Peak storage growth | 2× baseline estimate |
| Read:Write ratio (social) | 100:1 |
| Read:Write ratio (messaging) | 1:1 |
| Cache hit rate target | ≥ 99% for hot data |
| Replication factor | 3× (standard) |
| 1 server (commodity) | ~1k–10k QPS depending on work |
| 1 server RAM | 64–256 GB |
| 1 server disk (SSD) | 1–10 TB |
| 1 server NIC | 1–10 Gbps |
Estimation Template
1. Scale
DAU: ___M
Avg QPS: (DAU × actions) / 86,400 = ___k
Peak QPS: avg × 2–3 = ___k
2. Storage
Record size: ___ bytes/KB
Daily write vol: write_QPS × 86,400 × size = ___ GB/day
5yr total: daily × 1,825 × 3 (replica) = ___ TB
3. Bandwidth
Ingress: write_QPS × request_size = ___ MB/s
Egress: read_QPS × response_size = ___ GB/s
After optim.: egress × 0.33 = ___ GB/s
4. Servers
instances: peak_QPS / QPS_per_server = ___
+ 30% headroom: ___ instances
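The template can be run end to end; a minimal sketch with illustrative placeholder inputs:

```python
# All inputs are illustrative; substitute your own numbers.
DAU            = 100_000_000   # 100M daily active users
ACTIONS        = 10            # actions per user per day
WRITE_FRACTION = 0.1           # 10% of actions are writes
RECORD_BYTES   = 1_000         # ~1 KB per record
RESPONSE_S     = 0.2           # 200 ms per request
WORKERS        = 32            # workers per instance

avg_qps = DAU * ACTIONS / 86_400
peak_qps = avg_qps * 3                                # rule of thumb: 2-3x average
write_qps = avg_qps * WRITE_FRACTION
daily_gb = write_qps * 86_400 * RECORD_BYTES / 1e9
five_yr_tb = daily_gb * 1_825 * 3 / 1e3               # 5 years, 3x replication
instances = peak_qps / (WORKERS / RESPONSE_S) * 1.3   # +30% headroom

print(f"avg QPS:   {avg_qps:,.0f}   peak QPS: {peak_qps:,.0f}")
print(f"storage:   {daily_gb:,.0f} GB/day, {five_yr_tb:,.0f} TB over 5 years")
print(f"instances: {instances:,.0f}")
```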