Load Balancer

What It Does

  • Distributes incoming traffic across multiple backend servers
  • Hides topology from clients (single VIP / DNS entry)
  • Health checks — removes unhealthy backends automatically
  • TLS termination, connection pooling, rate limiting (L7)

L4 vs L7

Aspect | L4 (Transport Layer) | L7 (Application Layer)
Works at | TCP / UDP | HTTP / HTTPS / gRPC / WebSocket
Sees | IP + port only | URL, headers, cookies, body
Routing basis | IP tuple hash | Path, host, header, method
TLS | Pass-through (no termination) | Terminates TLS; can re-encrypt to backend
Latency | Very low (no parsing) | Slightly higher (parses HTTP)
Sticky sessions | IP-hash only | Cookie-based, header-based
Health check | TCP connect | HTTP GET /health (checks app logic)
Use case | Raw TCP (DB proxies, SMTP), ultra-low latency | HTTP APIs, microservices, A/B testing
Examples | AWS NLB, HAProxy TCP mode, IPVS | AWS ALB, Nginx, Envoy, HAProxy HTTP mode

When to Use L4

  • Non-HTTP protocols (MySQL proxy, Redis proxy, SMTP)
  • Need absolute minimum latency overhead
  • TLS passthrough required (client cert auth end-to-end)
  • Very high connection rates (millions/sec)

When to Use L7

  • HTTP/HTTPS microservices (almost always)
  • Need URL-based routing (/api → service A, /static → S3)
  • Header-based routing (canary, A/B, multi-tenant)
  • Rate limiting, auth, request rewriting at the edge
  • WebSocket upgrades, gRPC (HTTP/2)
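
The path-based and header-based routing listed above can be sketched as an ordered rule table where the first matching rule wins. This is only an illustration, not any particular proxy's configuration format: the pool names, the X-Canary header, and the pick_upstream function are all hypothetical.

```python
# Each rule: (path prefix, required (header, value) or None, upstream pool).
# All names here are made up for illustration.
RULES = [
    ("/api", ("X-Canary", "1"), "service-a-canary"),
    ("/api", None, "service-a"),
    ("/static", None, "s3-static"),
]


def pick_upstream(path, headers, rules=RULES, default="default-pool"):
    """Return the upstream pool of the first rule that matches the request."""
    for prefix, header, upstream in rules:
        if not path.startswith(prefix):
            continue  # plain string-prefix match, no path normalisation
        if header is not None and headers.get(header[0]) != header[1]:
            continue  # rule requires a header the request does not carry
        return upstream
    return default


normal = pick_upstream("/api/users", {})
canary = pick_upstream("/api/users", {"X-Canary": "1"})
static = pick_upstream("/static/logo.png", {})
```

Putting the canary rule before the general /api rule is what makes the header override work; an L4 balancer cannot do this because it never sees the path or headers.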

Load Balancing Algorithms

Stateless (no per-backend state)

Algorithm | How | Best For
Round Robin | Requests 1, 2, 3, ... distributed in order | Homogeneous servers, uniform requests
Weighted Round Robin | Each server gets a share of requests proportional to its weight (capacity) | Heterogeneous servers (different CPU/RAM)
Random | Pick a backend at random | Simple, no coordination, works at scale
IP Hash | hash(client_ip) % N → same server per client | Weak sticky sessions, stateful backends
URL Hash | hash(url) % N → same server per URL | CDN/cache servers (same content to same node)
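
A minimal sketch of three of the algorithms in this table, assuming integer weights and string backend names. The function names are illustrative. The weighted version simply repeats each backend by its weight, which sends consecutive requests to the same server; production balancers often interleave more smoothly.

```python
import hashlib
import itertools


def round_robin(backends):
    """Yield backends in a fixed repeating order."""
    return itertools.cycle(backends)


def weighted_round_robin(weights):
    """Yield backends in proportion to integer weights (naive expansion)."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def ip_hash(client_ip, backends):
    """Map a client IP to a backend using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per process
    and would give different answers on different LB nodes.
    """
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


backends = ["s1", "s2", "s3"]
rr = round_robin(backends)
first_four = [next(rr) for _ in range(4)]      # s1, s2, s3, s1

wrr = weighted_round_robin({"big": 3, "small": 1})
first_eight = [next(wrr) for _ in range(8)]    # "big" six times, "small" twice

owner = ip_hash("203.0.113.7", backends)       # same IP always maps to the same backend
```

Note that the modulo-N mapping in ip_hash reshuffles most clients whenever N changes, which is the problem consistent hashing addresses.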

Stateful (tracks backend state)

Algorithm | How | Best For
Least Connections | Route to backend with fewest active connections | Long-lived connections (WebSocket, file upload)
Weighted Least Connections | Least connections + capacity weight | Mixed-capacity fleet with long connections
Least Response Time | Route to fastest-responding backend | Latency-sensitive, heterogeneous response times
Resource-Based | Route based on CPU/memory reported by agents | Compute-intensive workloads
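
The state that least-connections needs is just a counter per backend. The sketch below covers both the plain and weighted variants by dividing the active count by a capacity weight; the class and method names are assumptions for illustration.

```python
class LeastConnections:
    """Track active connections per backend and pick the least loaded one."""

    def __init__(self, weights):
        # weights: backend name -> relative capacity (use 1 for an unweighted fleet)
        self.weights = dict(weights)
        self.active = {name: 0 for name in weights}

    def acquire(self):
        # Compare load normalised by capacity; ties go to the first backend listed.
        name = min(self.active, key=lambda n: self.active[n] / self.weights[n])
        self.active[name] += 1
        return name

    def release(self, name):
        # Call when the connection closes so the backend becomes eligible again.
        self.active[name] -= 1


lb = LeastConnections({"s1": 1, "s2": 1})
a = lb.acquire()   # s1 (tie, first listed)
b = lb.acquire()   # s2
lb.release(a)      # the connection on s1 closes
c = lb.acquire()   # s1 again, since it now has zero active connections
```

Unlike round robin, this only works if release is called reliably, which is why it counts as a stateful algorithm.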

Consistent Hashing

  • Backend servers placed on a virtual ring
  • Request key hashed → clockwise walk → first node
  • Adding/removing a server remaps only about K/N keys on average (K = number of keys, N = number of servers), not all of them
  • Used when: session affinity, cache affinity (same key to same cache node)
  • Virtual nodes per server → more even distribution
Ring: 0 ──────── ServerA ──── ServerB ──── ServerC ──── 2^32
Request hash → clockwise → first server = owner
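
The ring lookup can be sketched with a sorted list and binary search. This is a minimal version under a few assumptions: SHA-256 truncated to 32 bits as the hash, 100 virtual nodes per server, and hypothetical names (HashRing, lookup).

```python
import bisect
import hashlib


def _hash(key):
    """Stable 32-bit hash so positions fall on the 0..2^32 ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            if point in self._owner:
                continue    # extremely rare position collision: keep existing owner
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        # Clockwise walk: first point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["A", "B", "C"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("D")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in moved now belongs to the new server D, and all other keys keep their previous owner, which is the property that makes the ring attractive for cache affinity.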

Stateless vs Stateful Backends

Stateless Backends (preferred)

  • No session data stored on the app server
  • Any request can go to any instance → true horizontal scale
  • Session state stored externally (Redis, DB)
  • LB algorithm: round robin / random — simple and effective

Stateful Backends (harder to scale)

  • Session data lives on a specific server
  • Requires sticky sessions (session affinity)
  • If that server dies → session lost (unless replicated)

Sticky Sessions Implementation

Method | How | Risk
Cookie-based | LB injects Set-Cookie: SERVERID=s1 and routes by that cookie | Cookie can be stripped or forged; send it over HTTPS only
IP-hash | hash(client_ip) % N → always same server | Breaks behind NAT/CGNAT (many users share one IP)
Consistent hashing | Stable mapping via ring | A node failure still remaps that node's keys (to its ring neighbours only)
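
Cookie-based affinity from the table can be sketched as follows. The SERVERID cookie name comes from the table; the route function, the backend names, and the shape of its return value are assumptions for illustration.

```python
import itertools

BACKENDS = ["s1", "s2", "s3"]
_picker = itertools.cycle(BACKENDS)   # round robin for clients without a valid pin


def route(cookies, healthy):
    """Return (backend, set_cookie). set_cookie is None when the pin is reused."""
    pinned = cookies.get("SERVERID")
    if pinned in healthy:
        return pinned, None
    # No cookie, or the pinned backend is gone: choose a new one and re-pin.
    for _ in range(len(BACKENDS)):
        candidate = next(_picker)
        if candidate in healthy:
            return candidate, f"SERVERID={candidate}"
    raise RuntimeError("no healthy backends")


healthy = {"s1", "s2", "s3"}
first = route({}, healthy)                           # new client is pinned to s1
repeat = route({"SERVERID": "s1"}, healthy)          # cookie is honoured
failover = route({"SERVERID": "s1"}, {"s2", "s3"})   # s1 is down, re-pinned to s2
```

The failover case shows the cost of affinity: the client lands on s2, but whatever session state lived only on s1 is gone.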

Best practice: avoid sticky sessions — move state to Redis instead.


Health Checks

Type | How | Use
TCP check | Open TCP connection to port | L4 LBs; confirms port is listening
HTTP check | GET /health → expect 200 | L7 LBs; confirms app is alive
Deep check | /health checks DB + cache connectivity | Detects degraded (alive but broken) backends
Passive | Monitor error rate on real traffic | Detects degraded performance without extra probes
  • Unhealthy threshold: 2–3 consecutive failures → remove
  • Healthy threshold: 2–3 consecutive successes → re-add
  • Don't couple /health to flaky dependencies: if the DB slows down, every backend fails its check at once and the LB removes the whole fleet (cascading removal)
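
The consecutive-failure and consecutive-success thresholds amount to a small state machine per backend. A minimal sketch, assuming thresholds of 3 failures and 2 successes and a hypothetical HealthTracker class that is fed one probe result at a time:

```python
class HealthTracker:
    """Flip a backend's state only after N consecutive contrary probe results."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0   # consecutive results that disagree with current state

    def record(self, probe_ok):
        if probe_ok == self.healthy:
            self._streak = 0            # result agrees with state: reset the streak
            return self.healthy
        self._streak += 1
        limit = self.healthy_threshold if probe_ok else self.unhealthy_threshold
        if self._streak >= limit:
            self.healthy = probe_ok     # threshold reached: remove or re-add
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
probes = [False, False, True, False, False, False, True, True]
states = [tracker.record(ok) for ok in probes]
```

In this run the two early failures are forgiven by the success that follows them; the backend is only removed after three failures in a row and re-added after two successes in a row.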

Rate Limiting at the Load Balancer

Where to Rate Limit

Client → [CDN rate limit] → [LB/API GW rate limit] → [App rate limit] → Backend
  • CDN layer: block DDoS, per-IP limits, geographic blocks
  • LB/API Gateway: per-client token bucket, per-route limits
  • App layer: per-user business logic limits (X requests/hour per account)

Rate Limiting Algorithms

Algorithm | Behavior | Memory | Use
Token bucket | Refill at rate R; burst up to capacity B | O(1) per key | API gateways; allows controlled bursts
Leaky bucket | Queue requests; drain at constant rate | O(queue size) | Traffic shaping, smooth output
Fixed window | Counter per time window (1 min, 1 hr) | O(1) per key | Simple; a burst straddling a window boundary can reach 2× the limit
Sliding window log | Log all request timestamps; count in window | O(requests) | Exact; memory-heavy at high QPS
Sliding window counter | Weighted interpolation of two fixed windows | O(1) per key | Approximate; widely used in production
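
A token bucket needs only two numbers per key: the current token count and the time of the last refill. The sketch below takes the clock as an explicit argument to keep it deterministic; a real limiter would more likely read a monotonic clock. The class and parameter names are illustrative.

```python
class TokenBucket:
    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate                  # tokens added per second (R)
        self.capacity = capacity          # maximum burst size (B)
        self.tokens = float(capacity)     # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now, cost=1.0):
        # Refill lazily for the time that has passed, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=3)
burst = [bucket.allow(now=0.0) for _ in range(4)]   # three pass, the fourth is rejected
later = bucket.allow(now=1.0)                       # one token has refilled after 1s
```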

Sliding Window Counter (most common)

estimated_count = prev_window_count × (1 - elapsed/window) + current_window_count

(elapsed = time since the current fixed window started)

Example: window=60s, elapsed=45s, prev=80, curr=30
→ estimate = 80 × (1 - 45/60) + 30 = 80 × 0.25 + 30 = 50 requests in the sliding window
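
The same arithmetic as a small function, using the numbers from the example. The function names and the limit parameter are assumptions added for illustration.

```python
def sliding_window_estimate(prev_count, curr_count, elapsed, window):
    """Weight the previous window by the fraction of it still inside the sliding window."""
    overlap = 1.0 - elapsed / window
    return prev_count * overlap + curr_count


def allow(prev_count, curr_count, elapsed, window, limit):
    """Admit the request only while the estimate is below the limit."""
    return sliding_window_estimate(prev_count, curr_count, elapsed, window) < limit


estimate = sliding_window_estimate(prev_count=80, curr_count=30, elapsed=45, window=60)
under_limit = allow(80, 30, 45, 60, limit=100)
```

The estimate assumes requests in the previous window were spread evenly, which is why the result is approximate rather than exact.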

Distributed Rate Limiting

  • Single server: in-process counter (fastest)
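  • Multi-node: shared Redis counter. INCR followed by a separate EXPIRE is not atomic (a crash between the two leaves a key with no TTL), so run both inside a Lua script
  • Tradeoff: Redis adds ~1–2ms per check; acceptable for most API GWs
-- Redis atomic rate limit check (Lua)
-- KEYS[1] = rate limit key, ARGV[1] = window length in seconds
local count = redis.call('INCR', KEYS[1])
-- First hit in this window: start the TTL so the counter resets itself
if count == 1 then redis.call('EXPIRE', KEYS[1], tonumber(ARGV[1])) end
return count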
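
To make the behaviour of that script concrete, here is a pure-Python stand-in that mimics the INCR-then-EXPIRE-on-first-hit pattern in memory. It does not talk to Redis; the class name and the explicit now argument are assumptions chosen to keep the example deterministic.

```python
class FixedWindowCounter:
    """In-memory imitation of the INCR + EXPIRE pattern (no Redis involved)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._counts = {}   # key -> (count, expires_at)

    def incr(self, key, now):
        count, expires_at = self._counts.get(key, (0, None))
        if expires_at is not None and now >= expires_at:
            count, expires_at = 0, None               # key expired, like a Redis TTL
        count += 1
        if count == 1:
            expires_at = now + self.window_seconds    # TTL is set on the first hit only
        self._counts[key] = (count, expires_at)
        return count


counter = FixedWindowCounter(window_seconds=60)
hits = [counter.incr("client-1", now=t) for t in (0, 10, 59, 60)]
```

The counter climbs to 3 within the first 60 seconds and restarts at 1 once the window has expired; the caller compares the returned count with its limit.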

Connection Handling

Connection Pooling (L7)

  • LB maintains persistent connections to backends (keep-alive)
  • Client opens new connection → LB reuses existing backend connection
  • Avoids TCP handshake overhead per request
  • Critical for DB proxies (PgBouncer, ProxySQL)
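
The reuse described above can be sketched as an idle list that is checked before opening anything new. This is a simplified, single-threaded illustration: BackendPool and its methods are hypothetical, and a real pool would also need locking, liveness checks, and closing of surplus connections.

```python
class BackendPool:
    """Keep idle backend connections around so later requests can reuse them."""

    def __init__(self, connect, max_idle=8):
        self._connect = connect   # callable that opens a new backend connection
        self._idle = []
        self.max_idle = max_idle
        self.opened = 0           # how many real connections were created

    def acquire(self):
        if self._idle:
            return self._idle.pop()   # reuse: no new TCP handshake
        self.opened += 1
        return self._connect()

    def release(self, conn):
        # Keep the connection for reuse unless the idle list is already full.
        if len(self._idle) < self.max_idle:
            self._idle.append(conn)


pool = BackendPool(connect=object)   # object() stands in for a real socket
for _ in range(100):                 # 100 sequential client requests
    conn = pool.acquire()
    pool.release(conn)
```

After the loop, pool.opened is 1: a hundred client requests shared a single backend connection.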

SSL/TLS Termination

Client ──[TLS]──► LB ──[plain HTTP or re-encrypted]──► Backend

Option A: Terminate at LB → plain HTTP to backend (faster, but traffic inside the network is unencrypted)
Option B: Terminate at LB → re-encrypt to backend (TLS, or mTLS for zero-trust)
Option C: TLS passthrough → backend handles TLS (L4 only; needed for end-to-end client cert auth)

HTTP/2 and gRPC

  • gRPC runs over HTTP/2, so the LB must understand HTTP/2 to balance it per stream rather than per connection
  • HTTP/2 multiplexes many requests on one connection → an L4 LB pins every request on that connection to one backend
  • Envoy/Nginx with HTTP/2 balance per request (stream)
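
The difference between connection-level and stream-level balancing shows up clearly in a toy simulation. The sketch below assumes one long-lived gRPC channel carrying 900 requests and three backends, with round robin as the picker in both cases; the function names are illustrative.

```python
import itertools
from collections import Counter


def balance_per_connection(connections, backends):
    """L4 style: pick a backend once per connection; every stream follows it."""
    picker = itertools.cycle(backends)
    load = Counter()
    for stream_count in connections:
        load[next(picker)] += stream_count
    return load


def balance_per_stream(connections, backends):
    """L7 style: pick a backend for each HTTP/2 stream (request)."""
    picker = itertools.cycle(backends)
    load = Counter()
    for stream_count in connections:
        for _ in range(stream_count):
            load[next(picker)] += 1
    return load


backends = ["b1", "b2", "b3"]
connections = [900]   # one connection, 900 multiplexed requests
l4_load = balance_per_connection(connections, backends)   # all 900 land on b1
l7_load = balance_per_stream(connections, backends)       # 300 on each backend
```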

Global Load Balancing (GeoDNS / Anycast)

DNS-Based (GeoDNS)

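  • Authoritative DNS returns different IPs based on where the query comes from (in practice the client's resolver, which approximates client location)
  • Route US users → US region, EU users → EU region
  • TTL = 30–60s; failover is slow (cached answers must expire first)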
  • Used by: AWS Route53 latency routing, Cloudflare, Akamai

Anycast

  • Same IP announced from multiple locations via BGP
  • Network routing sends client to nearest PoP automatically
  • Fast failover (BGP typically reconverges in ~seconds)
  • Used by: Cloudflare (1.1.1.1), Google (8.8.8.8), CDN PoPs

Active-Passive vs Active-Active

Mode | Write | Read | Failover
Active-Passive | Primary only | Primary only | Promote passive (~30–60s)
Active-Active | Both regions | Both regions | Instant (no failover needed)
Active-Active with write conflicts | Both | Both | Instant; concurrent writes require CRDT / last-write-wins

Service Mesh (L7 in Sidecar)

  • LB logic moves into a sidecar proxy (Envoy) next to every service instance
  • No centralized LB — each pod has its own proxy
  • Features: mTLS between services, retries, circuit breaking, distributed tracing, traffic splitting
Service A → Envoy sidecar → [mTLS] → Envoy sidecar → Service B

Used by: Istio, Linkerd, AWS App Mesh, Consul Connect


Key Numbers

Metric | Typical Value
L4 LB max connections | 1–10M concurrent (NLB)
L7 LB max RPS | 100k–1M RPS (ALB, Nginx)
Health check interval | 5–30s
Unhealthy threshold | 2–3 failures
Connection timeout | 30–60s (idle)
Rate limit Redis check overhead | 1–2ms
DNS TTL for failover | 30–60s
Anycast BGP failover | ~seconds

Summary: Pick the Right LB

Need | Choice
HTTP microservices, TLS, routing | L7 (ALB / Nginx / Envoy)
Raw TCP, DB proxy, ultra-low latency | L4 (NLB / HAProxy TCP)
Stateful app, can't move to Redis | Sticky session via cookie
Cache/session locality | Consistent hashing
Heterogeneous fleet | Weighted round robin
Long-lived connections (WS, upload) | Least connections
Global traffic routing | GeoDNS + Anycast
Service-to-service (k8s) | Service mesh (Envoy sidecar)
Rate limiting distributed API | Sliding window counter + Redis