Redis HA in Kubernetes

High Availability, Data Replication, Multi-Zone & Multi-Cluster Sync

REDIS · KUBERNETES · SENTINEL · CLUSTER MODE · GEO-REPLICATION

1. Redis HA Deployment Modes

🖥️

Standalone

Single pod. No HA. Good for dev/cache-only.

🔁

Primary + Replica

One master, N replicas. Read scaling. Manual failover.

🛡️

Sentinel HA

Auto-failover via Sentinel quorum. Single shard.

🗂️

Cluster Mode

Sharding + HA. 16384 slots across ≥3 primaries.

Key principle: Redis replication is asynchronous by default. The master acknowledges a write before it reaches the replicas, so a failover can lose the last few milliseconds of writes if the master crashes before replicating them.

2. Redis Sentinel — Auto-Failover

🛡️

Sentinel Process

Monitoring daemon

Runs alongside Redis. Monitors master health, coordinates failover via quorum vote.

🗳️

Quorum

Consensus requirement

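Minimum number of Sentinels that must agree the master is down before it is marked objectively down. Typical: 2 of 3. Starting the failover additionally requires a majority of all Sentinels to authorize the leader.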

🔄

Leader Election

Raft-like vote

Sentinels elect a leader among themselves. Leader picks the best replica to promote.

📡

Client Discovery

Service locator

Clients ask Sentinel for current master address — Sentinel is the source of truth.
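A minimal sketch of the discovery loop, runnable without a live server. The `ask` callable is a hypothetical stand-in for sending SENTINEL get-master-addr-by-name to one Sentinel; a real client would normally rely on its library's Sentinel support instead.

```python
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def discover_master(
    sentinels: Iterable[Address],
    ask: Callable[[Address, str], Optional[Address]],
    master_name: str = "mymaster",
) -> Address:
    """Return the master address reported by the first Sentinel that answers.

    `ask` is a hypothetical helper: it returns (host, port), returns None if
    that Sentinel does not know the master, and raises OSError if the
    Sentinel is unreachable.
    """
    for sentinel in sentinels:
        try:
            address = ask(sentinel, master_name)
        except OSError:
            continue  # unreachable Sentinel: try the next one
        if address is not None:
            return address
    raise RuntimeError(f"no Sentinel could report a master for {master_name!r}")
```

Listing Sentinels from several zones means a single zone outage does not break discovery.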

Sentinel Topology in 3 Availability Zones

ZONE A (primary)
🔴 redis-master-0 MASTER
🟣 sentinel-0 SENTINEL
ZONE B
🟡 redis-replica-1 REPLICA
🟣 sentinel-1 SENTINEL
ZONE C
🟡 redis-replica-2 REPLICA
🟣 sentinel-2 SENTINEL
Rule: Always deploy an odd number of Sentinels (3, 5, 7) across at least 3 zones. With 2 Sentinels, a single zone failure prevents quorum and blocks failover.
# redis-sentinel.conf
sentinel resolve-hostnames yes
# ^^ needed (Redis 6.2+) because the monitor line below uses a DNS name, not an IP
sentinel monitor mymaster redis-master-0.redis.svc 6379 2
# ^^ quorum = 2 (2 of 3 sentinels must agree)
sentinel down-after-milliseconds mymaster 5000
# ^^ master considered down after 5s of no PING reply
sentinel failover-timeout mymaster 60000
# ^^ failover must complete within 60s
sentinel parallel-syncs mymaster 1
# ^^ only 1 replica resyncs at a time during failover
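The odd-number rule can be checked with a few lines of arithmetic. This sketch assumes the usual Sentinel behavior: detection needs `quorum` agreeing Sentinels, and the failover leader needs votes from a majority of all known Sentinels. The function name is illustrative.

```python
def can_fail_over(total_sentinels: int, quorum: int, reachable: int) -> bool:
    """True if the reachable Sentinels can both detect and authorize a failover.

    - Marking the master objectively down needs at least `quorum` Sentinels.
    - Electing the failover leader needs a majority of ALL known Sentinels.
    """
    majority = total_sentinels // 2 + 1
    return reachable >= quorum and reachable >= majority


# Three Sentinels, one per zone, quorum 2: losing one zone leaves 2 reachable.
print(can_fail_over(3, 2, reachable=2))  # True
# Two Sentinels: losing one leaves no majority, so failover is blocked.
print(can_fail_over(2, 1, reachable=1))  # False
```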

3. Redis Cluster Mode — Sharding + HA

🗂️

Hash Slots

Data partitioning

16384 slots distributed across primary shards. Each key hashes to a slot: CRC16(key) % 16384.

🔗

Gossip Protocol

Cluster bus on client port + 10000 (16379 by default)

Nodes exchange health/slot info via gossip every second. A node is marked failed once a majority of primaries report it unreachable.

Auto Failover

No Sentinel needed

Cluster built-in: when a primary fails, its replica is promoted automatically by cluster vote.

📦

Min 6 Pods

3 primary + 3 replica

Minimum viable cluster: 3 primaries (each owning ~5461 slots) each with 1 replica.
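A short sketch of the slot math described above, using only the standard library. It assumes Python's `binascii.crc_hqx` with an initial value of 0, which computes the CRC16-CCITT (XMODEM) variant that Redis Cluster specifies; `key_slot` and `slot_ranges` are illustrative names, not Redis APIs.

```python
import binascii
from typing import List, Tuple

SLOTS = 16384


def key_slot(key: str) -> int:
    """Map a key to its cluster hash slot: CRC16(key) mod 16384.

    If the key contains a non-empty {hash tag}, only the tag is hashed,
    so related keys can be forced into the same slot.
    """
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # ignore a missing or empty "{}"
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOTS


def slot_ranges(primaries: int) -> List[Tuple[int, int]]:
    """Split the slot space into near-equal inclusive ranges, one per primary."""
    bounds = [(i * SLOTS + primaries // 2) // primaries for i in range(primaries + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(primaries)]


print(key_slot("foo"))   # 12182
print(slot_ranges(3))    # [(0, 5460), (5461, 10922), (10923, 16383)]
```

Keys that must be used together in one multi-key command can share a hash tag, for example {user1000}.following and {user1000}.followers.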

Cluster Topology in 3 Availability Zones

ZONE A
Primary 0: slots 0–5460 (⅓ of data)
Replica of Shard 1: copy of slots 5461–10922
ZONE B
Primary 1: slots 5461–10922
Replica of Shard 2: copy of slots 10923–16383
ZONE C
Primary 2: slots 10923–16383
Replica of Shard 0: copy of slots 0–5460
Legend: primaries take writes, replicas serve reads, replication is asynchronous, and clients are sent to the owning node via MOVED/ASK redirects.
Cross-zone replica placement: Each primary's replica lives in a different zone. If Zone A fails, Primary 0's replica in Zone C is promoted — the cluster survives with 2 zones intact.
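The placement rule is easy to verify mechanically. A minimal sketch, assuming the layout is available as a zone-to-nodes mapping; the data structure and function name are hypothetical.

```python
from typing import Dict, List, Tuple


def colocated_shards(placement: Dict[str, List[Tuple[str, int]]]) -> List[int]:
    """Return shards whose primary and replica sit in the same zone.

    `placement` maps zone -> list of (role, shard) pairs, e.g. ("primary", 0).
    An empty result means every shard survives the loss of any single zone.
    """
    bad = set()
    for nodes in placement.values():
        primaries = {shard for role, shard in nodes if role == "primary"}
        replicas = {shard for role, shard in nodes if role == "replica"}
        bad |= primaries & replicas
    return sorted(bad)


layout = {
    "zone-a": [("primary", 0), ("replica", 1)],
    "zone-b": [("primary", 1), ("replica", 2)],
    "zone-c": [("primary", 2), ("replica", 0)],
}
print(colocated_shards(layout))  # []
```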

4. Kubernetes Patterns for Redis

StatefulSet — why not Deployment?

1

Stable Pod Names

redis-0, redis-1, redis-2 — always the same. Sentinel config can reference stable DNS.

2

Ordered Start/Stop

Pods start in order: redis-0 first, then redis-1 and redis-2. By convention the bootstrap config makes redis-0 the initial master, and the replicas sync from it once they start.

3

Persistent Volume per Pod

Each pod gets its own PVC via volumeClaimTemplates — data survives pod rescheduling.

4

Headless Service

clusterIP: None gives each pod a stable DNS: redis-0.redis.namespace.svc.cluster.local

Services Needed

A

Headless Service (StatefulSet DNS)

clusterIP: None — enables pod-to-pod DNS for replication and Sentinel discovery.

B

Master Service (for writes)

ClusterIP service pointing to current master. Updated dynamically by operator or Sentinel hook.

C

Replica Service (for reads)

Targets all replicas via label selector. Clients can distribute reads across replicas.

# Headless service
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
spec:
  clusterIP: None   # headless!
  selector:
    app: redis
  ports:
    - port: 6379
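The stable names that a StatefulSet plus headless Service produce follow a fixed pattern, sketched below. The namespace "cache" is a placeholder, and "cluster.local" is assumed as the cluster domain.

```python
from typing import List


def pod_dns_names(
    statefulset: str,
    service: str,
    namespace: str,
    replicas: int,
    cluster_domain: str = "cluster.local",
) -> List[str]:
    """Build the per-pod DNS names: <pod>.<headless-service>.<namespace>.svc.<domain>."""
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


for name in pod_dns_names("redis", "redis-headless", "cache", 3):
    print(name)
```

These are the names that replication and Sentinel configuration can reference, since they stay the same when a pod is rescheduled.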

Zone-Aware Scheduling with topologySpreadConstraints

# Force pods to spread across zones (K8s 1.19+)
topologySpreadConstraints:
  - maxSkew: 1                                # max allowed imbalance between zones
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule          # fail if can't spread
    labelSelector:
      matchLabels:
        app: redis

# Also use podAntiAffinity to keep master & replica in different zones
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: topology.kubernetes.io/zone
          labelSelector:
            matchLabels:
              app: redis
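To see what maxSkew: 1 permits, here is a simplified model of the scheduler's check. It assumes skew is the matching pod count in the candidate zone after placement minus the smallest count in any eligible zone; the real scheduler also accounts for node eligibility and other constraints.

```python
from typing import Dict


def fits_max_skew(pods_per_zone: Dict[str, int], target_zone: str, max_skew: int = 1) -> bool:
    """True if placing one more pod in target_zone keeps skew within max_skew."""
    after_placement = pods_per_zone[target_zone] + 1
    return after_placement - min(pods_per_zone.values()) <= max_skew


zones = {"zone-a": 1, "zone-b": 1, "zone-c": 0}
print(fits_max_skew(zones, "zone-c"))  # True: the third pod may land in the empty zone
print(fits_max_skew(zones, "zone-a"))  # False: zone-a would hold 2 while zone-c holds 0
```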

5. Multi-Zone Data Sync — How It Works

Within a single cluster: Redis replication is intra-cluster async. Zones are just a scheduling concern — Redis doesn't know about zones. The K8s scheduler places pods across zones; Redis replicates data regardless.
Write path (diagram):
① App / Client sends SET key value EX 60 to the MASTER (Zone A).
② Master applies the write, persists it to AOF/RDB on disk if persistence is enabled, and ACKs the client without waiting for replicas.
③ Master streams the command asynchronously to REPLICA (Zone B) and REPLICA (Zone C). Each replica receives the replicated commands and applies them to its own dataset.

Replication Details
Protocol: RESP command stream
Transport: TCP
Method: command log
Initial sync: FULLSYNC (RDB snapshot)
Ongoing: live command stream; PSYNC from the backlog buffer after a reconnect
Lag: typically sub-millisecond to a few milliseconds within one cluster
Replication gap risk: The master ACKs the write before replicas confirm. If the master crashes between ACK and replication, the client thinks data was saved but replicas don't have it. Use WAIT numreplicas timeout for semi-sync semantics when durability matters.
# Semi-synchronous write: wait for at least 1 replica to confirm
SET mykey myvalue
# wait for 1 replica, timeout 100ms
WAIT 1 100
# Returns number of replicas that acknowledged. 0 = potential data loss on failover.
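In application code the same pattern might look like the sketch below. It assumes `conn` is a single connection-like object exposing `set()` and `wait(num_replicas, timeout_ms)`, as redis-py clients do; WAIT only covers writes sent on the same connection, so a pooled client needs care. The helper name is hypothetical.

```python
def set_with_ack(conn, key: str, value: str, min_replicas: int = 1, timeout_ms: int = 100) -> int:
    """SET a key, then WAIT until enough replicas acknowledge it.

    Raises RuntimeError if fewer than `min_replicas` confirmed in time.
    Note: the write is already applied on the master at that point; the
    error only signals that it could be lost in a failover.
    """
    conn.set(key, value)
    acked = conn.wait(min_replicas, timeout_ms)
    if acked < min_replicas:
        raise RuntimeError(
            f"only {acked} of {min_replicas} replicas acknowledged {key!r} within {timeout_ms}ms"
        )
    return acked
```

WAIT narrows the loss window but does not make Redis strongly consistent: an acknowledged write can still be lost if the acknowledging replica is not the one promoted.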

6. Replication Flow — FULLSYNC vs PSYNC

Full Sync (first connection)

1

Replica sends PSYNC ? -1

Replica doesn't know master's replication ID yet. Asks for a full sync.

2

Master forks + creates RDB snapshot

Master forks a child process to dump current dataset to an RDB file. Parent continues serving clients.

3

Master sends FULLRESYNC + RDB

Sends replication ID and current offset, then streams the RDB file to the replica. Writes that arrive during the transfer are buffered on the master for that replica.

4

Replica loads RDB, then drains buffer

Replica flushes existing data, loads the RDB snapshot, then applies buffered commands that arrived during the RDB transfer.

5

Replica enters online replication mode

Now in sync. Master streams every write command to the replica in real-time.

Cost of FULLSYNC: the fork uses copy-on-write memory (up to double RAM in the worst case under heavy writes), RDB serialization causes a CPU spike, and large datasets consume network bandwidth. Avoid it by keeping replicas healthy so they can use PSYNC.
Ongoing Replication (steady state)

1

Master maintains replication backlog

A circular buffer (default 1MB, tune with repl-backlog-size) holds recent commands. If replica lag stays within backlog size, partial sync is possible.

2

Every write is sent as RESP command

Master sends a stream of Redis commands (SET, HSET, ZADD…) to replicas. Replica applies them in order.

3

Replica tracks replication offset

Both master and replica maintain the current byte offset. If they match, replica is fully caught up.

4

REPLCONF ACK sent periodically

Replica sends its current offset to master every second. Master uses this for WAIT command and lag monitoring.

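# Monitor replication lag from master:
INFO replication
# Look for: slave0:ip=10.0.0.5,port=6379,state=online,offset=1234567,lag=0
# lag = seconds since the master last received REPLCONF ACK from this replica (0 or 1 is healthy)
# To see how far behind a replica is, compare its offset with master_repl_offset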
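A small parser can turn that output into a byte lag per replica. This sketch assumes the INFO replication text format shown above (slaveN lines plus a master_repl_offset line); the function name is illustrative.

```python
from typing import Dict


def replica_byte_lag(info_text: str) -> Dict[str, int]:
    """Return how many bytes each replica is behind master_repl_offset."""
    master_offset = None
    replica_offsets = {}
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("master_repl_offset:"):
            master_offset = int(line.split(":", 1)[1])
        elif line.startswith("slave") and ":" in line:
            name, fields = line.split(":", 1)
            if not name[5:].isdigit():
                continue  # skip fields such as slave_priority
            pairs = dict(item.split("=", 1) for item in fields.split(","))
            replica_offsets[name] = int(pairs["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found")
    return {name: master_offset - offset for name, offset in replica_offsets.items()}
```

A byte lag that keeps growing toward repl-backlog-size is an early warning that the next disconnect will force a full resync.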
Partial Resync (PSYNC after a disconnect)

1

Replica reconnects, sends PSYNC <replid> <offset>

Replica remembers the master's replication ID and its last known offset.

2

Master checks if offset is still in backlog

If the replication ID matches and the replica's offset falls within the backlog window → +CONTINUE (partial sync). If the backlog was overwritten → +FULLRESYNC.

3

Partial sync (best case)

Master only sends the missing commands. Fast, low bandwidth — typically what happens after a brief network blip.
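The master's decision in the steps above can be sketched as a simplified model. It ignores the secondary replication ID that Redis keeps after a failover, and all parameter names are illustrative.

```python
def psync_reply(
    master_replid: str,
    backlog_first_offset: int,
    backlog_len: int,
    replica_replid: str,
    requested_offset: int,
) -> str:
    """Decide the reply to PSYNC <replid> <offset> (simplified).

    `requested_offset` is the next byte the replica needs. A partial sync is
    possible only if the replica follows the same history and that byte is
    still inside the backlog window.
    """
    if replica_replid != master_replid:
        return "FULLRESYNC"  # unknown history, including the initial "PSYNC ? -1"
    window_end = backlog_first_offset + backlog_len
    if backlog_first_offset <= requested_offset <= window_end:
        return "CONTINUE"
    return "FULLRESYNC"


print(psync_reply("abc", 1000, 500, "abc", 1200))  # CONTINUE
print(psync_reply("abc", 1000, 500, "abc", 900))   # FULLRESYNC
```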

Tune the backlog: repl-backlog-size 64mb — for high-write-rate clusters or replicas in distant zones with occasional network gaps, a larger backlog avoids expensive full resyncs.
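A rough sizing rule, offered as an assumption rather than official guidance: the backlog should hold the bytes written during the longest disconnect you want to survive, with some headroom. The safety factor of 2 is arbitrary.

```python
def backlog_size_bytes(
    write_bytes_per_sec: int,
    max_disconnect_sec: int,
    safety_factor: float = 2.0,
) -> int:
    """Estimate repl-backlog-size: write rate x outage window x headroom."""
    return int(write_bytes_per_sec * max_disconnect_sec * safety_factor)


# About 1 MB/s of replication traffic and a 30 s outage budget:
print(backlog_size_bytes(1_000_000, 30))  # 60000000, close to the 64mb example
```

The write rate can be estimated by sampling master_repl_offset twice and dividing the difference by the interval.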

7. Multi-Cluster & Geo-Distributed Sync

Important distinction: Redis built-in replication only works within a single master chain. Cross-cluster sync (different K8s clusters, different regions) requires additional tooling.
🌍

Redis Enterprise Geo-Replication

Active-Active (CRDT)

Conflict-free Replicated Data Types (CRDTs). Each region is a primary that accepts writes. Conflicts resolved automatically. Requires Redis Enterprise license.

🔀

RedisGears / KeyDB

OSS Active-Active

KeyDB (fork) supports multi-master async replication between clusters. Community alternative to Redis Enterprise for active-active.

📨

Application-Level Fan-Out

Write to all regions

App writes to multiple Redis clusters simultaneously. Simple, but each write is as slow as the slowest region, and the regions can diverge if one of the writes fails. Suitable for cache invalidation, not primary storage.

🔧

redis-shake / rump

OSS migration tools

Tools for one-way streaming replication between independent Redis instances/clusters. Used for migration, DR, and near-real-time cross-cluster sync.
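The application-level fan-out option can be sketched with the standard library. Each writer is a hypothetical callable wrapping one region's Redis client; the sketch sends the write to every region concurrently and reports which regions failed, leaving the retry or repair policy to the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def fan_out_write(
    writers: Dict[str, Callable[[str, str], None]],
    key: str,
    value: str,
) -> Dict[str, bool]:
    """Send one write to every region concurrently; return per-region success."""

    def attempt(write: Callable[[str, str], None]) -> bool:
        try:
            write(key, value)
            return True
        except Exception:
            return False  # region unreachable or write rejected

    with ThreadPoolExecutor(max_workers=max(1, len(writers))) as pool:
        futures = {region: pool.submit(attempt, write) for region, write in writers.items()}
        return {region: future.result() for region, future in futures.items()}
```

Because there is no cross-region atomicity, a False entry means the regions now disagree on that key until the write is retried or the key expires.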

Multi-cluster topology (diagram):
K8s Cluster US-EAST: Redis Primary (accepts R + W), Redis Replica ×2 (read + failover), Sentinel ×3 (monitor + failover)
K8s Cluster EU-WEST: Redis Primary (accepts R + W), Redis Replica ×2 (read + failover), Sentinel ×3 (monitor + failover)
Sync Bridge: redis-shake / Enterprise, Active-Passive or Active-Active
Cross-cluster connectivity: VPN / Service Mesh / LoadBalancer IP
Latency: 10–200ms across regions
⚠ Active-Passive: one region primary, other is DR
✓ Active-Active (CRDT): both regions accept writes

8. K8s Operators for Redis

⚙️

Redis Operator (Spotahome)

OSS — github.com/spotahome

Manages Sentinel-based HA. Creates StatefulSets, Services, ConfigMaps automatically. Handles failover endpoint updates.

🎯

Redis Cluster Operator (OT-Container)

OSS — ot-container-kit

Manages Redis Cluster mode. Handles slot rebalancing, node addition/removal, TLS, password rotation.

🏢

Redis Enterprise Operator

Commercial — Redis Inc.

Full lifecycle management of Redis Enterprise clusters. Supports geo-replication, Active-Active, module management, and enterprise security features.

📦

Bitnami Redis Helm Chart

Most common starting point

Helm chart that deploys Sentinel HA or standalone. Not a true operator but widely used. Add sentinel.enabled=true for HA mode.

# Bitnami Redis HA with Sentinel via Helm
helm install redis bitnami/redis \
  --set architecture=replication \
  --set sentinel.enabled=true \
  --set sentinel.quorum=2 \
  --set replica.replicaCount=3 \
  --set global.redis.password=supersecret

# With Sentinel enabled, replica.replicaCount is the total number of redis-node pods.
# This creates:
#   redis-node-0    (master + sentinel)
#   redis-node-1    (replica + sentinel)
#   redis-node-2    (replica + sentinel)
#   redis           (service → all nodes on 6379 and 26379; ask Sentinel for the current master)
#   redis-headless  (headless DNS)

9. Comparison: Deployment Modes

Mode | HA? | Sharding? | Failover | Multi-Zone | Data Loss Risk | Complexity | Best For
Standalone | None | No | Manual | No | High | Low | Dev / ephemeral cache
Primary + Replica | Partial | No | Manual | With scheduling | Medium | Low | Read scaling + manual DR
Sentinel HA | Yes | No | Auto (~30s) | Yes (3 zones) | Low-medium | Medium | Single-shard HA, <100GB
Cluster Mode | Yes | Yes | Auto (about cluster-node-timeout, 15s by default) | Yes (built-in) | Low-medium | High | Large datasets, >100GB, high throughput
Enterprise Geo | Yes | Yes | Auto | Multi-region | Very Low | Very High | Global apps, Active-Active

10. Sentinel Failover — Step-by-Step Walkthrough

1

Master pod crashes (Zone A fails)

redis-master-0 stops responding. All 3 Sentinels stop receiving PING replies from port 6379.

2

Subjective Down (SDOWN) — each Sentinel marks master

After down-after-milliseconds (5s in the config above; the Redis default is 30s), each Sentinel independently marks master as SDOWN (Subjectively Down).

3

Objective Down (ODOWN) — quorum reached

Sentinels exchange SENTINEL IS-MASTER-DOWN-BY-ADDR messages. Once quorum (e.g. 2 of 3) agree → master is ODOWN (Objectively Down).

4

Leader Sentinel election

Sentinels elect a leader via Raft-like vote (first Sentinel to request gets votes, majority wins). Leader coordinates the failover.

5

Best replica selected

Leader picks replica with: (1) lowest replica-priority (formerly slave-priority; 0 means never promote), (2) highest replication offset (most up to date), (3) lexicographically lowest run ID as tiebreaker.

6

SLAVEOF NO ONE — new master

Leader Sentinel sends SLAVEOF NO ONE (REPLICAOF NO ONE in Redis 5+) to the chosen replica (e.g. redis-replica-1 in Zone B), which then takes over as master.

7

Other replicas reconfigured

Sentinels send SLAVEOF new-master-ip 6379 to remaining replicas. They resync from the new master.

8

Clients notified + service updated

Sentinels publish +switch-master event. Smart clients (Jedis, StackExchange.Redis, go-redis) listening to Sentinel automatically reconnect to new master. Operators update K8s Service selector.

9

Old master demoted (if it comes back)

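When the original master pod recovers, Sentinel reconfigures it as a replica of the new master. It resynchronizes from the new master (often with a full resync), and any writes it had acknowledged but not replicated are discarded.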
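The selection rule from step 5 can be sketched as a sort key. This is a simplified model: real Sentinel also skips replicas that are down or were disconnected from the master for too long. The `Replica` type is hypothetical.

```python
from typing import List, NamedTuple, Optional


class Replica(NamedTuple):
    name: str
    priority: int      # replica-priority; 0 means "never promote"
    repl_offset: int   # higher means more data received from the old master
    run_id: str


def pick_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the promotion candidate: lowest priority, then highest offset, then lowest run ID."""
    candidates = [r for r in replicas if r.priority != 0]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("redis-replica-1", 100, 1200, "b2f1"),
    Replica("redis-replica-2", 100, 1000, "a9c4"),
]
print(pick_replica(replicas).name)  # redis-replica-1 (same priority, higher offset)
```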

Total failover time: Typically 30–60 seconds with default settings. Tune down-after-milliseconds lower (e.g. 2000ms) for faster detection, but beware false positives on network blips. Client reconnection adds another 1–5 seconds.
# Watch failover in real-time (all Sentinel events, including +switch-master):
redis-cli -p 26379 PSUBSCRIBE "*"

# Sentinel hello messages are published on the Redis instances, not on Sentinel itself:
redis-cli -p 6379 SUBSCRIBE __sentinel__:hello

# Query current master from Sentinel:
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster

# Check replica sync status on new master:
redis-cli INFO replication | grep -E "role|slave[0-9]|master_replid"