Redis HA in Kubernetes — Multi-Zone & Multi-Cluster Sync

1. Redis HA Deployment Modes

🖥️

Standalone

Single pod. No HA. Good for dev/cache-only.

🔁

Primary + Replica

One master, N replicas. Read scaling. Manual failover.

🛡️

Sentinel HA

Auto-failover via Sentinel quorum. Single shard.

🗂️

Cluster Mode

Sharding + HA. 16384 slots across ≥3 primaries.

Key principle: Redis replication is asynchronous. The master acknowledges a write to the client before the write reaches the replicas, so if the master crashes before replicating, a failover can lose recently acknowledged writes (usually the last few milliseconds, more if replicas are lagging).

2. Redis Sentinel — Auto-Failover

🛡️

Sentinel Process

Monitoring daemon

Runs alongside Redis. Monitors master health, coordinates failover via quorum vote.

🗳️

Quorum

Consensus requirement

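Minimum number of Sentinels that must agree the master is unreachable before it is marked down (ODOWN). Starting the failover also requires a majority of all Sentinels to elect a leader. Typical: quorum 2 of 3.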

🔄

Leader Election

Raft-like vote

Sentinels elect a leader among themselves. Leader picks the best replica to promote.

📡

Client Discovery

Service locator

Clients ask Sentinel for current master address — Sentinel is the source of truth.

Sentinel Topology in 3 Availability Zones

ZONE A (primary)

🔴 redis-master-0 MASTER

🟣 sentinel-0 SENTINEL

ZONE B

🟡 redis-replica-1 REPLICA

🟣 sentinel-1 SENTINEL

ZONE C

🟡 redis-replica-2 REPLICA

🟣 sentinel-2 SENTINEL

Rule: Always deploy an odd number of Sentinels (3, 5, 7) spread across at least 3 zones. With only 2 Sentinels, losing one zone leaves 1 of 2, which is not a majority, so no leader can be elected and failover is blocked.
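The arithmetic behind this rule can be sketched in Python. This is a simplified model, not Sentinel's code: it assumes the lost zone took the master with it, that every surviving Sentinel sees the master as down, and that a leader needs votes from a majority of all configured Sentinels and at least quorum votes. The helper names are hypothetical.

```python
# Simplified model of Sentinel failover arithmetic (hypothetical helpers).
# Assumes the master was lost along with the failed zone and that every
# surviving Sentinel agrees it is down.


def can_failover(total: int, quorum: int, reachable: int) -> bool:
    majority = total // 2 + 1
    odown = reachable >= quorum                           # enough agree: ODOWN
    leader_elected = reachable >= max(majority, quorum)   # votes to authorize a leader
    return odown and leader_elected


def survives_zone_loss(sentinels_per_zone: dict[str, int], quorum: int) -> dict[str, bool]:
    """For each zone: can a failover still happen if that whole zone is lost?"""
    total = sum(sentinels_per_zone.values())
    return {
        zone: can_failover(total, quorum, total - lost)
        for zone, lost in sentinels_per_zone.items()
    }


# 3 Sentinels, one per zone, quorum 2: any single zone can be lost.
print(survives_zone_loss({"a": 1, "b": 1, "c": 1}, quorum=2))
# 2 Sentinels in 2 zones: the survivor is 1 of 2, not a majority.
print(survives_zone_loss({"a": 1, "b": 1}, quorum=1))
# 3 Sentinels packed into 2 zones: losing the zone holding 2 of them blocks failover.
print(survives_zone_loss({"a": 2, "b": 1}, quorum=2))
```

The last case shows why "odd count" alone is not enough: the Sentinels also have to be spread so no single zone holds a majority of them.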

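# redis-sentinel.conf
# Needed when monitoring by hostname instead of IP (Redis 6.2+):
sentinel resolve-hostnames yes

# Assumes a headless Service "redis" in namespace "default":
sentinel monitor mymaster redis-master-0.redis.default.svc.cluster.local 6379 2
# ^^ quorum = 2 (2 of 3 Sentinels must agree the master is down)

sentinel down-after-milliseconds mymaster 5000
# ^^ a Sentinel marks the master SDOWN after 5s without a valid reply (default: 30000)

sentinel failover-timeout mymaster 60000
# ^^ 60s drives several failover timers, e.g. how long replicas get to be
#    reconfigured and the retry delay after a failed failover (2x this value)

sentinel parallel-syncs mymaster 1
# ^^ only 1 replica is repointed to the new master at a time after failover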

3. Redis Cluster Mode — Sharding + HA

🗂️

Hash Slots

Data partitioning

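16384 slots distributed across primary shards. Each key maps to a slot via CRC16(key) % 16384; if the key contains a non-empty {hash tag}, only the tag is hashed, so related keys can share a slot.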

🔗

Gossip Protocol

Cluster bus: data port + 10000 (16379 by default)

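Nodes ping a few random peers every second, exchanging health and slot info. A node unreachable for longer than cluster-node-timeout is flagged PFAIL, and becomes FAIL once a majority of primaries report it.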

⚡

Auto Failover

No Sentinel needed

Built into the cluster: when a primary is marked FAIL, one of its replicas is promoted after winning votes from a majority of the primaries.

📦

Min 6 Pods

3 primary + 3 replica

Minimum viable cluster: 3 primaries (each owning ~5461 slots) each with 1 replica.
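The key-to-slot mapping described in the Hash Slots card can be sketched in Python, following the cluster specification: CRC16 in its XMODEM variant, modulo 16384, with a non-empty {hash tag} hashed in place of the whole key. The three ranges in SLOT_RANGES are an assumption (the even split redis-cli --cluster create typically assigns for 3 primaries), and the node names are hypothetical; a live cluster should be read with CLUSTER SHARDS or CLUSTER SLOTS.

```python
# Key -> hash slot -> owning primary (sketch; node names are hypothetical).

SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant: poly 0x1021, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" means the whole key is hashed.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % SLOTS


# Assumed slot ownership for a 3-primary cluster.
SLOT_RANGES = [
    (0, 5460, "primary-0"),
    (5461, 10922, "primary-1"),
    (10923, 16383, "primary-2"),
]


def owner(key: str) -> str:
    slot = key_slot(key)
    for low, high, node in SLOT_RANGES:
        if low <= slot <= high:
            return node
    raise ValueError(f"slot {slot} is not covered")


for key in ["foo", "{user:1000}:cart", "{user:1000}:profile"]:
    print(key, key_slot(key), owner(key))
# "foo" lands in slot 12182 (primary-2); both {user:1000} keys share one slot.
```

Hash tags are what make multi-key commands (MGET, transactions, Lua) possible in Cluster mode: all keys in one command must map to the same slot.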

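Cross-zone replica placement: Each primary's replica lives in a different zone from its primary, and the primaries themselves are spread one per zone. If Zone A fails, Primary 0's replica in Zone C is promoted by the 2 surviving primaries (a majority of 3), and the cluster keeps serving from the 2 remaining zones.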
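A layout can be sanity-checked by simulating the loss of each zone. This sketch uses a deliberately simplified model (an assumption, not the cluster's full failure logic): a failed primary's shard recovers only if one of its replicas survives and a majority of the original primaries is still alive to vote. With the default cluster-require-full-coverage yes, one unrecoverable shard stops the whole cluster, so survives() returns False in that case. Zone letters and the Shard type are hypothetical.

```python
# Simplified zone-failure check for a Redis Cluster layout (hypothetical types).

from dataclasses import dataclass


@dataclass
class Shard:
    primary_zone: str
    replica_zones: list[str]


def survives(shards: list[Shard], lost_zone: str) -> bool:
    primaries_alive = sum(s.primary_zone != lost_zone for s in shards)
    majority = len(shards) // 2 + 1
    for shard in shards:
        if shard.primary_zone != lost_zone:
            continue  # this shard's primary is unaffected
        replica_left = any(z != lost_zone for z in shard.replica_zones)
        # Promotion needs a surviving replica AND votes from a majority of primaries.
        if not replica_left or primaries_alive < majority:
            return False
    return True


good = [Shard("a", ["b"]), Shard("b", ["c"]), Shard("c", ["a"])]
same_zone = [Shard("a", ["a"]), Shard("b", ["b"]), Shard("c", ["c"])]
stacked = [Shard("a", ["b"]), Shard("a", ["c"]), Shard("b", ["c"])]

for name, layout in [("good", good), ("same_zone", same_zone), ("stacked", stacked)]:
    print(name, {zone: survives(layout, zone) for zone in "abc"})
```

The "stacked" layout shows that spreading replicas is not enough: with two primaries in Zone A, losing that zone leaves 1 of 3 primaries, too few to vote any replica in. Because Kubernetes cannot see which pod is a primary, keeping primaries spread usually falls to the operator.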

4. Kubernetes Patterns for Redis

StatefulSet — why not Deployment?

Stable Pod Names

redis-0, redis-1, redis-2 — always the same. Sentinel config can reference stable DNS.

Ordered Start/Stop

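With the default OrderedReady pod management, redis-0 starts first and is conventionally configured as the initial master; redis-1 and redis-2 start afterward and sync from it. After a failover, the master may be a different pod.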

Persistent Volume per Pod

Each pod gets its own PVC via volumeClaimTemplates — data survives pod rescheduling.

Headless Service

clusterIP: None gives each pod a stable DNS: redis-0.redis.namespace.svc.cluster.local
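The naming pattern can be written as a small helper. It assumes the StatefulSet's serviceName points at the headless Service and that the cluster domain is the common default cluster.local (it is configurable per cluster). The function names are hypothetical.

```python
# Hypothetical helpers: stable DNS names a headless Service gives StatefulSet pods.
# Pattern: <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>


def pod_fqdn(statefulset: str, ordinal: int, service: str, namespace: str,
             cluster_domain: str = "cluster.local") -> str:
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"


def all_pod_fqdns(statefulset: str, replicas: int, service: str, namespace: str) -> list[str]:
    # Ordinals run 0..replicas-1 and are reused when a pod is rescheduled.
    return [pod_fqdn(statefulset, i, service, namespace) for i in range(replicas)]


for name in all_pod_fqdns("redis", 3, "redis", "namespace"):
    print(name)
```

Names like these can go directly into a replica's replicaof line or into sentinel monitor (with sentinel resolve-hostnames yes), so the config survives pod IP changes.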

Services Needed

Headless Service (StatefulSet DNS)

clusterIP: None — enables pod-to-pod DNS for replication and Sentinel discovery.

Master Service (for writes)

ClusterIP Service pointing at the current master. It must be updated after each failover, by an operator or by a Sentinel client-reconfig-script that relabels the pods.

Replica Service (for reads)

Targets all replicas via label selector. Clients can distribute reads across replicas.

# Headless service
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
spec:
  clusterIP: None   # headless!
  selector:
    app: redis
  ports:
  - port: 6379
      

Zone-Aware Scheduling with topologySpreadConstraints

# Pod-template fragment: place under spec.template.spec of the StatefulSet.
# Spread pods evenly across zones (topologySpreadConstraints is GA since K8s 1.19)
topologySpreadConstraints:
- maxSkew: 1                         # max allowed difference in matching pods between zones
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule     # keep the pod Pending rather than break the spread
  labelSelector:
    matchLabels:
      app: redis

# Soft zone anti-affinity between redis pods. The scheduler can't tell master
# from replica, so this spreads all redis pods; it does not pin roles to zones.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        topologyKey: topology.kubernetes.io/zone
        labelSelector:
          matchLabels:
            app: redis
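How DoNotSchedule with maxSkew: 1 steers placement can be sketched as a filter over zones. The sketch follows the documented skew rule (matching pods in the candidate zone, plus the incoming pod, minus the minimum across eligible zones, must not exceed maxSkew) but ignores node affinity, taints, capacity, and scoring. Picking the first allowed zone is an assumption standing in for the scheduler's scoring; zone names are hypothetical.

```python
# Sketch of topology spread filtering with whenUnsatisfiable: DoNotSchedule.


def allowed_zones(counts: dict[str, int], max_skew: int = 1) -> list[str]:
    global_min = min(counts.values())
    return [zone for zone, n in sorted(counts.items()) if n + 1 - global_min <= max_skew]


def place(replicas: int, zones: list[str], max_skew: int = 1) -> dict[str, int]:
    counts = {zone: 0 for zone in zones}
    for _ in range(replicas):
        candidates = allowed_zones(counts, max_skew)
        counts[candidates[0]] += 1  # real scheduler scores candidates; we take the first
    return counts


print(allowed_zones({"a": 2, "b": 1, "c": 1}))  # ['b', 'c']: a third pod in 'a' gives skew 2
print(place(6, ["a", "b", "c"]))                # {'a': 2, 'b': 2, 'c': 2}
print(place(7, ["a", "b", "c"]))                # {'a': 3, 'b': 2, 'c': 2}
```

With maxSkew: 1, a 6-pod Redis Cluster ends up 2/2/2 across three zones, which is the precondition for the cross-zone replica placement described earlier.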
  

5. Multi-Zone Data Sync — How It Works

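Within a single cluster: Redis replication is asynchronous whether or not the replica sits in another zone. Zones are only a scheduling concern: Redis has no notion of zones. The K8s scheduler places pods across zones, and Redis replicates over the pod network regardless; cross-zone links mainly add latency, which shows up as replication lag.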

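Replication gap risk: The master ACKs the write before replicas confirm it. If the master crashes between the ACK and replication, the client believes the data was saved, but the promoted replica does not have it. When durability matters, follow writes with WAIT numreplicas timeout for semi-synchronous behavior. This narrows the window but does not make Redis strongly consistent; acknowledged writes can still be lost in some failover scenarios.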

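# Semi-synchronous write: wait for at least 1 replica to confirm.
# WAIT covers the writes sent earlier on the same connection.
SET mykey myvalue
WAIT 1 100   # block until 1 replica ACKs, or 100 ms pass
# Returns the number of replicas that acknowledged the writes.
# A result below numreplicas (e.g. 0) means the write may be lost on failover.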
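The replication gap and what WAIT reports can be illustrated with a toy model. It is a simplification under stated assumptions, not Redis internals: write counts stand in for byte offsets, replicas catch up only when replicate() is called, and replicas_acked() reports what WAIT would return immediately instead of blocking until the timeout. Real WAIT also only covers writes sent on the calling connection. The class and method names are hypothetical.

```python
# Toy model of async replication (hypothetical class, not Redis internals).
# Offsets are counted in writes rather than bytes.


class ToyMaster:
    def __init__(self, replicas: list[str]) -> None:
        self.log: list[tuple[str, str]] = []         # acknowledged writes, in order
        self.acked = {name: 0 for name in replicas}  # last offset each replica ACKed

    def set(self, key: str, value: str) -> str:
        self.log.append((key, value))
        return "OK"  # acknowledged before any replica has the write

    def replicate(self, replica: str) -> None:
        # The replica catches up fully and ACKs the current offset.
        self.acked[replica] = len(self.log)

    def replicas_acked(self) -> int:
        # What WAIT would report right now: replicas at or past the master's offset.
        target = len(self.log)
        return sum(offset >= target for offset in self.acked.values())

    def lost_if_promoted(self, replica: str) -> list[tuple[str, str]]:
        # Acknowledged writes this replica never received.
        return self.log[self.acked[replica]:]


m = ToyMaster(["replica-1", "replica-2"])
m.set("a", "1")
m.replicate("replica-1")
m.set("b", "2")                          # the client already got OK
print(m.replicas_acked())                # 0: no replica has "b" yet
print(m.lost_if_promoted("replica-1"))   # [('b', '2')] would vanish on failover
m.replicate("replica-1")
print(m.replicas_acked())                # 1
```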
  

6. Replication Flow — FULLSYNC vs PSYNC

Replica sends PSYNC ? -1

Replica doesn't know master's replication ID yet. Asks for a full sync.

Master forks + creates RDB snapshot

Master forks a child process to dump current dataset to an RDB file. Parent continues serving clients.

Master sends FULLRESYNC + RDB

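Replies with its replication ID and current offset, then streams the RDB file to the replica. Writes arriving meanwhile are buffered on the master for that replica (and appended to the replication backlog). If that buffer exceeds client-output-buffer-limit replica, the sync is aborted and starts over.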

Replica loads RDB, then drains buffer

Replica flushes existing data, loads the RDB snapshot, then applies buffered commands that arrived during the RDB transfer.

Replica enters online replication mode

Now in sync. Master streams every write command to the replica in real-time.

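Cost of FULLSYNC: the fork uses copy-on-write, so extra memory grows with the pages modified while the snapshot is being written (up to 2x RAM under heavy writes). Add a CPU spike for RDB serialization and network bandwidth proportional to the dataset size. Avoid it by keeping replicas healthy so they can reconnect with PSYNC.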

Master maintains replication backlog

A circular buffer (default 1MB, tune with repl-backlog-size) holds recent commands. If replica lag stays within backlog size, partial sync is possible.

Every write is sent as RESP command

Master sends a stream of Redis commands (SET, HSET, ZADD…) to replicas. Replica applies them in order.

Replica tracks replication offset

Both master and replica maintain the current byte offset. If they match, replica is fully caught up.

REPLCONF ACK sent periodically

Replica sends its current offset to master every second. Master uses this for WAIT command and lag monitoring.

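# Monitor replication lag from the master:
INFO replication
# Look for lines like:
# slave0:ip=10.0.0.5,port=6379,state=online,offset=1234567,lag=0
# offset = bytes this replica has acknowledged; compare with master_repl_offset
#   (master_repl_offset - offset = bytes behind; 0 means fully caught up)
# lag = seconds since the replica's last REPLCONF ACK (0 or 1 is normal),
#   not how far behind its data is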
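Turning that output into a per-replica byte gap takes only a few lines. The sketch assumes the standard INFO field format shown above; the helper names are hypothetical and the sample text (including its master_repl_offset value) is illustrative.

```python
# Sketch: per-replica lag from a master's INFO replication text (hypothetical helpers).


def parse_info(text: str) -> dict[str, str]:
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            fields[key] = value
    return fields


def replica_lag(text: str) -> dict[str, dict]:
    info = parse_info(text)
    master_offset = int(info["master_repl_offset"])
    result = {}
    for key, value in info.items():
        if key.startswith("slave") and key[5:].isdigit():  # slave0, slave1, ...
            attrs = dict(item.split("=", 1) for item in value.split(","))
            result[f"{attrs['ip']}:{attrs['port']}"] = {
                "state": attrs["state"],
                "bytes_behind": master_offset - int(attrs["offset"]),
                "secs_since_ack": int(attrs["lag"]),
            }
    return result


sample = """# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.5,port=6379,state=online,offset=1234567,lag=0
master_repl_offset:1234600
"""
print(replica_lag(sample))
# {'10.0.0.5:6379': {'state': 'online', 'bytes_behind': 33, 'secs_since_ack': 0}}
```

Alerting on bytes_behind relative to repl-backlog-size is a useful signal: a replica whose gap approaches the backlog size will need a full resync if it disconnects.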
    

Replica reconnects, sends PSYNC <replid> <offset>

Replica remembers the master's replication ID and its last known offset.

Master checks if offset is still in backlog

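If the replication ID matches (the master's current ID, or its previous ID up to the offset where it was promoted) and the replica's offset is still inside the backlog window → CONTINUE (partial sync). If the ID is unknown or the backlog was overwritten → FULLRESYNC.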

Partial sync (best case)

Master only sends the missing commands. Fast, low bandwidth — typically what happens after a brief network blip.

Tune the backlog: repl-backlog-size 64mb — for high-write-rate clusters or replicas in distant zones with occasional network gaps, a larger backlog avoids expensive full resyncs.
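The master's decision can be sketched as follows. This is a simplified model, not Redis's code: it treats the backlog as exactly the last backlog_size bytes and compares offsets directly. The field names mirror Redis's replid2 / second_replid_offset ideas, but the class is hypothetical. The sizing helper is a rule of thumb: for example, 1 MiB/s of writes and a 60 s outage need about 60 MiB of backlog, in line with the 64mb suggestion above.

```python
# Simplified model of the master's PSYNC decision (hypothetical class).

from dataclasses import dataclass


@dataclass
class MasterState:
    replid: str                # current replication ID
    replid2: str               # previous ID, inherited when this node was promoted
    second_replid_offset: int  # last offset for which replid2 is still valid
    master_offset: int         # current replication offset
    backlog_size: int          # bytes of history kept (repl-backlog-size)

    def psync(self, replid: str, offset: int) -> str:
        known_history = replid == self.replid or (
            replid == self.replid2 and offset <= self.second_replid_offset
        )
        backlog_start = max(0, self.master_offset - self.backlog_size)
        in_backlog = backlog_start <= offset <= self.master_offset
        return "+CONTINUE" if known_history and in_backlog else "+FULLRESYNC"


def backlog_bytes_needed(write_bytes_per_sec: int, max_disconnect_secs: int) -> int:
    # Rule of thumb: keep enough history to cover the longest expected disconnect.
    return write_bytes_per_sec * max_disconnect_secs


# A replica promoted at offset 5000 (old ID "old"), now at offset 9000.
m = MasterState("new", "old", 5_000, 9_000, 1_048_576)
print(m.psync("new", 8_500))  # +CONTINUE: same history, still in backlog
print(m.psync("old", 5_000))  # +CONTINUE: a sibling replica of the old master
print(m.psync("old", 5_200))  # +FULLRESYNC: history the new master never saw
print(backlog_bytes_needed(1_048_576, 60))  # 62914560 bytes = 60 MiB
```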

7. Multi-Cluster & Geo-Distributed Sync

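Important distinction: Redis built-in replication is one-way, from a single master down its chain of replicas. A replica can point at a master in another cluster if the network allows it, but independent failover, discovery across K8s clusters or regions, and accepting writes in more than one place all require additional tooling.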

🌍

Redis Enterprise Geo-Replication

Active-Active (CRDT)

Conflict-free Replicated Data Types (CRDTs). Each region is a primary that accepts writes. Conflicts resolved automatically. Requires Redis Enterprise license.

🔀

RedisGears / KeyDB

OSS Active-Active

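KeyDB (a Redis fork) supports active replication and multi-master mode, in which instances, including ones in different clusters, accept writes and replicate to each other asynchronously. Community alternative to Redis Enterprise for active-active.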

📨

Application-Level Fan-Out

Write to all regions

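App writes to multiple Redis clusters itself. Simple, but adds latency (the slowest region if writes go out in parallel, the sum if sequential) and leaves regions inconsistent when only some writes succeed. Suitable for cache invalidation, not primary storage.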

🔧

redis-shake / rump

OSS migration tools

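Tools for one-way copying between independent Redis instances/clusters. redis-shake can stream changes continuously by acting as a replica (PSYNC), which suits DR and near-real-time cross-cluster sync; rump copies keys with SCAN plus DUMP/RESTORE, which suits one-off migrations.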
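The Application-Level Fan-Out option can be sketched with a thread pool. RegionClient and FakeRegion are hypothetical stand-ins for real regional Redis clients, and the latencies are made up; the point is that parallel writes cost roughly the slowest region, and that the caller must handle partial failure itself (retry, queue a repair, or treat the key as stale).

```python
# Sketch of application-level fan-out with hypothetical region clients.

import time
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol


class RegionClient(Protocol):
    name: str

    def set(self, key: str, value: str) -> bool: ...


class FakeRegion:
    """In-memory stand-in for a regional Redis client (hypothetical)."""

    def __init__(self, name: str, latency_s: float, healthy: bool = True) -> None:
        self.name = name
        self.latency_s = latency_s
        self.healthy = healthy
        self.data: dict[str, str] = {}

    def set(self, key: str, value: str) -> bool:
        time.sleep(self.latency_s)  # simulated network round trip
        if not self.healthy:
            raise ConnectionError(f"{self.name} unreachable")
        self.data[key] = value
        return True


def fan_out_set(regions: list[RegionClient], key: str, value: str) -> dict[str, bool]:
    """Send the write to every region in parallel; report per-region success."""

    def attempt(region: RegionClient) -> tuple[str, bool]:
        try:
            return region.name, bool(region.set(key, value))
        except ConnectionError:
            return region.name, False

    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        return dict(pool.map(attempt, regions))


regions = [FakeRegion("us-east", 0.10), FakeRegion("eu-west", 0.15),
           FakeRegion("ap-south", 0.20, healthy=False)]
start = time.perf_counter()
result = fan_out_set(regions, "session:42", "invalidate")
elapsed = time.perf_counter() - start
print(result)   # {'us-east': True, 'eu-west': True, 'ap-south': False}
print(elapsed)  # close to the slowest region's 0.2 s, well under the 0.45 s sum
```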

8. K8s Operators for Redis

⚙️

Redis Operator (Spotahome)

OSS — github.com/spotahome

Manages Sentinel-based HA. Creates StatefulSets, Services, ConfigMaps automatically. Handles failover endpoint updates.

🎯

Redis Cluster Operator (OT-Container)

OSS — ot-container-kit

Manages Redis Cluster mode. Handles slot rebalancing, node addition/removal, TLS, password rotation.

🏢

Redis Enterprise Operator

Commercial — Redis Inc.

Full lifecycle management of Redis Enterprise clusters. Supports geo-replication, Active-Active, module management, and enterprise security features.

📦

Bitnami Redis Helm Chart

Most common starting point

Helm chart that deploys Sentinel HA or standalone. Not a true operator but widely used. Add sentinel.enabled=true for HA mode.

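# Bitnami Redis HA with Sentinel via Helm
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis \
  --set architecture=replication \
  --set sentinel.enabled=true \
  --set sentinel.quorum=2 \
  --set replica.replicaCount=3 \
  --set global.redis.password=supersecret   # demo only; use a Kubernetes Secret in production

# With Sentinel enabled, replica.replicaCount is the total node count;
# each node runs Redis plus a Sentinel container. This creates:
#   redis-node-0   (initial master + sentinel)
#   redis-node-1   (replica + sentinel)
#   redis-node-2   (replica + sentinel)
#   redis          (Service exposing Redis 6379 and Sentinel 26379;
#                   clients ask Sentinel for the current master)
#   redis-headless (headless DNS)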

9. Comparison: Deployment Modes

Mode	HA?	Sharding?	Failover	Multi-Zone	Data Loss Risk	Complexity	Best For
Standalone	None	No	Manual	No	High	Low	Dev / ephemeral cache
Primary + Replica	Partial	No	Manual	With scheduling	Medium	Low	Read scaling + manual DR
Sentinel HA	Yes	No	Auto (~30–60s default)	Yes (3 zones)	Low-medium	Medium	Single-shard HA, <100GB
Cluster Mode	Yes	Yes	Auto (~15–20s default)	Yes (needs zone-aware placement)	Low-medium	High	Large datasets, >100GB, high throughput
Enterprise Geo	Yes	Yes	Auto	Multi-region	Very Low	Very High	Global apps, Active-Active

10. Sentinel Failover — Step-by-Step Walkthrough

Master pod crashes (Zone A fails)

redis-master-0 stops responding, and sentinel-0 goes down with it. The surviving Sentinels (sentinel-1 in Zone B, sentinel-2 in Zone C) stop receiving PING replies from port 6379.

Subjective Down (SDOWN) — each Sentinel marks master

After down-after-milliseconds without a valid reply (default 30s; 5s in the config above), each surviving Sentinel independently marks the master as SDOWN (Subjectively Down).

Objective Down (ODOWN) — quorum reached

Sentinels exchange SENTINEL IS-MASTER-DOWN-BY-ADDR messages. Once quorum (e.g. 2 of 3) agree → master is ODOWN (Objectively Down).

Leader Sentinel election

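Sentinels elect a leader via a Raft-like vote: each Sentinel votes for the first candidate that asks in the current epoch, and a candidate needs votes from a majority of all Sentinels (and at least quorum). The leader coordinates the failover.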

Best replica selected

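Leader skips replicas that are disconnected or have replica-priority 0, then picks by: (1) lowest replica-priority (formerly slave-priority), (2) highest replication offset (most up to date), (3) lexicographically smallest run ID as tiebreaker.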

SLAVEOF NO ONE — new master

The leader Sentinel sends SLAVEOF NO ONE (alias REPLICAOF NO ONE since Redis 5) to the chosen replica (e.g. redis-replica-1 in Zone B), which stops replicating and starts accepting writes as master.

Other replicas reconfigured

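The leader sends SLAVEOF new-master-ip 6379 to the remaining replicas, parallel-syncs at a time. They resync from the new master, usually via PSYNC, because the new master remembers the old replication ID.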

Clients notified + service updated

Sentinels publish +switch-master event. Smart clients (Jedis, StackExchange.Redis, go-redis) listening to Sentinel automatically reconnect to new master. Operators update K8s Service selector.

Old master demoted (if it comes back)

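When the original master pod recovers, Sentinel reconfigures it as a replica of the new master. It can resync via PSYNC only if it holds no writes beyond the promotion point; any writes it accepted that never reached the new master are discarded, and a full resync follows.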
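The ranking in the "Best replica selected" step can be sketched as a sort key. The eligibility filter here is simplified (an assumption: real Sentinel also drops replicas with stale INFO replies or that were disconnected from the master for too long), and the Replica type and run IDs are hypothetical.

```python
# Sketch of Sentinel's documented replica ranking (hypothetical types).

from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    name: str
    priority: int          # replica-priority; 0 = never promote
    offset: int            # replication offset; higher = more up to date
    run_id: str
    connected: bool = True


def pick_new_master(replicas: list[Replica]) -> Optional[Replica]:
    candidates = [r for r in replicas if r.connected and r.priority != 0]
    if not candidates:
        return None
    # Lower priority wins, then the larger offset, then the smaller run ID.
    return min(candidates, key=lambda r: (r.priority, -r.offset, r.run_id))


replicas = [
    Replica("redis-replica-1", priority=100, offset=9_000, run_id="b7"),
    Replica("redis-replica-2", priority=100, offset=8_800, run_id="a3"),
]
print(pick_new_master(replicas).name)  # redis-replica-1 (more up to date)
```

Setting replica-priority 0 on a replica in a distant or undersized zone is a simple way to keep it as a read/DR copy that Sentinel will never promote.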

Total failover time: Typically 30–60 seconds with default settings. Tune down-after-milliseconds lower (e.g. 2000ms) for faster detection, but beware false positives on network blips. Client reconnection adds another 1–5 seconds.

# Watch failover events in real time (Sentinel port):
redis-cli -p 26379 PSUBSCRIBE "*"   # all events, including +sdown, +odown, +switch-master

# Sentinel "hello" discovery messages are published on the Redis instances, not on Sentinel:
redis-cli -p 6379 SUBSCRIBE __sentinel__:hello

# Query current master from Sentinel:
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster

# Check replica sync status on the new master (add -h with the address returned above):
redis-cli INFO replication | grep -E "role|slave[0-9]|master_replid"