Observability Stack: LGTM + Prometheus

Tool	Signal Type	Data Ingestion	Query Language	Storage Backend	Key Role
Prometheus	Metrics	Pull (scrape)	PromQL	Local TSDB (short-term)	Scraper, local metrics store, alerting engine
Mimir	Metrics	Push (remote_write)	PromQL	Object storage (S3/GCS)	Long-term scalable metrics storage, multi-tenant
Loki	Logs	Push (agents)	LogQL	Object storage (S3/GCS)	Log aggregation, label-based indexing
Tempo	Traces	Push (OTLP/Jaeger)	TraceQL	Object storage (S3/GCS)	Distributed trace storage and search
Grafana	All signals	Query (read-only)	Per data source	None — queries backends	Visualization, dashboards, alerting UI, explore

Step 0 — Add Helm Repositories

helm repo add grafana             https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

All five components live in two repos. Run helm search repo grafana to see available charts and versions.

Deployment Modes (Loki, Mimir, Tempo)

DEV / SMALL

Monolithic

All components run as a single process. This is the simplest setup and is good up to ~20 GB of logs per day: one workload (a single StatefulSet in the grafana/loki chart), one PVC, minimal resource overhead.

helm install loki grafana/loki \
  --set deploymentMode=SingleBinary \
  --set loki.commonConfig.replication_factor=1

MEDIUM

Simple Scalable (SSD)

Splits into read, write, and backend targets. Each scales independently. Up to ~1 TB logs/day. Being deprecated before Loki 4.0 — skip for new setups.

helm install loki grafana/loki \
  --set deploymentMode=SimpleScalable

PRODUCTION

Microservices / Distributed

Each component (distributor, ingester, querier…) is a separate Deployment or StatefulSet. Fine-grained scaling. Recommended by Grafana Labs for production.

helm install loki grafana/loki \
  --set deploymentMode=Distributed

Same pattern applies to Mimir and Tempo — use grafana/mimir-distributed and grafana/tempo-distributed for production deployments. They share the same microservices philosophy.
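The size guidance above can be condensed into a small helper. This is a minimal sketch, not part of any chart: the function name pick_loki_mode and the allow_deprecated_ssd switch are hypothetical, and the 20 GB and 1 TB thresholds are only the approximate ceilings quoted in this section.

```python
def pick_loki_mode(gb_per_day: float, allow_deprecated_ssd: bool = False) -> str:
    """Map an expected daily log volume to a value for the chart's deploymentMode."""
    if gb_per_day <= 20:  # "~20 GB logs/day" ceiling quoted for monolithic mode
        return "SingleBinary"
    if gb_per_day <= 1000 and allow_deprecated_ssd:  # "~1 TB logs/day" for SSD
        return "SimpleScalable"
    # Default to microservices mode, the recommendation for production.
    return "Distributed"


if __name__ == "__main__":
    for volume in (5, 200, 5000):
        print(f"{volume} GB/day -> {pick_loki_mode(volume)}")
```

Simple Scalable is only returned when explicitly allowed, which mirrors the advice to skip it for new setups.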

Kubernetes Resources per Component

Component	K8s Kind	Typical Replicas	CPU Request	Memory Request	Notes
Prometheus	StatefulSet	1–2	500m	2 Gi	Created by the Prometheus Operator from a Prometheus custom resource
Node Exporter	DaemonSet	1 per node	100m	128 Mi	Included in kube-prometheus-stack
Loki Distributor	Deployment	2+	250m–500m	512 Mi–1 Gi	Receives & validates log writes
Loki Ingester	StatefulSet	3+	500m–1000m	2–4 Gi	In-memory buffer; needs PVC for WAL
Loki Querier	Deployment	2+	500m	1 Gi	Reads from object storage
Loki Compactor	StatefulSet	1	200m	512 Mi	Deduplicates & manages retention
Mimir Distributor	Deployment	2–12	2	4 Gi	No CPU limit by design (avoid throttle)
Mimir Ingester	StatefulSet	3–6+	2	4 Gi (limit 12 Gi)	Needs SSD PVC; core scaling unit
Mimir Store-Gateway	StatefulSet	3+	2	4 Gi	Serves historical data from object store
Tempo Distributor	Deployment	2+	200m	256 Mi	Receives spans from collectors
Tempo Ingester	StatefulSet	3+	500m	512 Mi	WAL on disk; SSD preferred
Grafana	Deployment	1–2	250m	512 Mi	Stateless UI; dashboards in ConfigMaps
Grafana Alloy	DaemonSet	1 per node	200m	256 Mi	Replaces Promtail + OTel Agent
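To turn the table into a capacity estimate, multiply each request by its replica count and sum. The sketch below does this for the four Loki rows only, using the low end of every range; the Request type and the LOKI_MINIMUM name are illustrative, and the figures are copied from the table rather than measured.

```python
from typing import Dict, NamedTuple, Tuple


class Request(NamedTuple):
    replicas: int
    cpu_millicores: int
    memory_mib: int


# Low end of each range from the table above (Loki rows only).
LOKI_MINIMUM: Dict[str, Request] = {
    "distributor": Request(replicas=2, cpu_millicores=250, memory_mib=512),
    "ingester": Request(replicas=3, cpu_millicores=500, memory_mib=2048),
    "querier": Request(replicas=2, cpu_millicores=500, memory_mib=1024),
    "compactor": Request(replicas=1, cpu_millicores=200, memory_mib=512),
}


def total_requests(components: Dict[str, Request]) -> Tuple[int, int]:
    """Return (total CPU in millicores, total memory in MiB) across all replicas."""
    cpu = sum(r.replicas * r.cpu_millicores for r in components.values())
    memory = sum(r.replicas * r.memory_mib for r in components.values())
    return cpu, memory


if __name__ == "__main__":
    cpu, memory = total_requests(LOKI_MINIMUM)
    print(f"Loki minimum: {cpu / 1000:.1f} cores, {memory / 1024:.1f} GiB")
```

With these inputs the minimum distributed Loki footprint is 3.2 cores and 9.5 GiB of requests, before the gateway, query frontend, or any other workload the chart adds.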

Object Storage Setup

🧪 Local Dev — MinIO

Deploy MinIO as a StatefulSet in the cluster
S3 API on port 9000, Console on 9001
Create buckets: loki, mimir, tempo
Store credentials in a Kubernetes Secret
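Use helm install minio minio/minio (the MinIO chart lives in its own repo, so first run helm repo add minio https://charts.min.io/) or the MinIO subchart bundled with the Grafana charts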

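# Dev only: the inline credentials are placeholders; use a Secret on shared clusters.
# The bracketed keys are quoted so the shell does not treat them as glob patterns.
helm install minio minio/minio \
  --set rootUser=admin \
  --set rootPassword=password123 \
  --set 'buckets[0].name=loki' \
  --set 'buckets[1].name=mimir' \
  --set 'buckets[2].name=tempo'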

☁️ Production — Cloud Object Storage

AWS S3: Use IAM role via IRSA (pod identity) — no static keys
GCS: Workload Identity + service account annotation
Azure Blob: Managed Identity or connection string in Secret
One bucket per component is the recommended pattern
Enable versioning and lifecycle policies for retention management

# Example Loki values.yaml for S3 (key layout assumes the grafana/loki chart)
loki:
  storage:
    type: s3
    bucketNames:
      chunks: my-loki-bucket
      ruler: my-loki-bucket
    s3:
      region: us-east-1
      s3ForcePathStyle: false

Ports Reference

🔥 Prometheus

HTTP / PromQL API	9090
Node Exporter	9100
AlertManager	9093

📋 Loki

HTTP API / Push	3100
gRPC	9095
Memberlist (gossip)	7946

☁️ Mimir

HTTP API	8080
gRPC	9095
Memberlist (gossip)	7946

🌐 Tempo

HTTP API	3200
OTLP gRPC	4317
OTLP HTTP	4318
Jaeger gRPC	14250
Zipkin HTTP	9411
Memberlist (gossip)	7946

📈 Grafana

HTTP UI	3000

🗄️ MinIO

S3 API	9000
Console UI	9001

Grafana Alloy — The One Agent to Rule Them All

Deploy Grafana Alloy as a DaemonSet so one agent pod runs on every node. It replaces Promtail, OTel Agent, and Prometheus node-level scraping in one binary.

What Alloy collects on each node

alloy (DaemonSet on every node)
  ├─ /var/log/pods/**  → Loki    :3100
  ├─ cAdvisor metrics  → Mimir   :8080
  ├─ node /metrics     → Mimir   :8080
  └─ OTLP spans recv   → Tempo   :4317

Install via Helm

helm install alloy grafana/alloy \
  -f alloy-values.yaml

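# alloy-values.yaml
controller:
  type: daemonset
alloy:
  configMap:
    content: |
      // loki.write needs the full push path, not just the host.
      // The host assumes a Service named "loki" listening on port 3100.
      loki.write "default" {
        endpoint { url = "http://loki:3100/loki/api/v1/push" }
      }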

HostPath mounts (/var/log, /var/lib/docker) give Alloy direct access to container logs without any log driver changes. Alloy also absorbs back-pressure from Loki: if the ingester is slow, Alloy queues batches and retries with backoff rather than dropping them on the first failure.
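The retry behaviour described above follows the usual exponential-backoff pattern. The sketch below illustrates that pattern in general terms and is not Alloy's implementation: send_with_retry, its parameters, and the choice of ConnectionError as the retryable failure are all assumptions made for the example.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[bytes], None],
    batch: bytes,
    max_retries: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to deliver a batch, doubling the wait after every failed attempt.

    Returns True once the batch is accepted and False when the retries
    are exhausted, at which point the caller decides whether to drop it.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            send(batch)
            return True
        except ConnectionError:
            if attempt == max_retries:
                return False
            sleep(delay)  # back off before the next attempt
            delay *= 2
    return False
```

Passing sleep as a parameter keeps the function testable without real waiting; a caller would normally leave it at the default.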

Recommended Install Order

Namespace + Object Storage

Create monitoring namespace. Deploy MinIO (dev) or configure cloud bucket credentials as Secrets. All components need storage first.

kube-prometheus-stack (Prometheus + Grafana + AlertManager)

Installs Prometheus Operator, Prometheus, AlertManager, and Grafana in one chart. Includes Node Exporter DaemonSet and kube-state-metrics. This is your metrics foundation.

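# The URL assumes a mimir-distributed release named "mimir", whose gateway
# Service (mimir-nginx, port 80) forwards pushes to the distributors.
helm install kube-prom prometheus-community/kube-prometheus-stack \
  -n monitoring \
  --set 'prometheus.prometheusSpec.remoteWrite[0].url=http://mimir-nginx/api/v1/push'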

Mimir

Deploy distributed Mimir. Point Prometheus remote_write at Mimir's distributor service. Mimir becomes the long-term metrics backend.

helm install mimir grafana/mimir-distributed \
  -n monitoring -f mimir-values.yaml

Loki

Deploy Loki in distributed mode. Loki ingesters form their own hash ring via memberlist on port 7946.

helm install loki grafana/loki \
  -n monitoring -f loki-values.yaml

Tempo

Deploy Tempo distributed. Tempo listens on OTLP gRPC :4317 and HTTP :4318. Stores traces to object storage.

helm install tempo grafana/tempo-distributed \
  -n monitoring -f tempo-values.yaml

Grafana Alloy (DaemonSet)

Deploy Alloy last so it can reach Loki, Mimir, and Tempo endpoints. Configure it to tail pod logs, scrape node metrics, and forward OTLP spans.

helm install alloy grafana/alloy \
  -n monitoring --set controller.type=daemonset

Configure Grafana Data Sources

Add Prometheus/Mimir, Loki, and Tempo as data sources in Grafana. Use provisioning ConfigMaps to automate this in GitOps workflows.

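# datasources.yaml (provisioned via ConfigMap)
# Service names and ports assume releases named loki, mimir, and tempo with chart
# defaults; the Loki and Mimir gateway Services listen on port 80.
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki-gateway
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir-nginx/prometheus
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo-query-frontend:3200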

What is the Hash Ring?

Loki, Mimir, and Tempo are all distributed systems. When you have 6 ingester pods, how does a distributor know which ingester should receive a given log stream or metric time series? The answer is the consistent hash ring.

The ring is a circular 32-bit integer space (0 → 4,294,967,295). Each ingester instance claims a set of tokens (random points) on this ring. When data arrives, its labels/tenant/ID are hashed to a position on the ring, and the data is routed to the ingester that owns the nearest token clockwise.

Each ingester owns multiple tokens. Data hashes to a position, and the nearest clockwise token wins. With RF=3, the next 2 distinct ingesters clockwise also receive a copy.
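The lookup can be sketched in a few lines. This is an illustration of the idea rather than the code these projects run: the Ring class, the SHA-256 based hash, the 128 tokens per ingester, and the deterministic token derivation (real instances pick random tokens) are all assumptions made for the example.

```python
import bisect
import hashlib
from typing import Iterable, List


def hash_key(key: str) -> int:
    """Map a string to a position in the 32-bit ring space (0 to 2**32 - 1)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    def __init__(self, ingesters: Iterable[str], tokens_per_ingester: int = 128):
        names = list(ingesters)
        # Every ingester claims several tokens; sorting them lays out the ring.
        self._tokens = sorted(
            (hash_key(f"{name}-{i}"), name)
            for name in names
            for i in range(tokens_per_ingester)
        )
        self._positions = [position for position, _ in self._tokens]
        self._ingester_count = len(set(names))

    def lookup(self, key: str, replication_factor: int = 3) -> List[str]:
        """Walk clockwise from the key's position and collect distinct owners."""
        wanted = min(replication_factor, self._ingester_count)
        start = bisect.bisect_left(self._positions, hash_key(key))
        owners: List[str] = []
        for offset in range(len(self._tokens)):
            # The modulo wraps past the highest token back to the lowest one.
            _, name = self._tokens[(start + offset) % len(self._tokens)]
            if name not in owners:
                owners.append(name)
                if len(owners) == wanted:
                    break
        return owners


if __name__ == "__main__":
    ring = Ring(f"ingester-{i}" for i in range(6))
    print(ring.lookup('{app="api", env="prod"}'))
```

Skipping tokens that belong to an ingester already chosen is what guarantees that the RF copies land on different pods.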

Write Path Through the Ring

Distributor receives write

A log line arrives at the Loki distributor (or a metric sample arrives at the Mimir distributor). The distributor validates limits and computes the hash of the stream's label set (or metric labels).

Ring lookup → N ingesters

The hash maps to a position on the ring. The distributor finds the next N ingesters clockwise (N = replication factor, default 3). These are the target ingesters for this write.

Parallel write to all N ingesters

The distributor sends the write to all 3 ingesters in parallel (Dynamo-style). It does NOT wait for all 3, only for a quorum.

Quorum achieved → success

With RF=3, quorum = floor(3/2)+1 = 2. As soon as 2 ingesters confirm the write, the distributor returns success. The 3rd ingester writes in the background. One ingester can be down without affecting writes.
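The four steps above can be modelled with a thread pool. This is a hedged sketch of the quorum pattern only: replicate and the write callable are hypothetical names, and a real distributor uses gRPC streams and per-tenant limits that are left out here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Sequence


def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1


def replicate(
    write: Callable[[str, bytes], None],
    ingesters: Sequence[str],
    payload: bytes,
) -> bool:
    """Send payload to every target in parallel; succeed once a quorum acks."""
    needed = quorum(len(ingesters))
    acks = 0
    pool = ThreadPoolExecutor(max_workers=len(ingesters))
    try:
        futures = [pool.submit(write, name, payload) for name in ingesters]
        for future in as_completed(futures):
            if future.exception() is None:
                acks += 1
            if acks >= needed:
                return True  # remaining writes keep running in the background
        return False
    finally:
        # Do not block on stragglers; they finish (or fail) on their own.
        pool.shutdown(wait=False)
```

With three targets the call returns True as soon as two writes succeed, so a single unavailable ingester does not fail the request, while two failures do.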

Rings in Each Component

📋 Loki Rings

Ingester Ring — routes log stream writes from distributors to the correct ingesters based on stream label hash.

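Distributor Ring: lets each distributor count its healthy peers so that per-tenant global rate limits can be divided among them.

Compactor Ring: used to pick the single compactor instance that runs compaction and retention at any given time.

UI: /distributor/ring, /ring (ingester ring), /compactor/ring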

☁️ Mimir Rings

Ingester Ring — routes metric time series writes. Hash of metric labels determines target ingesters.

Store-Gateway Ring — shards which historical blocks each store-gateway instance serves from object storage.

Compactor Ring — coordinates block compaction to avoid race conditions.

Ruler Ring — distributes alert rule evaluation across ruler replicas.

-ingester.ring.* | -store-gateway.sharding-ring.*

🌐 Tempo Rings

Ingester Ring: routes trace spans by trace ID hash, so all spans of one trace land on the same set of ingesters.

Distributor Ring: lets distributors count their peers, which is what global rate limits are divided by.

Compactor Ring — shard block compaction work.

Metrics-Generator Ring — optional; shards span-to-metrics derivation.

UI: /ingester/ring, /compactor/ring

Memberlist — The Gossip Protocol Behind the Ring

The ring state (who owns which tokens) must be shared across all instances. Loki, Mimir, and Tempo use memberlist, a gossip-protocol library, to propagate it without any central coordinator. Each instance gossips with random peers, so the number of informed instances grows roughly exponentially with each round.

Gossip messages

JOIN    → "I exist, here are my tokens"
LEAVE   → "I'm shutting down gracefully"
PING    → heartbeat every few seconds
UPDATE  → "my token set changed"

# Propagation:
Differential gossip  → sends only recent diffs
                        to random subset of peers
Pull-push sync       → full state exchange
                        with one random peer
                        ensures convergence
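A toy simulation shows why gossip converges quickly. It is not the memberlist algorithm: the function name, the fanout of 3, and the fixed round structure are assumptions chosen to illustrate the growth rate only.

```python
import random


def rounds_to_converge(n_nodes: int, fanout: int = 3, seed: int = 0) -> int:
    """Count gossip rounds until every node has heard an update from node 0."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n_nodes:
        newly_informed = set()
        for _ in informed:
            # Each informed node tells a few randomly chosen peers per round.
            peers = rng.sample(range(n_nodes), k=min(fanout, n_nodes))
            newly_informed.update(peers)
        informed |= newly_informed
        rounds += 1
    return rounds


if __name__ == "__main__":
    for size in (10, 100, 1000):
        print(size, "nodes ->", rounds_to_converge(size), "rounds")
```

Because the informed set can multiply every round, the round count grows far more slowly than the cluster size.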

Kubernetes config

# All components use port 7946
# They discover peers via:
1. DNS SRV lookups
   (Headless Service → pod IPs)
2. Pod label selector
   memberlist.join = pod:// ...
3. Static IPs (simple setups)

# In values.yaml (memberlist is a top-level key in the grafana/loki chart):
memberlist:
  service:
    publishNotReadyAddresses: true

Why port 7946? It is the default port of the memberlist library, and all three components (Loki, Mimir, Tempo) keep that default (7946/TCP). In Kubernetes, you need a headless Service exposing this port so pods can discover each other for ring formation.

What Happens When a Pod Joins or Leaves?

+ New Ingester Joins (scale-up)

Pod starts, generates random token values
Registers tokens in the ring via memberlist JOIN
State: JOINING → ACTIVE
Distributors update their ring copy and begin routing some writes to the new pod
Only ~1/N fraction of data rebalances (consistent hashing advantage)
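Data already held by the existing ingesters stays there until it is flushed; queriers read from every replica in the meantime, so nothing has to be copied to the new pod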
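The ~1/N claim in the list above can be checked empirically. The sketch below builds a ring before and after a scale-up and counts how many keys change their primary owner; the helper names, the SHA-256 hash, and the 128 deterministic tokens per ingester are assumptions for the example.

```python
import bisect
import hashlib
from typing import List, Sequence, Tuple

RingTokens = List[Tuple[int, str]]


def token(text: str) -> int:
    return int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")


def build_ring(ingesters: Sequence[str], tokens_per_ingester: int = 128) -> RingTokens:
    return sorted(
        (token(f"{name}-{i}"), name)
        for name in ingesters
        for i in range(tokens_per_ingester)
    )


def owner(ring: RingTokens, key: str) -> str:
    # The first token at or after the key's position owns it, wrapping at the end.
    index = bisect.bisect_left(ring, (token(key), ""))
    return ring[index % len(ring)][1]


def moved_fraction(before: Sequence[str], after: Sequence[str], n_keys: int = 10_000) -> float:
    """Fraction of keys whose primary owner differs between the two rings."""
    ring_before, ring_after = build_ring(before), build_ring(after)
    moved = sum(
        owner(ring_before, f"stream-{i}") != owner(ring_after, f"stream-{i}")
        for i in range(n_keys)
    )
    return moved / n_keys


if __name__ == "__main__":
    five = [f"ingester-{i}" for i in range(5)]
    print(moved_fraction(five, five + ["ingester-5"]))
```

Going from 5 to 6 ingesters should move roughly one sixth of the keys, all of them to the new pod, whereas a naive hash-modulo scheme would reshuffle most keys.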

− Ingester Leaves (scale-down or crash)

Graceful: pod enters LEAVING state, flushes chunks to object store, then exits
Crash: other pods stop receiving heartbeats; after heartbeat_timeout the instance is marked UNHEALTHY
Distributors reroute writes to the remaining ring members
Replication factor ensures data is not lost (quorum copies exist)
Compactor eventually reconciles any inconsistencies from object storage

Ring Instance States

JOINING

Pod is starting up. Tokens registered but not yet ready to serve reads. Other members are aware it exists.

ACTIVE

Fully operational. Receives writes and serves reads. Normal healthy state.

LEAVING

Pod is shutting down gracefully. Flushing in-memory data to object storage before removing tokens from ring.

UNHEALTHY

Pod stopped heartbeating. Marked dead after heartbeat_timeout. Traffic rerouted to healthy replicas.
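The four states can be modelled as a small enum plus a heartbeat check. This is an illustrative model, not the ring's actual data structure: the function names are hypothetical, and the 60-second timeout is a placeholder for whatever heartbeat_timeout is configured.

```python
from enum import Enum


class State(Enum):
    JOINING = "JOINING"
    ACTIVE = "ACTIVE"
    LEAVING = "LEAVING"
    UNHEALTHY = "UNHEALTHY"


def effective_state(
    reported: State,
    last_heartbeat: float,
    now: float,
    heartbeat_timeout: float = 60.0,  # placeholder value, set by configuration
) -> State:
    """An instance whose heartbeat is too old is treated as UNHEALTHY."""
    if now - last_heartbeat > heartbeat_timeout:
        return State.UNHEALTHY
    return reported


def accepts_writes(state: State) -> bool:
    # Only fully operational instances are valid write targets in this model.
    return state is State.ACTIVE
```

Note that UNHEALTHY is derived by observers from a stale heartbeat, while the other three states are reported by the instance itself.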

Monitor ring health by visiting the component's HTTP UI at /ingester/ring, /distributor/ring, or /compactor/ring (Loki serves its ingester ring at /ring). You'll see each instance, its state, last heartbeat, and the tokens it owns. In Kubernetes: kubectl port-forward svc/loki-ingester 3100 -n monitoring, then browse to localhost:3100/ring.

Replication Factor & Quorum Math

Formula

Quorum = floor(RF / 2) + 1

RF = 1 → Quorum = 1  (no fault tolerance)
RF = 2 → Quorum = 2  (both must succeed)
RF = 3 → Quorum = 2  (1 failure tolerated) ✓
RF = 5 → Quorum = 3  (2 failures tolerated)
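The table follows directly from the formula, as this short check shows. The function names are chosen for the example; failures tolerated is simply RF minus the quorum.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of acknowledgements that forms a majority."""
    return replication_factor // 2 + 1


def failures_tolerated(replication_factor: int) -> int:
    """How many replicas can be down while writes still reach a quorum."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"RF = {rf} -> quorum = {quorum(rf)}, tolerates {failures_tolerated(rf)}")
```

RF=2 tolerates no failures at all, which is why even replication factors add cost without adding fault tolerance over the next lower odd value.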

Config

# Loki values.yaml
loki:
  commonConfig:
    replication_factor: 3

# Mimir values.yaml (mimir-distributed takes config overrides under structuredConfig)
mimir:
  structuredConfig:
    ingester:
      ring:
        replication_factor: 3

RF=3 is the production default — it means you can lose 1 ingester pod with zero data loss and zero write interruption. Your StatefulSet replicas must be ≥ RF. For RF=3, run at least 3 ingester replicas spread across availability zones.

KV Store Backends for the Ring

RECOMMENDED
Memberlist (gossip)

No external dependencies — self-contained
Decentralized peer-to-peer discovery
Eventually consistent (converges quickly)
Default in all Helm charts
Works well for single-cluster deployments

-ring.store=memberlist

ALTERNATIVE
etcd

External etcd cluster required
Strong consistency (linearizable reads)
Good for multi-cluster or federated setups
More operational overhead
Can become a bottleneck at very high churn

-ring.store=etcd
-etcd.endpoints=etcd:2379

LEGACY
Consul

Supported for backwards compatibility
External Consul cluster required
Grafana Labs migrated away from Consul → memberlist in 2022
Use memberlist for new deployments

-ring.store=consul
-consul.hostname=consul:8500

TL;DR: Use memberlist. Grafana Labs themselves migrated from Consul to memberlist in production. It requires no extra infrastructure and handles pod churn in Kubernetes naturally via DNS-based peer discovery.

