Monitoring Blueprint

Complete Observability Stack

RedQueen is part of a comprehensive monitoring infrastructure built on Prometheus, OpenSearch, and Grafana, deployed through Terraform and managed via Flux CD.

Monitoring Architecture

How data flows through the observability stack.

Stack Components

Industry-standard tools, custom-configured for your needs.

Prometheus

Metrics collection and alerting

  • 30s scrape interval
  • 10-year retention
  • 100+ custom alerts
  • kube-prometheus-stack

OpenSearch

Log aggregation and search

  • App, ALB, WAF logs
  • Infrastructure logs
  • Full-text search
  • Dashboards UI

Grafana

Visualization and dashboards

  • Pre-built dashboards
  • OpenSearch datasource
  • Anonymous access
  • Alert visualization

100+ Alert Rules

Comprehensive alerting across all layers of your infrastructure.

OOM & Resources

ContainerOOMKilled
KubeContainerOOM
HighCPUUsage
HighMemoryUsage

Response Time

ResponseTime90thPercentile
AverageResponseTimeHigh
RequestLatencySpike

Error Rates

5xxErrorRateHigh
4xxErrorRateHigh
ErrorBudgetBurn

Saturation

RequestSaturation
RequestCountAnomaly
PodReplicasTooLow

Kubernetes

PodCrashLoopBackOff
PodNotReady
DeploymentReplicasMismatch
NodeNotReady

Infrastructure

DiskSpaceLow
NetworkLatencyHigh
CertificateExpiring
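
To make the error-rate rules above more concrete, here is a minimal Python sketch of the kind of condition that alerts such as 5xxErrorRateHigh and ErrorBudgetBurn might evaluate. The thresholds, the SLO target, and the function names are assumptions chosen for illustration; the actual rules in this stack are Prometheus alert rules and their expressions are not shown here.

```python
# Hypothetical thresholds; the real rule set may use different values.
ERROR_RATE_THRESHOLD = 0.05   # assumed: fire when more than 5% of requests return 5xx
BURN_RATE_THRESHOLD = 14.4    # assumed: a commonly used fast-burn multiplier


def error_rate(errors: int, total: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


def burn_rate(observed_error_rate: float, slo_target: float) -> float:
    """Observed error rate divided by the error budget (1 - SLO target).

    A value of 1.0 means the budget is being spent exactly on schedule.
    """
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return observed_error_rate / budget


def firing_alerts(errors_5xx: int, total: int, slo_target: float = 0.999) -> list[str]:
    """Return the names of the (illustrative) alerts that would fire."""
    rate = error_rate(errors_5xx, total)
    firing = []
    if rate > ERROR_RATE_THRESHOLD:
        firing.append("5xxErrorRateHigh")
    if burn_rate(rate, slo_target) > BURN_RATE_THRESHOLD:
        firing.append("ErrorBudgetBurn")
    return firing
```

With these assumed values, a 2% error rate stays under the raw error-rate threshold but still burns a 99.9% SLO budget roughly 20 times faster than planned, which is why a burn-rate alert can fire before a plain error-rate alert does.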

Log Indexes

Structured log storage in OpenSearch.

Application Logs

{env}-{cluster}-app-*

stdout/stderr from all pods

ALB Logs

{env}-{cluster}-alb-logs-*

Load balancer access logs

WAF Logs

{env}-{cluster}-waf-logs-*

Web Application Firewall events

Infrastructure Logs

{env}-{cluster}-infra-*

Kubernetes system components
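
The index patterns above share one naming scheme, which can be sketched in Python as follows. The environment name "prod", the cluster name "main", and the date-style index suffix are hypothetical values used only for illustration; the helper names are also assumptions, not part of any OpenSearch client.

```python
from fnmatch import fnmatchcase

# Log type -> middle segment of the index pattern, taken from the patterns listed above.
LOG_SEGMENTS = {
    "app": "app",
    "alb": "alb-logs",
    "waf": "waf-logs",
    "infra": "infra",
}


def index_pattern(env: str, cluster: str, log_type: str) -> str:
    """Build the wildcard pattern for one log type, e.g. 'prod-main-app-*'."""
    return f"{env}-{cluster}-{LOG_SEGMENTS[log_type]}-*"


def matches(index_name: str, env: str, cluster: str, log_type: str) -> bool:
    """Check whether a concrete index name falls under the pattern."""
    return fnmatchcase(index_name, index_pattern(env, cluster, log_type))


# Example with hypothetical names:
print(index_pattern("prod", "main", "waf"))
print(matches("prod-main-app-2024.01.01", "prod", "main", "app"))
```

Keeping the environment and cluster as the leading segments means one wildcard query can cover every log type for a cluster, while the log-type segment keeps the four streams separable.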

Alert Routing

Different alerts go to different destinations.

OOM Alerts: Slack #ops
Performance Alerts (RT-tagged): SNS → RedQueen → Slack
General Alerts: Slack #alerts
Cost Alerts: SNS → Email
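
The routing above can be read as a lookup from alert category to a chain of destinations. The Python sketch below illustrates that idea only; the label names (`rt`, `category`) and the way a category is derived from an alert are assumptions, since the page does not show how alerts are tagged, and the real routing presumably lives in Alertmanager and SNS configuration rather than in application code.

```python
# Destination chains per category, as listed in the routing section above.
ROUTES = {
    "oom": ["Slack #ops"],
    "performance": ["SNS", "RedQueen", "Slack"],
    "general": ["Slack #alerts"],
    "cost": ["SNS", "Email"],
}

# OOM alert names taken from the "OOM & Resources" group.
OOM_ALERTS = {"ContainerOOMKilled", "KubeContainerOOM"}


def classify(labels: dict[str, str]) -> str:
    """Map an alert's labels to a routing category (label names are hypothetical)."""
    if labels.get("category") == "cost":
        return "cost"
    if labels.get("alertname") in OOM_ALERTS:
        return "oom"
    if labels.get("rt") == "true":  # assumed representation of the "RT-tagged" marker
        return "performance"
    return "general"


def route(labels: dict[str, str]) -> list[str]:
    """Return the ordered list of hops an alert would pass through."""
    return ROUTES[classify(labels)]


# Example: an RT-tagged latency alert goes through RedQueen before reaching Slack.
print(" → ".join(route({"alertname": "ResponseTime90thPercentile", "rt": "true"})))
```

Anything that matches no specific category falls through to the general route, so a new alert without tags still reaches Slack #alerts.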