SRE
All posts tagged with sre
2026-04-11 | 17 min read
Eight microservices on EKS, CloudWatch logs with no trace IDs, and a 45-minute MTTD on every payment incident. We instrumented the full stack with OpenTelemetry Collector, Grafana Tempo, and auto-instrumentation, and found an N+1 query that had been adding 200ms to every payment for months.
2026-04-10 | 13 min read
We built an autonomous SRE agent that connects to Datadog, Kubernetes, AWS, and Cloudflare simultaneously, then gave it RAG access to every runbook, post-mortem, and line of source code the company ever wrote. MTTR dropped from 45 minutes to 8. Here's the architecture.
2026-03-28 | 5 min read
We built an AI agent that reads logs, correlates traces, and suggests fixes before the on-call engineer finishes their coffee. Here's exactly how we did it.