If you're implementing SRE practices and need to measure reliability rather than just uptime, this skill walks you through the full SLI/SLO/error budget stack, with Prometheus queries and alerting rules ready to use. It includes concrete examples such as multi-window burn rate alerts (14.4x for fast burns, 6x for slow burns) and error budget policies that map the percentage of budget remaining to development velocity decisions. Availability calculations, latency percentiles, and Grafana dashboard structures are also covered. It follows the Google SRE approach closely but stays practical. Good for teams moving beyond uptime checks into reliability engineering, though you'll need Prometheus already running.
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill slo-implementation
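
The 14.4x and 6x burn-rate multipliers mentioned above can be reproduced with a little arithmetic. The sketch below is illustrative only: it assumes a 30-day SLO window and the thresholds commonly paired with these multipliers (2% of the budget spent in 1 hour, 5% spent in 6 hours). The skill itself may use different windows. The function names and the 99.9% target are hypothetical.

```python
# Minimal sketch of burn-rate arithmetic. Assumptions (not from the skill):
# a 30-day SLO window, a 99.9% availability target, and the 2%-in-1h /
# 5%-in-6h budget thresholds often paired with 14.4x and 6x.

SLO_PERIOD_HOURS = 30 * 24  # assumed 30-day SLO window


def burn_rate(budget_fraction: float, window_hours: float,
              period_hours: float = SLO_PERIOD_HOURS) -> float:
    """Burn rate that spends `budget_fraction` of the budget in `window_hours`."""
    return budget_fraction * period_hours / window_hours


def error_ratio_threshold(slo_target: float, rate: float) -> float:
    """Observed error ratio at which an alert with this burn rate would fire."""
    return (1.0 - slo_target) * rate


def hours_to_exhaustion(rate: float,
                        period_hours: float = SLO_PERIOD_HOURS) -> float:
    """Hours until the whole budget is gone if this burn rate is sustained."""
    return period_hours / rate


if __name__ == "__main__":
    slo_target = 0.999  # hypothetical availability SLO
    for label, fraction, window in (("fast", 0.02, 1), ("slow", 0.05, 6)):
        rate = burn_rate(fraction, window)
        print(
            f"{label}: burn rate {rate:.1f}x, "
            f"alert at error ratio {error_ratio_threshold(slo_target, rate):.4f}, "
            f"budget exhausted in {hours_to_exhaustion(rate):.0f}h"
        )
```

A burn rate of 1x spends the budget exactly over the SLO window, so higher multipliers translate directly into how quickly the budget would run out if the error rate held steady.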