This is a production observability skill that treats telemetry as a QA signal, not an afterthought. It walks you through establishing the minimum bar (structured logs, traces, golden metrics), wiring OpenTelemetry instrumentation, verifying trace propagation across boundaries, and defining SLOs with burn-rate alerts instead of raw threshold alarms. Every test failure should capture a trace ID and correlated logs. The included templates cover Node and Python setup, Prometheus alert rules, Grafana dashboards, and k6 load tests. It's opinionated about redacting PII, budgeting cardinality, and gating releases on error budgets. Use this when you need observability that actually helps you diagnose failures, not just collect data.
npx skills add https://github.com/vasilyu1983/ai-agents-public --skill qa-observability