This is what you reach for when you're shipping code and realize you won't be able to tell if it's working once it's live. It walks you through instrumenting for observability: structured logging with correlation IDs, RED metrics (rate, errors, duration) for services, USE metrics (utilization, saturation, errors) for resources, and distributed tracing via OpenTelemetry. The core insight is solid: define the questions an on-call engineer will ask before you add any telemetry, then instrument to answer exactly those questions. It's opinionated about alert design too, insisting you page on user-facing symptoms, not CPU percentages. If you've ever spent an hour during an incident wishing you'd logged one more field, this prevents that.
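To make the first two ideas concrete, here is a minimal sketch of structured logging with a correlation ID plus RED counters, using only the Python standard library. It is not taken from the skill itself: every name in it (`correlation_id`, `JsonFormatter`, `RedMetrics`, `handle_request`, `make_logger`) is hypothetical, and a real service would more likely export these numbers through a metrics library or OpenTelemetry than keep them in memory.

```python
import io
import json
import logging
import time
import uuid
from contextvars import ContextVar

# Hypothetical: holds the ID that ties together all log lines for one request.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record, always carrying the correlation ID."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


class RedMetrics:
    """In-memory RED counters: request Rate, Errors, Duration samples."""

    def __init__(self):
        self.requests = 0
        self.errors = 0
        self.durations = []

    def observe(self, duration_s, failed):
        self.requests += 1
        if failed:
            self.errors += 1
        self.durations.append(duration_s)


def make_logger(stream):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("observability_sketch")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(handler, metrics, logger, request_id=None):
    """Run one unit of work with a correlation ID set and RED metrics recorded."""
    token = correlation_id.set(request_id or uuid.uuid4().hex)
    start = time.perf_counter()
    failed = False
    try:
        result = handler()
        logger.info("request succeeded")
        return result
    except Exception:
        failed = True
        logger.error("request failed")
        raise
    finally:
        # Record the outcome whether the handler returned or raised.
        metrics.observe(time.perf_counter() - start, failed)
        correlation_id.reset(token)


if __name__ == "__main__":
    out = io.StringIO()
    log = make_logger(out)
    red = RedMetrics()
    handle_request(lambda: "ok", red, log, request_id="req-1")
    print(out.getvalue(), end="")
    print(red.requests, red.errors)
```

The point of routing everything through one wrapper is that the on-call questions ("how many requests, how many failed, how slow, and which log lines belong to this one?") get answered in a single place instead of being scattered across handlers.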
npx -y skills add addyosmani/agent-skills --skill observability-and-instrumentation --agent claude-code
Installs into .claude/skills of the current project.
sickn33/antigravity-awesome-skills
kubesphere/kubesphere
supercent-io/skills-template