This skill pulls together a full monitoring stack for production LLM apps: Prometheus metrics following the RED method (rate, errors, duration), Grafana dashboards, and Langfuse v4 tracing with semantic span types and inline scoring. It covers 12 rules across infrastructure monitoring, LLM observability, drift detection, and silent failure detection. Drift detection uses PSI (Population Stability Index) at production scale, paired with dynamic thresholds to cut alert fatigue. What stands out is the opinionated Langfuse-over-LangSmith stance and the focus on catching silent failures such as tool skipping and quality degradation. Reach for this when you need to instrument an LLM system properly, not just log requests.
npx skills add https://github.com/yonatangross/orchestkit --skill monitoring-observability
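
For readers unfamiliar with PSI, here is a minimal, standard-library sketch of how a Population Stability Index can be computed between a baseline sample and a current sample. It is not taken from the skill itself: the function name `psi`, the equal-width binning, the `eps` floor for empty bins, and the example data are all assumptions made for illustration.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` relative to `expected`.

    Hypothetical helper: bins are equal-width and derived from the
    baseline (`expected`) range; out-of-range values are clamped into
    the first or last bin.
    """
    if len(expected) == 0 or len(actual) == 0:
        raise ValueError("both samples must be non-empty")

    lo, hi = min(expected), max(expected)
    # Fall back to a width of 1.0 when the baseline is constant.
    width = (hi - lo) / bins or 1.0

    def fractions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each fraction at eps so the log term below stays finite.
        return [max(c / len(values), eps) for c in counts]

    e = fractions(expected)
    a = fractions(actual)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Illustrative data only: a uniform baseline and a sample shifted upward.
baseline = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]

same_score = psi(baseline, baseline)
drift_score = psi(baseline, shifted)
```

A commonly cited rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as moderate shift, and above 0.25 as significant shift. Those cutoffs are a convention rather than something the skill is known to use, and a dynamic threshold would replace the fixed numbers with values derived from recent history.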