When your Airflow pipeline breaks at 3am, you need more than just error logs. This walks you through a structured four-step investigation: identify the failure with `af runs diagnose`, pull task logs to find the actual exception buried under Airflow boilerplate, check context like recent deploys or data volume spikes, then output a diagnosis with root cause, impact assessment, and concrete remediation steps. It's designed for deep investigations when something is genuinely broken and you need to understand why, not just surface-level log checking. The systematic approach helps you avoid the usual debugging trap of jumping straight to fixes without understanding what actually went wrong or how to prevent it next time.
npx skills add https://github.com/astronomer/agents --skill debugging-dags