This skill is about building self-correcting workflows instead of hoping your prompts work on the first try. You define quality criteria with weights and thresholds, pick an evaluator type (rule-based, cross-model, or human-in-the-loop), then set up a correction loop that feeds specific feedback back into retries. The key insight is that each retry must include the evaluator's feedback rather than rerunning the same input. The skill also covers regression detection against golden test sets and production monitoring patterns. If you're building anything where output quality matters and crossing your fingers isn't enough, this gives you the scaffolding to make outputs converge on your quality bar reliably.
npx skills add https://github.com/sharpdeveye/maestro --skill iterate
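
To make the correction loop described above concrete, here is a minimal sketch in Python. It is not the skill's actual API: `Criterion`, `evaluate`, `iterate_until_pass`, and the stubbed `fake_generate` are hypothetical names chosen for illustration, and the evaluator is a simple rule-based one. The point it shows is that each retry receives the evaluator's feedback from the previous attempt rather than the bare task again.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Criterion:
    """One quality criterion (hypothetical structure, for illustration only)."""
    name: str
    weight: float                               # relative importance in the total score
    check: Callable[[str], Tuple[float, str]]   # returns (score in [0, 1], feedback text)


def evaluate(output: str, criteria: List[Criterion]) -> Tuple[float, List[str]]:
    """Rule-based evaluator: weighted average score plus feedback for each shortfall."""
    total_weight = sum(c.weight for c in criteria)
    weighted_sum = 0.0
    feedback: List[str] = []
    for c in criteria:
        score, message = c.check(output)
        weighted_sum += c.weight * score
        if score < 1.0:
            feedback.append(f"{c.name}: {message}")
    return weighted_sum / total_weight, feedback


def iterate_until_pass(generate, task: str, criteria: List[Criterion],
                       threshold: float = 0.9, max_attempts: int = 3):
    """Retry generation, passing the evaluator's feedback into every retry."""
    feedback: List[str] = []
    best_output, best_score, attempts = "", -1.0, 0
    for attempts in range(1, max_attempts + 1):
        output = generate(task, feedback)        # feedback is empty on the first attempt
        score, feedback = evaluate(output, criteria)
        if score > best_score:
            best_output, best_score = output, score
        if score >= threshold:
            break                                # quality bar met, stop retrying
    return best_output, best_score, attempts


# Two example criteria; the rules and weights are assumptions, not from the skill.
criteria = [
    Criterion("greeting", 2.0,
              lambda t: (1.0, "") if "hello" in t.lower()
              else (0.0, "start with a greeting such as 'Hello'")),
    Criterion("length", 1.0,
              lambda t: (1.0, "") if len(t) <= 40
              else (0.0, "keep the reply to 40 characters or fewer")),
]


def fake_generate(task: str, feedback: List[str]) -> str:
    """Stand-in for a model call; a real one would put the feedback into the prompt."""
    if not feedback:
        return "Thanks for reaching out, we will look into your request as soon as possible."
    text = "We will look into it."
    if any(item.startswith("greeting") for item in feedback):
        text = "Hello! " + text
    return text


result, final_score, used = iterate_until_pass(fake_generate, "Reply to a support ticket", criteria)
print(result, final_score, used)
```

In this sketch the first attempt fails both criteria, and the second attempt passes because the generator saw what the evaluator objected to. Swapping `evaluate` for a cross-model or human-in-the-loop evaluator would leave the loop itself unchanged, as long as it still returns a score and a list of feedback strings.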