Implements self-critique loops in which Claude generates output, evaluates it against your criteria, then refines it based on its own feedback. Includes evaluator-optimizer patterns, test-driven code refinement, and LLM-as-judge scoring with JSON-structured critiques. Most useful for quality-critical tasks like code generation, reports, or analysis where you have clear success metrics. The reflection patterns prevent single-shot mediocrity by forcing iterative improvement, though you'll want to set an iteration limit to avoid endless loops. Works best when your evaluation criteria are specific rather than subjective.
npx skills add https://github.com/github/awesome-copilot --skill agentic-eval
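
For orientation, here is a minimal sketch of the evaluator-optimizer loop the description refers to, written against the Anthropic TypeScript SDK. The prompts, the `PASS_SCORE` threshold, the iteration cap, and the model alias are illustrative assumptions, not the skill's actual internals.

```typescript
// Minimal evaluator-optimizer sketch: generate, judge against criteria with a
// JSON-structured critique, refine until the score clears a threshold or the
// iteration cap is hit. All prompts and constants below are assumptions.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const MODEL = "claude-3-5-sonnet-latest"; // assumption: any capable model works
const MAX_ITERATIONS = 3;                 // cap to avoid endless refinement loops
const PASS_SCORE = 8;                     // accept once the judge scores >= 8/10

async function complete(prompt: string): Promise<string> {
  const msg = await client.messages.create({
    model: MODEL,
    max_tokens: 1024,
    messages: [{ role: "user", content: prompt }],
  });
  const block = msg.content[0];
  return block.type === "text" ? block.text : "";
}

async function refineWithCritique(task: string, criteria: string): Promise<string> {
  let draft = await complete(task);

  for (let i = 0; i < MAX_ITERATIONS; i++) {
    // Evaluator step (LLM-as-judge): request a JSON-structured critique.
    const judgement = await complete(
      `Evaluate the following output against these criteria: ${criteria}\n\n` +
      `Output:\n${draft}\n\n` +
      `Respond with JSON only: {"score": <1-10>, "critique": "<specific issues>"}`
    );

    let score = 0;
    let critique = "";
    try {
      const parsed = JSON.parse(judgement);
      score = parsed.score;
      critique = parsed.critique;
    } catch {
      break; // judge did not return valid JSON; keep the current draft
    }

    if (score >= PASS_SCORE) break; // good enough, stop refining

    // Optimizer step: regenerate the draft using the judge's critique.
    draft = await complete(
      `Task: ${task}\n\nPrevious attempt:\n${draft}\n\n` +
      `Critique: ${critique}\n\n` +
      `Rewrite the output to address every point in the critique.`
    );
  }
  return draft;
}
```

As a usage example, `refineWithCritique("Write a changelog entry for v2.1", "concise, factual, covers breaking changes")` runs at most three generate-judge-refine rounds; the specific criteria string is what keeps the judge's scores from being subjective.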