A solid pattern for when you need quality gates on implementation work. The skill spins up three agents: a meta-judge that writes task-specific evaluation criteria, an implementation agent that does the actual work, and a judge that verifies the result against those criteria. The orchestrator (you) only coordinates and never touches code directly, which keeps its context clean. The meta-judge and the implementation agent run in parallel, and the skill then loops up to twice if the judge fails the work. Generating the rubric before judging is smart, since generic code review often misses task-specific requirements. The main tradeoff is token cost from running multiple agents; in return you get consistent verification and a feedback loop without polluting your main context with implementation details.
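The control flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the skill's actual implementation: the function `do_and_judge`, the `Verdict` type, and the three stub agents are hypothetical names invented for this example, and "loops up to twice" is read here as "at most two retries after the first failed judgment", which is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Tuple


@dataclass
class Verdict:
    """Hypothetical result type returned by the judge agent."""
    passed: bool
    feedback: str


# Hypothetical agent signatures; real agents would call an LLM.
MetaJudge = Callable[[str], Awaitable[str]]                   # task -> rubric
Implementer = Callable[[str, Optional[str]], Awaitable[str]]  # task, feedback -> work
Judge = Callable[[str, str], Awaitable[Verdict]]              # rubric, work -> verdict


async def do_and_judge(
    task: str,
    meta_judge: MetaJudge,
    implementer: Implementer,
    judge: Judge,
    max_retries: int = 2,  # assumption: "loops up to twice" means two retries
) -> Tuple[str, Verdict, int]:
    # The rubric and the first implementation do not depend on each other,
    # so they run concurrently.
    rubric, work = await asyncio.gather(meta_judge(task), implementer(task, None))
    verdict = await judge(rubric, work)

    retries = 0
    while not verdict.passed and retries < max_retries:
        retries += 1
        # Feed the judge's feedback back to the implementation agent.
        work = await implementer(task, verdict.feedback)
        verdict = await judge(rubric, work)
    return work, verdict, retries


# Stub agents so the sketch runs without any model calls.
async def stub_meta_judge(task: str) -> str:
    return f"Rubric for: {task}"


async def stub_implementer(task: str, feedback: Optional[str]) -> str:
    return "draft" if feedback is None else "revised"


async def stub_judge(rubric: str, work: str) -> Verdict:
    if work == "revised":
        return Verdict(True, "")
    return Verdict(False, "handle the empty-input case")


if __name__ == "__main__":
    work, verdict, retries = asyncio.run(
        do_and_judge("add a parser", stub_meta_judge, stub_implementer, stub_judge)
    )
    print(work, verdict.passed, retries)  # prints: revised True 1
```

The orchestrating function only passes strings between agents and never inspects or edits the work itself, which mirrors the "coordinate, don't touch code" rule in the description.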
npx skills add https://github.com/neolabhq/context-engineering-kit --skill do-and-judge