This skill implements multi-agent debate for code evaluation. A meta-judge first creates the scoring rubrics, and then three independent judges (all Opus) analyze your solution in parallel. The judges debate their assessments over up to three rounds, defending their scores with evidence and revising them when convinced by counter-arguments. All coordination happens through the filesystem, with reports written to .specs/reports. It's serious overkill for most code reviews. But if you need a rigorous evaluation of critical components, or you're comparing complex architectural approaches, the structured conflict actually surfaces issues that single-pass reviews miss. Think of it as putting your code through peer review by three specialists who have to publicly justify their positions.
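The debate loop described above can be sketched as a toy, purely numeric model. This is not the skill's implementation: the real judges are Opus agents arguing in prose against a meta-judge's rubric. Here the `Judge` class, its `confidence` field, the "move halfway toward the peer median" revision rule, the `round{n}-{judge}.json` report naming, and the convergence `tolerance` are all hypothetical stand-ins that make the flow runnable. The sketch keeps the three ingredients the description names. Judges publish their positions as files, read each other's reports, and then either defend or revise, for at most three rounds.

```python
import json
import statistics
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Judge:
    """Hypothetical stand-in for one judge agent (the real skill uses LLM judges)."""
    name: str
    score: float       # current score on an assumed 1-10 rubric scale
    confidence: float  # assumed 0-1 proxy for how strong this judge's evidence is

    def revise(self, peers: list[dict]) -> None:
        # Assumed revision rule: a judge is "convinced" only when its peers'
        # average confidence beats its own. It then moves halfway toward the
        # peers' median score; otherwise it defends its current score.
        peer_confidence = statistics.mean(p["confidence"] for p in peers)
        if peer_confidence > self.confidence:
            target = statistics.median(p["score"] for p in peers)
            self.score = (self.score + target) / 2


def run_debate(judges: list[Judge], reports_dir: Path,
               max_rounds: int = 3, tolerance: float = 1.0):
    """Return (rounds_used, mean_score, converged)."""
    reports_dir.mkdir(parents=True, exist_ok=True)
    for rnd in range(1, max_rounds + 1):
        # 1. Every judge publishes its current position as a report file.
        for j in judges:
            report = {"judge": j.name, "score": j.score, "confidence": j.confidence}
            (reports_dir / f"round{rnd}-{j.name}.json").write_text(json.dumps(report))
        scores = [j.score for j in judges]
        # 2. Stop early once all scores agree within the tolerance.
        if max(scores) - min(scores) <= tolerance:
            return rnd, statistics.mean(scores), True
        # 3. Each judge reads this round's reports back from disk and may revise.
        #    All reports were written before any revision, so updates are simultaneous.
        reports = [json.loads(p.read_text())
                   for p in sorted(reports_dir.glob(f"round{rnd}-*.json"))]
        for j in judges:
            j.revise([r for r in reports if r["judge"] != j.name])
    scores = [j.score for j in judges]
    return max_rounds, statistics.mean(scores), max(scores) - min(scores) <= tolerance


if __name__ == "__main__":
    # A temporary directory stands in for .specs/reports so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        panel = [Judge("a", 8.0, 0.9), Judge("b", 4.0, 0.5), Judge("c", 6.0, 0.6)]
        print(run_debate(panel, Path(tmp) / "reports", tolerance=2.0))
```

With judges starting at 8, 4 and 6, the demo prints `(3, 6.875, True)`. The most confident judge holds its score, while the two less confident judges move toward their peers until the panel agrees within the tolerance in the third round. Because every round's reports stay on disk, the sketch also leaves an audit trail of how each score changed. That is the main reason to coordinate through files rather than in memory.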
npx skills add https://github.com/neolabhq/context-engineering-kit --skill judge-with-debate