This implements a two-phase evaluation pattern: a meta-judge first generates tailored scoring rubrics, then a separate judge agent applies them to assess your work. The key trick is context isolation: the judge gets only the relevant artifacts and criteria, not your entire conversation history, which helps prevent confirmation bias. Every score must be backed by specific evidence citing file paths and line numbers. You get a structured YAML report with weighted scores across multiple dimensions, self-verification notes, and a verdict ranging from "excellent" to "insufficient." It's report-only, so nothing is changed automatically. It's useful when you want rigorous feedback on code, docs, or configs you've just built, especially after a long session when you want fresh eyes.
npx skills add https://github.com/neolabhq/context-engineering-kit --skill judge
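The evidence rule, weighted scoring, and verdict described above can be illustrated with a small Python sketch. It is not the skill's actual implementation. The `Criterion`, `Evidence`, `Score`, and `JudgeInput` types, the 1-5 scale, the intermediate verdict labels ("good", "needs improvement"), and all thresholds are hypothetical. The LLM calls for the meta-judge and the judge are left out. The sketch only shows context isolation as a data boundary, the requirement that every score cite a file and line, weighted aggregation, and verdict mapping.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical data model for illustration only; the skill's real YAML schema may differ.


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # relative weight; normalized in weighted_score()


@dataclass(frozen=True)
class Evidence:
    path: str  # file the judge is citing
    line: int  # 1-based line number
    note: str


@dataclass(frozen=True)
class Score:
    criterion: str
    value: float  # assumed 1-5 scale
    evidence: tuple[Evidence, ...]


@dataclass(frozen=True)
class JudgeInput:
    """Everything the judge is allowed to see.

    There is deliberately no field for conversation history: the judge
    receives only the artifacts under review and the meta-judge's rubric.
    """
    artifacts: dict[str, str]  # path -> file contents
    rubric: tuple[Criterion, ...]


# Hypothetical verdict bands on the assumed 1-5 scale, checked top-down.
VERDICT_BANDS = ((4.5, "excellent"), (3.5, "good"), (2.5, "needs improvement"))


def weighted_score(rubric: tuple[Criterion, ...], scores: list[Score]) -> float:
    """Weighted mean of per-criterion scores; rejects any score without evidence."""
    by_name = {s.criterion: s for s in scores}
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    acc = 0.0
    for c in rubric:
        score = by_name.get(c.name)
        if score is None:
            raise ValueError(f"no score for criterion {c.name!r}")
        if not score.evidence or any(e.line < 1 for e in score.evidence):
            raise ValueError(f"score for {c.name!r} lacks file/line evidence")
        acc += c.weight * score.value
    return acc / total_weight


def verdict(score: float) -> str:
    """Map a weighted score to the first band whose threshold it meets."""
    for threshold, label in VERDICT_BANDS:
        if score >= threshold:
            return label
    return "insufficient"


def build_report(inp: JudgeInput, scores: list[Score]) -> dict:
    """Assemble a report dict; evidence may only cite artifacts the judge was given."""
    for s in scores:
        for e in s.evidence:
            if e.path not in inp.artifacts:
                raise ValueError(f"evidence cites {e.path!r}, which the judge never saw")
    total = weighted_score(inp.rubric, scores)
    return {
        "scores": {s.criterion: s.value for s in scores},
        "weighted_score": round(total, 2),
        "verdict": verdict(total),
    }


if __name__ == "__main__":
    inp = JudgeInput(
        artifacts={"src/app.py": "...", "README.md": "..."},
        rubric=(Criterion("correctness", 5), Criterion("readability", 3), Criterion("docs", 2)),
    )
    scores = [
        Score("correctness", 4.0, (Evidence("src/app.py", 42, "handles empty input"),)),
        Score("readability", 5.0, (Evidence("src/app.py", 10, "clear naming"),)),
        Score("docs", 3.0, (Evidence("README.md", 1, "install steps only"),)),
    ]
    # weighted_score: (5*4 + 3*5 + 2*3) / 10 = 4.1, verdict: 'good'
    print(build_report(inp, scores))
```

Modeling isolation as an input type with no history field makes the boundary explicit in code; how the actual skill enforces isolation isn't specified in the description above.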