If you're building eval pipelines for LLM outputs, this skill is worth your time. It treats LLM-as-a-Judge as a family of techniques rather than a single approach, which is the right mental model. It gives you patterns for choosing an evaluation method, mitigating judge biases, and checking how well automated scores correlate with human judgment. Because it draws on both academic research and industry practice, it's not just theory. It's most useful when you're comparing model responses, debugging inconsistent evals, or setting up A/B tests for prompt changes. It has 163 installs and has passed security audits from three providers, which suggests people are actually using it.
npx skills add https://github.com/flora131/atomic --skill advanced-evaluation
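
To make "correlating automated scores with human judgment" concrete, here is a minimal sketch of one common way to do it: Spearman rank correlation between judge scores and human ratings for the same responses. This is not taken from the skill itself. The function names and the sample scores are hypothetical, and the choice of Spearman over other agreement measures is an assumption made for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(judge_scores, human_scores):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(judge_scores) != len(human_scores) or len(judge_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(judge_scores), average_ranks(human_scores)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: five responses scored 1-5 by an LLM judge and by a human.
judge = [4, 2, 5, 3, 1]
human = [5, 2, 4, 3, 1]
rho = spearman(judge, human)
print(round(rho, 3))
```

A value near 1 means the judge orders responses much as the human does, and a value near 0 means the judge's ranking tells you little about human preference. With only a handful of items the estimate is noisy, so treat it as a sanity check rather than a validation result.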