Scaffolds evaluation suites for the Axiom AI SDK by generating eval files, scorers, flag schemas, and config from natural-language descriptions. It reads your AI code first, traces its inputs and outputs, then creates colocated .eval.ts files with at least two scorers per capability (one for correctness, plus quality checks). The philosophy is solid: evals are tests for AI, scorers are assertions, flags are variables. It distinguishes reference-based scorers (which compare the output against an expected answer) from reference-free ones (which judge the output on its own) and picks the pattern that fits what the capability returns: categories, free text, structured objects, or tool calls. Use this when you need to prove your AI features still work after every change.
npx skills add https://github.com/axiomhq/skills --skill writing-evals
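
To make the "evals are tests, scorers are assertions, flags are variables" idea concrete, here is a minimal sketch in plain TypeScript. It deliberately does not use the Axiom AI SDK: every name in it (Scorer, classifyTicket, runEval, defaultCategory) is hypothetical and chosen only for illustration, and the files the skill generates will likely look different.

```typescript
// Hypothetical shapes for illustration only; not the Axiom AI SDK's types.
type ScorerArgs = { output: string; expected?: string };
type Scorer = (args: ScorerArgs) => number;

// Reference-based scorer: compares the output against a known expected label.
const exactMatch: Scorer = ({ output, expected }) =>
  expected !== undefined &&
  output.trim().toLowerCase() === expected.trim().toLowerCase()
    ? 1
    : 0;

// Reference-free scorer: checks a property of the output alone.
const withinLength =
  (maxChars: number): Scorer =>
  ({ output }) =>
    output.length > 0 && output.length <= maxChars ? 1 : 0;

// A flag is a variable the eval can change between runs.
type Flags = { defaultCategory: string };

// Toy stand-in for an AI capability that returns a category.
function classifyTicket(text: string, flags: Flags): string {
  const lower = text.toLowerCase();
  if (lower.includes("refund")) return "billing";
  if (lower.includes("crash")) return "bug";
  return flags.defaultCategory;
}

type EvalCase = { input: string; expected: string };

const cases: EvalCase[] = [
  { input: "I want a refund for last month", expected: "billing" },
  { input: "The app crashes on launch", expected: "bug" },
  { input: "How do I change my avatar?", expected: "other" },
];

// Runs every case through the capability and averages each scorer's result.
function runEval(
  evalCases: EvalCase[],
  flags: Flags,
  scorers: Record<string, Scorer>
): Record<string, number> {
  const averages: Record<string, number> = {};
  for (const name of Object.keys(scorers)) {
    let total = 0;
    for (const c of evalCases) {
      const output = classifyTicket(c.input, flags);
      total += scorers[name]({ output, expected: c.expected });
    }
    averages[name] = total / evalCases.length;
  }
  return averages;
}

const scorers = { exactMatch, withinLength: withinLength(20) };

// Same cases and scorers, two flag values: the comparison is the point.
console.log(runEval(cases, { defaultCategory: "other" }, scorers));
console.log(runEval(cases, { defaultCategory: "general" }, scorers));
```

Pairing one reference-based scorer with one reference-free scorer mirrors the "at least two scorers per capability" rule: changing the flag lowers the correctness score while the length check still passes, so the two scorers fail independently.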