Sets up and runs LLM evaluations with Promptfoo, an open-source CLI for testing prompts across different models. Use it when you need to compare Claude and GPT outputs side by side, write custom Python assertions for specific quality checks, or score outputs with an LLM-as-judge against a rubric. The skill covers the whole workflow: creating promptfooconfig.yaml, managing test cases with variable injection, and writing few-shot examples in chat format. It also handles common pitfalls, such as where to place maxConcurrency and how file paths are resolved. One thing to watch: if you're running through a relay API, every llm-rubric assertion needs its own provider config that sets apiBaseUrl; otherwise the grading calls fail with 401 errors.
npx skills add https://github.com/daymade/claude-code-skills --skill promptfoo-evaluation
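As an illustration of the custom Python assertions mentioned above, here is a minimal sketch. It assumes Promptfoo's file-based Python assertion convention: a get_assert(output, context) function that returns a dict with pass, score, and reason. The specific check (a JSON object with a non-empty "summary" field) and the max_words test variable are hypothetical, chosen only to show how an assertion can read a value injected through a test case's vars.

```python
import json
from typing import Any, Dict


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical quality check: the model must return a JSON object whose
    "summary" string stays within a word limit taken from the test's vars."""
    # Reject output that is not parseable JSON.
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return {"pass": False, "score": 0.0, "reason": f"Output is not valid JSON: {exc}"}

    # Require a top-level object with a non-empty "summary" string.
    if not isinstance(data, dict):
        return {"pass": False, "score": 0.0, "reason": "Top-level JSON value is not an object"}
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        return {"pass": False, "score": 0.0, "reason": "Missing or empty 'summary' field"}

    # "max_words" is a hypothetical test variable; default to 50 if absent.
    max_words = int(context.get("vars", {}).get("max_words", 50))
    word_count = len(summary.split())
    if word_count <= max_words:
        return {"pass": True, "score": 1.0, "reason": f"Summary has {word_count} words"}

    # Partial credit proportional to how far over the limit the summary is.
    return {
        "pass": False,
        "score": max_words / word_count,
        "reason": f"Summary has {word_count} words, limit is {max_words}",
    }
```

In promptfooconfig.yaml this file would be attached to a test as a python-type assertion pointing at it (for example a file:// path); check the Promptfoo documentation for the exact form your version expects.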