This skill lets you attach LLM judges to AI Config variations for automatic quality scoring. Judges are specialized AI Configs that evaluate responses and return scores from 0.0 to 1.0, measuring qualities such as accuracy, relevance, or toxicity. You can use the three built-in judges or create custom ones for domain-specific evaluation such as security auditing or contract compliance. The sampling rate controls the percentage of responses that get evaluated, which matters when you're running evals at scale. Two gotchas: judges only work with completion-mode configs through the UI, and you have to set the fallthrough variation manually because the normal targeting toggle doesn't apply to AI Configs. Requires the Python SDK v0.18.0+ or the Node SDK v0.17.0+ for the consolidated judge result API.
npx skills add https://github.com/launchdarkly/agent-skills --skill aiconfig-online-evals
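For reference, here is a minimal Python sketch of the call pattern the judges hook into: fetch a completion-mode AI Config, run your model call, and record the outcome through the tracker so the configured sampling rate can pick it up for judge scoring. The config key, variables, and the exact return shape of `config()` are assumptions and may differ slightly between AI SDK versions; check the SDK reference for your version.

```python
# Minimal sketch (not the skill itself): fetch an AI Config and track the call
# so online evals can sample it for judge scoring.
# Assumes the LaunchDarkly Python AI SDK (ldai) v0.18.0+; names/shapes may vary.
from ldclient import Config, Context, LDClient
from ldai.client import AIConfig, LDAIClient, LDMessage, ModelConfig

ld_client = LDClient(Config("your-server-side-sdk-key"))  # placeholder key
ai_client = LDAIClient(ld_client)

context = Context.builder("user-123").kind("user").build()

# Default returned if the AI Config can't be fetched.
default = AIConfig(
    enabled=True,
    model=ModelConfig(name="gpt-4o"),
    messages=[LDMessage(role="system", content="You are a helpful assistant.")],
)

# "support-bot" is a hypothetical AI Config key; some SDK versions return an
# object carrying the tracker instead of a (config, tracker) tuple.
config, tracker = ai_client.config("support-bot", context, default, {"topic": "billing"})

# Call your model provider using config.messages / config.model, then record
# the outcome. Tracked completions are what the sampling rate applies to when
# the attached judges score responses.
tracker.track_success()
```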