This skill lets you attach LLM judges to AI Config variations for automatic quality scoring. Judges are specialized AI Configs that evaluate responses and return scores from 0.0 to 1.0, measuring qualities such as accuracy, relevance, or toxicity. You can use the three built-in judges or create custom ones for domain-specific evaluation such as security auditing or contract compliance. The sampling rate controls the percentage of responses that get evaluated, which matters when you're running evals at scale. Two gotchas: judges only work with completion-mode configs through the UI, and you have to set the fallthrough variation manually because the normal targeting toggle doesn't apply to AI Configs. Requires the Python SDK v0.18.0+ or the Node SDK v0.17.0+ for the consolidated judge result API.
npx skills add https://github.com/launchdarkly/agent-skills --skill aiconfig-online-evals
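For reference, here is a minimal Python sketch of the call pattern the judges hook into: fetch a completion-mode AI Config, run your model call, and record the outcome through the tracker so the configured sampling rate can pick it up for judge scoring. The config key, variables, and the exact return shape of `config()` are assumptions and may differ slightly between AI SDK versions; check the SDK reference for your version.

```python
# Minimal sketch (not the skill itself): fetch an AI Config and track the call
# so online evals can sample it for judge scoring.
# Assumes the LaunchDarkly Python AI SDK (ldai) v0.18.0+; names/shapes may vary.
from ldclient import Config, Context, LDClient
from ldai.client import AIConfig, LDAIClient, LDMessage, ModelConfig

ld_client = LDClient(Config("your-server-side-sdk-key"))  # placeholder key
ai_client = LDAIClient(ld_client)

context = Context.builder("user-123").kind("user").build()

# Default returned if the AI Config can't be fetched.
default = AIConfig(
    enabled=True,
    model=ModelConfig(name="gpt-4o"),
    messages=[LDMessage(role="system", content="You are a helpful assistant.")],
)

# "support-bot" is a hypothetical AI Config key; some SDK versions return an
# object carrying the tracker instead of a (config, tracker) tuple.
config, tracker = ai_client.config("support-bot", context, default, {"topic": "billing"})

# Call your model provider using config.messages / config.model, then record
# the outcome. Tracked completions are what the sampling rate applies to when
# the attached judges score responses.
tracker.track_success()
```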