This skill handles the evaluation workflow for Google's Agent Development Kit (ADK), covering everything from writing evalsets to debugging why your agent's scores tanked. The real value is in the eval-fix loop guidance: it walks you through the 5-10+ iteration cycle you'll actually go through, with a useful table calling out shortcuts that waste time (like lowering thresholds instead of fixing your agent). You get concrete metric-selection advice (tool_trajectory_avg_score for CI/CD, final_response_match_v2 for semantic checks), schema examples for both evalsets and config files, and separate references for user simulation, multimodal inputs, and built-in tools. Use this when you need to systematically improve agent quality rather than just eyeballing outputs.
npx skills add https://github.com/google/agents-cli --skill google-agents-cli-eval
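To make the config-file side concrete, here is a minimal sketch of what an eval criteria file might look like. The metric names match those called out above; the threshold values, the file name, and the exact shape your ADK version expects (plain floats vs. nested criterion objects) are assumptions to check against the skill's own schema examples:

```
{
  "criteria": {
    "tool_trajectory_avg_score": 1.0,
    "final_response_match_v2": 0.8
  }
}
```

A file like this is typically passed to the eval runner, e.g. something along the lines of `adk eval path/to/agent path/to/evalset.json --config_file_path=test_config.json`; run `adk eval --help` to confirm the flags for your installed version.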