This skill gives Claude direct control over Arize's experiment workflow through the ax CLI. It can create experiments against your datasets, export runs for analysis, and compare model outputs using evaluations such as correctness or relevance. The skill requires a real API call for every dataset example, with no fabricated outputs or scores, which matters when you're actually benchmarking models. It handles both REST and Arrow Flight exports, automatically escalating to Flight when an export hits the 500-run pagination limit. Good for A/B testing different prompts or models, running evals at scale, or pulling experiment data into your own analysis pipeline. Assumes you already have ax installed and an Arize profile configured.
npx skills add https://github.com/arize-ai/arize-skills --skill arize-experiment
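The REST-to-Flight escalation mentioned in the description can be pictured with a minimal sketch. It is not the skill's actual implementation and does not use any real ax command: `fetch_rest`, `fetch_flight`, and `choose_transport` are hypothetical stand-ins, and the rule "a REST page that reaches 500 runs is treated as possibly truncated" is an assumption about how the limit is detected.

```python
from typing import Callable, Dict, List

# The 500-run figure comes from the description; everything else is illustrative.
REST_PAGE_LIMIT = 500

Run = Dict[str, object]


def choose_transport(rest_rows_returned: int, limit: int = REST_PAGE_LIMIT) -> str:
    """Return "flight" once a REST export reaches the page limit, else "rest"."""
    return "flight" if rest_rows_returned >= limit else "rest"


def export_runs(
    fetch_rest: Callable[[int], List[Run]],   # hypothetical: returns at most `limit` runs
    fetch_flight: Callable[[], List[Run]],    # hypothetical: returns every run
    limit: int = REST_PAGE_LIMIT,
) -> List[Run]:
    """Try the REST export first and fall back to Flight if it may be truncated."""
    runs = fetch_rest(limit)
    if choose_transport(len(runs), limit) == "flight":
        # Assumption: reaching the cap means more runs may exist, so re-export in full.
        return fetch_flight()
    return runs


if __name__ == "__main__":
    # Synthetic placeholder runs, used only to exercise the control flow.
    all_runs = [{"run_id": i} for i in range(1200)]
    exported = export_runs(lambda n: all_runs[:n], lambda: all_runs)
    print(len(exported))
```

Passing the two fetchers in as callables keeps the sketch independent of how the exports are actually invoked, so only the decision logic is shown.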