This is for teams running LLMs in production who want to improve their prompts based on real usage data. It connects to Arize's observability platform, pulls your production traces to find where prompts live (usually in the `llm.input_messages` attribute on LLM spans), gathers whatever performance signal you have (eval scores, human annotations), and then runs an optimization loop using the ax CLI. The workflow: extract the current prompt from spans, gather feedback data from evals or datasets, then iterate. If you're already logging OpenInference traces and have evaluation data flowing, this gives you a structured way to tune prompts with actual evidence instead of guesswork. Requires the ax CLI and Arize credentials. A sketch of the extraction step follows the install command below.
npx skills add https://github.com/arize-ai/arize-skills --skill arize-prompt-optimization
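The extraction step boils down to filtering exported spans for LLM calls and reading the prompt out of the OpenInference attributes. Here is a minimal Python sketch, assuming you've already exported traces into a pandas DataFrame (e.g., via Arize's export client) and that the export flattens span attributes into columns like `attributes.llm.input_messages`; the column names, the `eval.correctness.score` column, and the `message.role`/`message.content` dict keys are assumptions based on OpenInference semantic conventions, so verify them against your own export's schema:

```python
import json
import pandas as pd

# Assumed column names following OpenInference semantic conventions;
# check the schema of your actual Arize export before relying on these.
SPAN_KIND_COL = "attributes.openinference.span.kind"
MESSAGES_COL = "attributes.llm.input_messages"
EVAL_COL = "eval.correctness.score"  # hypothetical eval-score column

def extract_prompts(spans: pd.DataFrame) -> pd.DataFrame:
    """Pull system prompts and their eval scores from exported LLM spans."""
    llm_spans = spans[spans[SPAN_KIND_COL] == "LLM"].copy()

    def first_system_message(raw):
        # input_messages may arrive as a JSON string or a list of dicts;
        # the "message.role"/"message.content" keys mirror OpenInference's
        # flattened message attributes (an assumption, not a guarantee).
        messages = json.loads(raw) if isinstance(raw, str) else raw
        for m in messages or []:
            if m.get("message.role") == "system":
                return m.get("message.content")
        return None

    llm_spans["prompt"] = llm_spans[MESSAGES_COL].map(first_system_message)
    return llm_spans[["prompt", EVAL_COL]].dropna(subset=["prompt"])

# Usage (hypothetical):
# spans_df = <DataFrame exported from Arize>
# prompt_scores = extract_prompts(spans_df)
```

From there, the prompt/score pairs are the evidence the optimization loop iterates on; the exact ax CLI invocation is documented in the skill itself rather than guessed at here.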