If you're running human review loops on LLM outputs in Arize, the arize-annotation skill handles the setup work. You define label schemas (categorical labels such as correct/incorrect, continuous scores, or freeform text) and create queues that route spans or dataset examples to reviewers. The CLI covers the full create, read, update, and delete cycle for both annotation configs and queues, and you can bulk-annotate spans through the Python SDK. It's the bridge between your traces and the people who need to label them. The docs are thorough on the schema options and the queue assignment logic, which matters when you're coordinating multiple reviewers across different label types.
npx skills add https://github.com/arize-ai/arize-skills --skill arize-annotation
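To make the three schema kinds and the idea of routing items to reviewers concrete, here is a stdlib-only Python sketch. It does not use the Arize CLI or SDK, and every class and function name in it (CategoricalConfig, ContinuousConfig, FreeformConfig, validate_annotation, assign_round_robin) is hypothetical. It also assumes a simple round-robin assignment policy, which may differ from how Arize actually assigns queue items.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import cycle

# Hypothetical local models of the three label-schema kinds.
# These are NOT the Arize SDK's classes; they only illustrate the idea.

@dataclass
class CategoricalConfig:
    name: str
    values: list[str]

    def validate(self, label) -> bool:
        # A categorical label must be one of the allowed values.
        return label in self.values


@dataclass
class ContinuousConfig:
    name: str
    minimum: float
    maximum: float

    def validate(self, label) -> bool:
        # A continuous score must be numeric (bools excluded) and within bounds.
        return (
            isinstance(label, (int, float))
            and not isinstance(label, bool)
            and self.minimum <= label <= self.maximum
        )


@dataclass
class FreeformConfig:
    name: str

    def validate(self, label) -> bool:
        # Freeform text must be a non-empty string.
        return isinstance(label, str) and label.strip() != ""


def validate_annotation(configs: dict, annotation: dict) -> bool:
    """Check that every label targets a known config and passes its validation."""
    for name, label in annotation.items():
        config = configs.get(name)
        if config is None or not config.validate(label):
            return False
    return True


def assign_round_robin(item_ids: list[str], reviewers: list[str]) -> dict[str, list[str]]:
    """Hypothetical queue routing: hand span or example IDs to reviewers in turn."""
    if not reviewers:
        raise ValueError("need at least one reviewer")
    assignments: dict[str, list[str]] = {r: [] for r in reviewers}
    for item_id, reviewer in zip(item_ids, cycle(reviewers)):
        assignments[reviewer].append(item_id)
    return assignments


if __name__ == "__main__":
    configs = {
        "correctness": CategoricalConfig("correctness", ["correct", "incorrect"]),
        "helpfulness": ContinuousConfig("helpfulness", 0.0, 1.0),
        "notes": FreeformConfig("notes"),
    }
    print(validate_annotation(configs, {"correctness": "correct", "helpfulness": 0.8}))
    print(assign_round_robin(["span-1", "span-2", "span-3"], ["alice", "bob"]))
    # {'alice': ['span-1', 'span-3'], 'bob': ['span-2']}
```

The reviewer names and span IDs in the usage block are made up. Keeping validation on the config object means a queue mixing several label types can check each submitted label against its own schema before accepting it.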