Phoenix is Arize's open-source observability platform for LLM applications, built on OpenTelemetry. It traces calls across OpenAI, LangChain, LlamaIndex, and Anthropic, then gives you a UI to debug what happened. The evaluation framework is solid: you can run LLM-as-judge evaluators on datasets, compare experiments, and log results back to traces. It runs locally on SQLite, or you can point it at Postgres for production. Instrumentation is automatic once you register the tracer provider, which is cleaner than manual logging. If you want observability without vendor lock-in, or need to run everything on-premise, this is the move. The playground for testing prompts across models is a nice touch.
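To make the "register the tracer provider" step and the SQLite-or-Postgres choice concrete, here is a minimal Python sketch. It assumes the `arize-phoenix-otel` package is installed for `phoenix.otel.register`, that your installed version accepts `auto_instrument=True`, and that the matching OpenInference instrumentor packages are present. The helper names `configure_phoenix_tracing` and `phoenix_server_env`, the project name, and the Postgres URL are hypothetical and exist only for illustration.

```python
import os


def phoenix_server_env(postgres_url=None):
    """Build environment overrides for the Phoenix server process.

    With no URL, nothing is overridden and Phoenix keeps its default local
    SQLite database. With a URL, PHOENIX_SQL_DATABASE_URL points it at Postgres.
    """
    if postgres_url is None:
        return {}
    return {"PHOENIX_SQL_DATABASE_URL": postgres_url}


def configure_phoenix_tracing(project_name="my-llm-app"):
    """Register a Phoenix tracer provider, or return None if the package is missing."""
    try:
        from phoenix.otel import register  # provided by arize-phoenix-otel
    except ImportError:
        return None

    # Assumption: a Phoenix server is listening on its default local port.
    os.environ.setdefault("PHOENIX_COLLECTOR_ENDPOINT", "http://localhost:6006")

    # Assumption: auto_instrument=True is supported by the installed version and
    # picks up whichever OpenInference instrumentors are installed.
    return register(project_name=project_name, auto_instrument=True)


if __name__ == "__main__":
    # Hypothetical connection string; replace with your own.
    print(phoenix_server_env("postgresql://user:password@localhost:5432/phoenix"))
    provider = configure_phoenix_tracing()
    print("tracing enabled" if provider is not None else "arize-phoenix-otel not installed")
```

Once a provider is registered, calls made through an instrumented client library should show up as traces under the chosen project in the Phoenix UI, with no per-call logging code.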
npx skills add https://github.com/orchestra-research/ai-research-skills --skill phoenix-observability