If you're shipping AI agents to production and need to know whether they're leaking PII, hallucinating, or burning through your budget, this is the eval layer to add. It logs hierarchical trace trees with per-call latency and token costs to SQLite, runs 13 built-in safety and quality rules (SSN detection, prompt injection patterns, hedge phrase markers), and exposes nine MCP tools including LLM-as-judge scoring and semantic citation verification. Any MCP-compatible agent discovers it automatically once you add the npx command to your config. Flip on the dashboard flag to get a real-time web UI at localhost:6920 showing cost breakdowns and rule pass rates. Bring your own Anthropic or OpenAI key for the judge-based evals.
claude mcp add --transport stdio iris-eval-mcp-server -- npx -y @iris-eval/mcp-server