A paid remote MCP endpoint for running AI model benchmarks and collecting audit-ready evidence. It exposes four tools: run_benchmark_gate for CI-style pass/fail checks, compare_model_scores for head-to-head evaluations, read_benchmark_report for retrieving structured results, and issue_benchmark_receipt for generating usage logs. It requires a bearer token from the product site and uses the streamable HTTP transport. Reach for this when you need reproducible benchmark verdicts with receipts, especially if you're tracking model performance across releases or need compliance-friendly documentation of your evaluation runs. The server card and public MCP endpoint are live, but all production calls are metered and require authentication.
claude mcp add --transport http com.clauxel.evalscopebench-evalscopebench-mcp https://evalscopebench.clauxel.com/mcp
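
For readers calling the endpoint outside an MCP client, here is a rough sketch of what an authenticated tool call could look like on the wire. It only builds the request and sends nothing. The `tools/call` method and the JSON-RPC 2.0 envelope come from the MCP specification, and the tool name comes from the listing above. The argument names (`model`, `threshold`), the `EVALSCOPEBENCH_TOKEN` environment variable, and the `build_tool_call` helper are hypothetical placeholders, since the listing does not document the tool schemas.

```python
import json
import os

# Endpoint URL as given in the listing.
MCP_URL = "https://evalscopebench.clauxel.com/mcp"


def build_tool_call(tool_name, arguments, token, request_id=1):
    """Return (headers, body) for an MCP tools/call request.

    Hypothetical helper for illustration; not part of any published SDK.
    """
    headers = {
        # The listing says a bearer token is required.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Streamable HTTP clients accept both plain JSON and SSE responses.
        "Accept": "application/json, text/event-stream",
    }
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return headers, json.dumps(payload)


if __name__ == "__main__":
    # Assumed variable name; use whatever your setup stores the token in.
    token = os.environ.get("EVALSCOPEBENCH_TOKEN", "<your-token>")
    headers, body = build_tool_call(
        "run_benchmark_gate",
        {"model": "example-model", "threshold": 0.8},  # assumed argument names
        token,
    )
    print(f"POST {MCP_URL}")
    print(body)
```

A real session would first perform the MCP `initialize` handshake and then list the tools to discover their actual input schemas, so treat the arguments above as stand-ins until you have checked them against the server.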