Adds consensus measurement tools to Claude, using Fleiss' kappa and bootstrap confidence intervals to check whether AI models agree with themselves or with each other. The eight MCP tools let you run multi-model evaluations across Bedrock, OpenAI, and Gemini, generate statistical reports, compare runs over time, and estimate costs before executing. The self-consistency mode uses MCP Sampling to test the host model, so it needs no external API keys. Reach for this when you need statistically rigorous validation that an AI gives consistent answers, especially in high-stakes applications where agreement matters more than speed. It also includes schema validation and AI-powered schema suggestion from your data.
claude mcp add --transport stdio alligatorc0der-conkurrence -- npx -y conkurrence
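
For readers unfamiliar with the statistics named above, here is a minimal sketch of Fleiss' kappa with a percentile bootstrap confidence interval. It is an illustration of the general technique only, not conkurrence's implementation: the function names `fleiss_kappa` and `bootstrap_ci`, the item-level resampling scheme, and the choice to return NaN when every rating falls in a single category are all assumptions made for this example.

```python
import math
import random
from typing import List, Sequence, Tuple


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters (e.g. model runs) that assigned item i to category j."""
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two ratings per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall category proportions.
    totals = [sum(col) for col in zip(*counts)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)

    # Assumption for this sketch: kappa is undefined (NaN) when all ratings
    # land in one category, because the denominator is zero.
    if math.isclose(p_e, 1.0):
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


def bootstrap_ci(
    counts: Sequence[Sequence[int]],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for kappa, resampling items with replacement."""
    rng = random.Random(seed)
    n_items = len(counts)
    stats: List[float] = []
    for _ in range(n_boot):
        sample = [counts[rng.randrange(n_items)] for _ in range(n_items)]
        k = fleiss_kappa(sample)
        if not math.isnan(k):  # skip degenerate resamples
            stats.append(k)
    if not stats:
        raise ValueError("all bootstrap resamples were degenerate")
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical data: 4 prompts, each answered 5 times, 3 answer categories.
    table = [[5, 0, 0], [4, 1, 0], [0, 5, 0], [1, 1, 3]]
    print("kappa:", fleiss_kappa(table))
    print("95% CI:", bootstrap_ci(table))
```

In a self-consistency setting, each row would hold the answers from repeated runs of one model on one prompt. In a multi-model setting, each rating would come from a different model. A kappa near 1 indicates agreement well above chance, and a wide interval signals that more items are needed before drawing conclusions.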