Provides five information retrieval scoring tools for evaluating RAG system quality: recall@k measures the fraction of relevant documents that appear in your top k results, hit@k checks whether at least one relevant document appears there, MRR scores the reciprocal rank of the first relevant match, NDCG@k weights graded relevance scores by position, and evaluate_batch runs all of these metrics over multiple queries at once. Complements the author's promptbudget, citecite, and ragdrift servers from the same mcp-stack collection. Install it when you need Claude to help you benchmark retriever performance against ground-truth relevance judgments, either interactively or as part of a larger RAG development workflow. The logic is implemented in plain JavaScript, so you can run it via npx without pulling in the Rust ragmetric crate.
claude mcp add --transport stdio io.github.mukundakatta-ragmetric-mcp -- npx -y @mukundakatta/ragmetric-mcp
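
For readers unfamiliar with the metrics named in the description, the following is a minimal Python sketch of their textbook definitions. It is not the server's JavaScript implementation: the function names, argument shapes, and the linear-gain form of DCG are all assumptions chosen for illustration, and the server's actual tool inputs and conventions may differ (for example, some implementations use an exponential gain of 2^rel - 1 for NDCG).

```python
import math
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def hit_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant document is in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in retrieved[:k]) else 0.0


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document (0.0 if none is retrieved).
    MRR is the mean of this value over a set of queries."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: List[str], gains: Dict[str, float], k: int) -> float:
    """DCG of the actual ranking divided by DCG of the ideal ranking.
    Assumes linear gain with a log2(position + 1) discount."""
    def dcg(values: List[float]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(doc, 0.0) for doc in retrieved[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def evaluate_batch(queries: List[dict], k: int) -> Dict[str, float]:
    """Average every metric over a list of queries.
    Each query is a hypothetical dict with 'retrieved' (ranked doc ids)
    and 'gains' (doc id -> graded relevance; gain > 0 counts as relevant)."""
    totals = {"recall": 0.0, "hit": 0.0, "mrr": 0.0, "ndcg": 0.0}
    for q in queries:
        relevant = {doc for doc, g in q["gains"].items() if g > 0}
        totals["recall"] += recall_at_k(q["retrieved"], relevant, k)
        totals["hit"] += hit_at_k(q["retrieved"], relevant, k)
        totals["mrr"] += reciprocal_rank(q["retrieved"], relevant)
        totals["ndcg"] += ndcg_at_k(q["retrieved"], q["gains"], k)
    n = len(queries)
    return {name: (total / n if n else 0.0) for name, total in totals.items()}
```

In this sketch, a ranking of `["d3", "d1", "d7", "d2"]` scored against relevant documents `d1` and `d2` at k=3 finds one of the two relevant documents, so recall is one half while hit is 1.0, and the first relevant document sits at rank 2, giving a reciprocal rank of one half.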