CLAUDE CODE MARKETPLACES
Built for the Claude Code community with Claude Code by @mertduzgun

Independent project, not affiliated with Anthropic

Validate Evaluator

Editor's Note

This skill walks you through calibrating an LLM judge against human labels, using train/dev/test splits with true positive rate (TPR) and true negative rate (TNR) as the agreement metrics. Use it after writing a judge prompt, when you need to verify that the judge agrees with human judgment before trusting it in production. The workflow is methodical: split your labeled data, iterate on the dev set until both TPR and TNR reach 90%, then measure once on the held-out test set. It includes the Rogan-Gladen bias correction formula for estimating the true success rate from biased judge scores, plus bootstrap confidence intervals. The anti-pattern section is worth reading, since most people skip validation entirely and just assume their judge works.
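
As a rough illustration of the metrics mentioned in the note, here is a minimal Python sketch of TPR/TNR, the Rogan-Gladen correction, and a percentile bootstrap interval. It is not taken from the skill itself: the function names (tpr_tnr, rogan_gladen, bootstrap_ci), the boolean pass/fail encoding, and the choice to resample both the labeled pairs and the unlabeled judge verdicts are all assumptions made for this example.

```python
import random

def tpr_tnr(human, judge):
    """Agreement of judge with human labels; both are sequences of bools (True = pass)."""
    pos = sum(human)
    neg = len(human) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one human pass and one human fail")
    tp = sum(h and j for h, j in zip(human, judge))
    tn = sum((not h) and (not j) for h, j in zip(human, judge))
    return tp / pos, tn / neg

def rogan_gladen(p_obs, tpr, tnr):
    """Estimate the true pass rate from the judge's observed pass rate p_obs."""
    denom = tpr + tnr - 1
    if denom <= 0:
        raise ValueError("judge is no better than chance; correction undefined")
    # Clip because sampling noise can push the raw estimate outside [0, 1].
    return min(1.0, max(0.0, (p_obs + tnr - 1) / denom))

def bootstrap_ci(human, judge, judge_unlabeled, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the corrected pass rate (assumed design)."""
    rng = random.Random(seed)
    pairs = list(zip(human, judge))
    estimates = []
    for _ in range(n_boot):
        h, j = zip(*rng.choices(pairs, k=len(pairs)))
        resampled = rng.choices(judge_unlabeled, k=len(judge_unlabeled))
        p_obs = sum(resampled) / len(resampled)
        try:
            tpr, tnr = tpr_tnr(h, j)
            estimates.append(rogan_gladen(p_obs, tpr, tnr))
        except ValueError:
            continue  # skip degenerate resamples
    if not estimates:
        raise ValueError("no valid bootstrap resamples")
    estimates.sort()
    lo = estimates[int(alpha / 2 * len(estimates))]
    hi = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    return lo, hi
```

With TPR = TNR = 0.9 and an observed judge pass rate of 0.74, the corrected estimate is (0.74 + 0.9 - 1) / (0.9 + 0.9 - 1) = 0.8, so a judge with symmetric 10% error rates understates a true 80% pass rate by six points here. TPR and TNR fed into the correction should come from the held-out test set, not the dev set used to tune the judge prompt.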

Install

npx skills add https://github.com/hamelsmu/evals-skills --skill validate-evaluator
Votes: 0
Installs: 289
GitHub Stars: 1.3k
First Seen: Jun 3, 2026
