CLAUDE CODE MARKETPLACES
Built for the Claude Code community with Claude Code by @mertduzgun

Independent project, not affiliated with Anthropic

Agent Evaluation

Editor's Note

Reach for this framework when your agent works well in demos but falls apart in production. It provides statistical evaluation patterns that run each test multiple times to catch stochastic behavior, plus behavioral contract testing that enforces hard boundaries on what an agent can and cannot do. The statistical evaluator calculates pass rates with confidence intervals, tracks behavior consistency across runs, and flags concerns such as high variance or unstable outputs. The behavioral contract pattern is especially useful for production agents that need guarantees about tone, scope, or safety. Even top agents score under 50% on real-world benchmarks, so the skill focuses on the metrics that matter most for reliability.
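
To make the two patterns above concrete, here is a minimal Python sketch. It is not the skill's actual code: the names `evaluate`, `EvalReport`, `wilson_interval`, `check`, and `contract` are hypothetical, and the Wilson score interval is just one reasonable choice, since the note only says "confidence intervals". The agent is assumed to be any callable that maps a prompt string to an output string.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class EvalReport:
    """Summary of repeated runs of one test case (hypothetical structure)."""
    runs: int
    passes: int
    pass_rate: float
    ci_low: float
    ci_high: float
    distinct_outputs: int
    concerns: List[str] = field(default_factory=list)


def wilson_interval(passes: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 is about 95%)."""
    if runs == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = passes / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def evaluate(
    agent: Callable[[str], str],
    prompt: str,
    check: Callable[[str], bool],      # task success criterion
    contract: Callable[[str], bool],   # hard boundary; must hold on every run
    runs: int = 20,
    min_pass_rate: float = 0.9,        # assumed threshold, tune per use case
) -> EvalReport:
    """Run the same prompt several times and summarize the results."""
    outputs = [agent(prompt) for _ in range(runs)]
    passes = sum(1 for out in outputs if check(out))
    violations = sum(1 for out in outputs if not contract(out))
    low, high = wilson_interval(passes, runs)
    distinct = len(set(outputs))

    concerns: List[str] = []
    if violations:
        concerns.append(f"contract violated in {violations}/{runs} runs")
    if low < min_pass_rate:
        concerns.append(
            f"lower confidence bound {low:.2f} is below target {min_pass_rate:.2f}"
        )
    if distinct > runs // 2:
        concerns.append(f"unstable outputs: {distinct} distinct results in {runs} runs")

    return EvalReport(runs, passes, passes / runs if runs else 0.0,
                      low, high, distinct, concerns)


if __name__ == "__main__":
    import random

    # Stub agent that fails some of the time, standing in for a real model call.
    def flaky_agent(prompt: str) -> str:
        return "ok: done" if random.random() < 0.85 else "sorry, I cannot"

    report = evaluate(
        flaky_agent,
        "summarize the ticket",
        check=lambda out: out.startswith("ok"),
        contract=lambda out: "password" not in out,
    )
    print(report)
```

Keeping `check` and `contract` separate is a deliberate choice in this sketch: a success criterion can tolerate an occasional miss, which is why it gets a pass rate and an interval, while a contract is treated as a boundary where a single violation is reported as a concern.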

Install

npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill agent-evaluation
Votes: 0
Installs: 605
GitHub Stars: 37.6k
Categories: Testing & QA, Documentation, AI & Agent Building, Productivity & Planning, Design & UI/UX
First Seen: May 16, 2026

Related Testing & QA Skills

find-skills (vercel-labs/skills)
Votes: 5 | Installs: 1.5M | GitHub Stars: 18.6k
Discover and install specialized agent skills from the open ecosystem when users need extended capabilities.

remotion-best-practices (remotion-dev/skills)
Votes: 0 | Installs: 312.3k | GitHub Stars: 3.2k
Domain-specific knowledge base for building videos with Remotion and React.

skill-creator (anthropics/skills)
Votes: 0 | Installs: 210.7k | GitHub Stars: 135.1k
Create, test, and iteratively improve AI agent skills with structured evaluation and benchmarking.

grill-me (mattpocock/skills)
Votes: 0 | Installs: 150.2k | GitHub Stars: 85.4k
Relentless interviewing skill that stress-tests plans and designs through systematic questioning.

improve-codebase-architecture (mattpocock/skills)
Votes: 0 | Installs: 114.2k | GitHub Stars: 85.4k
Analyze codebases for architectural friction and propose module-deepening refactors as testability improvements.

tdd (mattpocock/skills)
Votes: 0 | Installs: 111.6k | GitHub Stars: 85.4k
Test-driven development with vertical slices, behavior-focused tests, and incremental red-green-refactor cycles.