CLAUDE CODE MARKETPLACES
Built for the Claude Code community with Claude Code by @mertduzgun

Independent project, not affiliated with Anthropic

Model Evaluation Benchmark

Editor's Note

This skill automates the full cycle of running model comparison benchmarks, following the Benchmark Suite V3 reference implementation. It executes tasks across different Claude models (Opus vs Sonnet), spins up reviewer agents to score code quality, tracks metrics such as duration and tool calls, then generates a comprehensive report as a GitHub issue with archived artifacts. The mandatory cleanup phase closes all test PRs and issues, which is honestly the kind of housekeeping that's easy to forget when you're running benchmarks manually. Best for systematic model evaluations where you need reproducible results and proper documentation, not one-off performance checks.
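
To make the cycle described above more concrete, here is a minimal Python sketch of the bookkeeping side: collecting per-run metrics, averaging them per model, rendering a Markdown table suitable for a GitHub issue body, and listing the cleanup commands for test PRs. This is not the skill's actual code. `RunResult`, `summarize`, `render_report`, and `cleanup_commands` are hypothetical names, the sample numbers and PR numbers are made up, and the use of the `gh` CLI for cleanup is an assumption.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class RunResult:
    """One benchmark task run by one model (hypothetical structure)."""
    model: str                       # e.g. "opus" or "sonnet"
    task: str                        # task identifier
    duration_s: float                # wall-clock duration in seconds
    tool_calls: int                  # number of tool calls during the run
    review_scores: list              # code-quality scores from reviewer agents
    pr_number: Optional[int] = None  # test PR opened by the run, if any


def summarize(results):
    """Average duration, tool calls, and reviewer score per model."""
    by_model = {}
    for r in results:
        by_model.setdefault(r.model, []).append(r)
    return {
        model: {
            "runs": len(runs),
            "avg_duration_s": mean(r.duration_s for r in runs),
            "avg_tool_calls": mean(r.tool_calls for r in runs),
            # Average each run's reviewer scores first, then average the runs.
            "avg_score": mean(mean(r.review_scores) for r in runs),
        }
        for model, runs in by_model.items()
    }


def render_report(summary):
    """Render the per-model summary as a Markdown table."""
    lines = [
        "| Model | Runs | Avg duration (s) | Avg tool calls | Avg score |",
        "|---|---|---|---|---|",
    ]
    for model in sorted(summary):
        s = summary[model]
        lines.append(
            f"| {model} | {s['runs']} | {s['avg_duration_s']:.1f} "
            f"| {s['avg_tool_calls']:.1f} | {s['avg_score']:.2f} |"
        )
    return "\n".join(lines)


def cleanup_commands(results):
    """Build (but do not run) commands that would close each test PR.

    Assumes cleanup goes through the GitHub CLI; nothing is executed here.
    """
    return [
        ["gh", "pr", "close", str(r.pr_number)]
        for r in results
        if r.pr_number is not None
    ]


# Made-up sample data, for illustration only.
results = [
    RunResult("opus", "task-1", 120.0, 30, [8.0, 9.0], pr_number=101),
    RunResult("opus", "task-2", 180.0, 50, [7.0, 8.0], pr_number=102),
    RunResult("sonnet", "task-1", 90.0, 20, [7.0, 7.0], pr_number=103),
]
summary = summarize(results)
print(render_report(summary))
for cmd in cleanup_commands(results):
    print(" ".join(cmd))
```

Keeping the PR number on each run record means the cleanup list is derived from the same data as the report, so a run that opened a PR cannot be reported without also being queued for closing.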

Install

npx skills add https://github.com/rysweet/amplihack --skill model-evaluation-benchmark

Votes: 0
Installs: 138
GitHub Stars: 61
Categories: Git & Pull Requests
First Seen: Jun 3, 2026

Related Git & Pull Requests Skills

  • github-pr-review (fvadicamo/dev-agent-skills). Votes: 0, Installs: 491, GitHub Stars: 63. github pr review
  • github-pr-merge (fvadicamo/dev-agent-skills). Votes: 0, Installs: 214, GitHub Stars: 63. github pr merge
  • create-github-pull-request-from-specification (github/awesome-copilot). Votes: 0, Installs: 9k, GitHub Stars: 34.3k. Automated GitHub pull request creation from specification templates with draft-to-review workflow.
  • github-issues (github/awesome-copilot). Votes: 0, Installs: 11.9k, GitHub Stars: 34.3k. Create, update, and manage GitHub issues with full workflow support including types, labels, assignees, and dependencies.
  • create-github-issues-feature-from-implementation-plan (github/awesome-copilot). Votes: 0, Installs: 8.7k, GitHub Stars: 34.3k. Create GitHub Issues automatically from implementation plan phases.
  • create-github-issue-feature-from-specification (github/awesome-copilot). Votes: 0, Installs: 8.6k, GitHub Stars: 34.3k. Create GitHub issues from specification files using the feature_request.yml template.