CLAUDE CODE MARKETPLACES

Claude Code Marketplaces

Discover Claude Code plugins, extensions, and tools. An automatically updated directory of Anthropic Claude AI marketplaces with development tools, productivity plugins, and integrations.

Built for the Claude Code community with Claude Code by @mertduzgun

Independent project, not affiliated with Anthropic

LLM Evaluation

Editor's Note

You know those moments when you deploy an LLM change and wonder if you just made things better or worse? This helps you actually measure that with real numbers. It sets up automated scoring using metrics like BLEU, ROUGE, and BERTScore, plus LLM-as-judge patterns where you use Claude to evaluate outputs for quality, accuracy, and safety. You get comparison frameworks for A/B testing different prompts, groundedness checks against source material, and toxicity detection. Really useful when you're iterating on prompts, comparing models, or need to catch regressions before they hit production. Beats guessing based on a few cherry-picked examples.
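As a rough illustration of the automated-scoring idea described above (this is a sketch, not code taken from the skill itself), here is a minimal reference-overlap metric in the spirit of ROUGE-1 recall, in pure Python. Real evaluation suites use full BLEU/ROUGE/BERTScore implementations with proper tokenization; the function name and example strings here are made up for demonstration.

```python
# Minimal ROUGE-1-style recall: the fraction of reference unigrams that
# also appear in the candidate output. Illustrative only.
from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    if not ref_counts:
        return 0.0
    # Clipped overlap: each reference token counts at most as often
    # as it occurs in the candidate.
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / sum(ref_counts.values())

# Compare two outputs against the same reference answer:
reference = "the cat sat on the mat"
print(rouge1_recall(reference, "the cat sat on the mat"))  # 1.0
print(rouge1_recall(reference, "a cat on a mat"))          # 0.5
```

Scoring like this is what lets you A/B test prompt variants numerically instead of eyeballing a handful of outputs, which is exactly the regression-catching workflow the skill automates.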

Install

npx skills add https://github.com/wshobson/agents --skill llm-evaluation
Votes: 0
Installs: 5k
GitHub Stars: 33.7k
Categories: Testing & QA, DevOps & CI/CD, Git & Pull Requests, Code Review & Quality, AI & Agent Building, Release Management, Automation & Workflows
View on GitHub


Related Testing & QA Skills

Skill                       Repository                                     Votes  Installs  GitHub Stars
test-driven-development     obra/superpowers                               0      52k       154.2k
webapp-testing              anthropics/skills                              1      49k       118.1k
ab-test-setup               coreyhaines31/marketingskills                  0      36.2k     21.4k
playwright-best-practices   currents-dev/playwright-best-practices-skill   0      27.5k     231
playwright-cli              microsoft/playwright-cli                       0      20.9k     8.4k
seo                         addyosmani/web-quality-skills                  0      15.5k     1.8k