This is the internal reference for how PluginEval measures skill quality across ten dimensions, from triggering accuracy to ecosystem coherence. It documents the three-layer scoring system (static analysis, LLM judge, Monte Carlo simulation), the weighted composite formula that produces final scores, and the badge thresholds from Bronze to Platinum. You'll want this when debugging why a skill scored poorly on orchestration fitness, when calibrating your own marketplace's quality bar, or when explaining to a partner why their integration earned a C on token efficiency. The blend weights table alone is worth bookmarking. Fair warning: this is dense reference material, not a tutorial.
npx -y skills add wshobson/agents --skill evaluation-methodology --agent claude-codeInstalls into .claude/skills of the current project.
Select a file.
juliusbrussee/caveman
mattpocock/skills
shadcn/improve
obra/superpowers
forrestchang/andrej-karpathy-skills
vercel-labs/skills