This is an overnight CI pipeline for the vercel-plugin project. It spins up nine realistic projects (among them a recipe platform, a trivia game, and a code review bot), exercises skill injection through claude --print, verifies that the dev servers actually boot, and then diffs the skills that were expected against the ones that actually got injected. The output is a machine-readable report with copy-pasteable YAML fixes for missing patterns. Run it in a loop with sleep 3600 between iterations and you get a self-improvement cycle that shows exactly which gaps closed since the last run. It honestly feels like internal tooling that escaped into the wild, but if you're building a Claude plugin with similar injection logic, its contracts (run-manifest.json, events.jsonl) are worth stealing.
npx skills add https://github.com/vercel-labs/vercel-plugin --skill benchmark-e2e
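To make the expected-versus-injected diff concrete, here is a minimal sketch of that step. It assumes each line of events.jsonl is a JSON object with hypothetical type, project, and skill fields. That schema is an assumption, not the plugin's documented contract. The function names and skill names are illustrative only.

```python
import json
from collections import defaultdict

# ASSUMPTION: hypothetical event shape, not the plugin's documented schema:
# {"type": "skill_injected", "project": "trivia-game", "skill": "nextjs"}


def injected_by_project(jsonl_text: str) -> dict[str, set[str]]:
    """Collect the injected skill names per project from JSON Lines text."""
    injected: dict[str, set[str]] = defaultdict(set)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        if event.get("type") == "skill_injected":
            injected[event["project"]].add(event["skill"])
    return dict(injected)


def diff_skills(
    expected: dict[str, set[str]], injected: dict[str, set[str]]
) -> dict[str, dict[str, list[str]]]:
    """Per project, list skills expected but never injected, and injected but not expected."""
    report: dict[str, dict[str, list[str]]] = {}
    for project in sorted(expected.keys() | injected.keys()):
        exp = expected.get(project, set())
        got = injected.get(project, set())
        report[project] = {
            "missing": sorted(exp - got),
            "unexpected": sorted(got - exp),
        }
    return report


# Usage with hypothetical project and skill names:
events = "\n".join([
    '{"type": "skill_injected", "project": "trivia-game", "skill": "ai-sdk"}',
    '{"type": "tool_call", "project": "trivia-game"}',
    '{"type": "skill_injected", "project": "trivia-game", "skill": "vercel-kv"}',
])
expected = {"trivia-game": {"ai-sdk", "nextjs"}}
report = diff_skills(expected, injected_by_project(events))
print(report)
# {'trivia-game': {'missing': ['nextjs'], 'unexpected': ['vercel-kv']}}
```

The "missing" list is the part that would feed pattern fixes. The "unexpected" list catches over-eager injection. Keeping both as sorted lists makes run-to-run comparisons a plain equality check.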