This skill runs your vercel-plugin benchmark scenarios inside ephemeral Firecracker microVMs instead of local terminals. Each sandbox gets a fresh Claude Code install and runs a three-phase pipeline (build, verify with agent-browser, deploy to Vercel); a separate Haiku pass then scores each phase as structured JSON. The architecture is battle-tested: snapshots preserve both files and npm globals, port 3000 maps to a public URL at creation time, and extendTimeout keeps sandboxes alive overnight if you need it. You get concurrency control, per-phase timeouts, and the ability to load scenarios from JSON instead of hardcoding them. The real win is isolation: one broken scenario can't poison your local environment, and you can run up to 10 sandboxes in parallel.
npx skills add https://github.com/vercel-labs/vercel-plugin --skill benchmark-sandbox
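
To make the concurrency control, per-phase timeouts, and JSON-loaded scenarios described above more concrete, here is a minimal TypeScript sketch. It is not the skill's actual implementation and it does not call the sandbox SDK. The `Scenario` fields (`slug`, `prompt`), the `PhaseRunner` callback, and every function name are hypothetical, and stopping a scenario at its first failed phase is an assumption about how the pipeline behaves.

```typescript
// Hypothetical shapes: the skill's real scenario schema is not shown here.
type Phase = "build" | "verify" | "deploy";
interface Scenario { slug: string; prompt: string }
interface PhaseResult { phase: Phase; ok: boolean; detail: string }

// Hypothetical callback that would drive one phase inside a sandbox.
type PhaseRunner = (scenario: Scenario, phase: Phase) => Promise<string>;

const PHASES: readonly Phase[] = ["build", "verify", "deploy"];

// Parse scenarios from JSON text instead of hardcoding them.
function parseScenarios(jsonText: string): Scenario[] {
  const data: unknown = JSON.parse(jsonText);
  if (!Array.isArray(data)) throw new Error("scenario file must contain a JSON array");
  return data.map((entry, i) => {
    if (typeof entry?.slug !== "string" || typeof entry?.prompt !== "string") {
      throw new Error(`scenario ${i} needs string "slug" and "prompt" fields`);
    }
    return { slug: entry.slug, prompt: entry.prompt };
  });
}

// Reject if `work` has not settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Run the three phases in order, each under its own timeout.
// Assumption: a failed phase ends the scenario, since later phases build on it.
async function runScenario(
  scenario: Scenario,
  runPhase: PhaseRunner,
  timeoutsMs: Record<Phase, number>,
): Promise<PhaseResult[]> {
  const results: PhaseResult[] = [];
  for (const phase of PHASES) {
    try {
      const label = `${scenario.slug}/${phase}`;
      const detail = await withTimeout(runPhase(scenario, phase), timeoutsMs[phase], label);
      results.push({ phase, ok: true, detail });
    } catch (err) {
      results.push({ phase, ok: false, detail: err instanceof Error ? err.message : String(err) });
      break;
    }
  }
  return results;
}

// Process `items` with at most `limit` workers in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, () => lane()));
  return results;
}
```

A caller would pass its own `PhaseRunner` and a limit of 10 to match the parallelism mentioned above, for example `mapWithConcurrency(scenarios, 10, s => runScenario(s, runPhase, timeouts))`. Note that `withTimeout` only stops waiting on a phase; a real runner would also need to stop the sandbox itself when a phase times out.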