Runs a structured benchmark suite against the cost-booster routing logic and optionally compares the results to Gemini, Sonnet, and Opus baselines. You'd fire this before cutting a release to confirm your win rate hasn't regressed, or after adding new test cases to the corpus to verify the routing decisions. The booster-only run takes about 85 ms and writes timestamped JSON results to a docs folder that other skills can read. The smoke gate expects a win rate of at least 80% on tier-1 cases. It's designed to turn "claimed upstream" tags into "verified" ones by giving you repeatable numbers you can point to in documentation.
npx skills add https://github.com/ruvnet/ruflo --skill cost-benchmark
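
To make the smoke gate concrete, here is a minimal Python sketch of how an 80% tier-1 win-rate check could be applied to one results file. The JSON layout (a `cases` list whose entries carry `tier` and `won` fields) and the function names are assumptions made for illustration only; the skill's actual output schema isn't described here and may differ.

```python
import json
from pathlib import Path

# Threshold taken from the description: at least 80% win rate on tier-1 cases.
SMOKE_GATE_THRESHOLD = 0.80


def load_results(path: str) -> dict:
    """Load one benchmark results file (the path and schema are hypothetical)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def tier1_win_rate(results: dict) -> float:
    """Return the fraction of tier-1 cases marked as won.

    Assumes a hypothetical schema: {"cases": [{"tier": int, "won": bool}, ...]}.
    """
    tier1 = [case for case in results["cases"] if case["tier"] == 1]
    if not tier1:
        raise ValueError("no tier-1 cases in results")
    wins = sum(1 for case in tier1 if case["won"])
    return wins / len(tier1)


def passes_smoke_gate(results: dict, threshold: float = SMOKE_GATE_THRESHOLD) -> bool:
    """True when the tier-1 win rate meets or exceeds the threshold."""
    return tier1_win_rate(results) >= threshold


# Made-up sample data: four of five tier-1 cases won; the tier-2 case is ignored.
sample = {
    "cases": [
        {"tier": 1, "won": True},
        {"tier": 1, "won": True},
        {"tier": 1, "won": True},
        {"tier": 1, "won": True},
        {"tier": 1, "won": False},
        {"tier": 2, "won": False},
    ]
}
```

With this sample, `passes_smoke_gate(sample)` is true because the tier-1 win rate sits exactly at the threshold; dropping one more tier-1 win would fail the gate.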