If you need to track performance over time and catch regressions before they reach production, this agent runs benchmark suites covering throughput, latency, scalability, and resource usage. It compares current results against historical baselines using four regression detection methods: statistical, ML-based, threshold, and trend analysis. The code includes warmup and cooldown phases around each benchmark and can run tests either sequentially or in parallel. Regression detection uses CUSUM for change-point analysis and trains anomaly models on historical data, so it can flag gradual drift that a simple fixed-threshold check would miss. Worth using if you do continuous performance testing and want automated alerts when performance degrades.
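To make the CUSUM idea concrete, here is a minimal sketch of one-sided change-point detection on latency samples. It is not the agent's actual implementation: the function name `cusum_regression_index`, the `slack` and `threshold` parameters, and the sample data are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def cusum_regression_index(baseline, current, slack=0.5, threshold=5.0):
    """Return the index in `current` where an upward shift is first flagged.

    Hypothetical helper, not taken from the skill's source.
    `slack` and `threshold` are in units of baseline standard deviations.
    Returns None when no regression is detected.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance; cannot standardize")

    cumulative = 0.0
    for i, sample in enumerate(current):
        z = (sample - mu) / sigma
        # Accumulate only deviations above the slack; never drop below zero.
        cumulative = max(0.0, cumulative + z - slack)
        if cumulative > threshold:
            return i
    return None


# Illustrative latency samples in milliseconds (made-up data).
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
current = [101, 99, 100, 106, 107, 108, 109]
print(cusum_regression_index(baseline, current))  # prints 4
```

Because the statistic accumulates small standardized deviations, a sustained shift is flagged after a few samples even when no single sample looks extreme on its own.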
npx skills add https://github.com/ruvnet/ruflo --skill agent-benchmark-suite