This skill handles the full A/B testing workflow, from hypothesis formation through statistical analysis. It calculates sample sizes using proper power analysis, provides reference tables for common conversion rates and minimum detectable effects, and catches common mistakes like peeking at results early or running underpowered tests. The hypothesis framework is solid: it forces you to articulate the behavioral reasoning behind each test rather than just throwing variants at the wall. It also prioritizes tests using ICE scoring and checks for sample ratio mismatch, which most teams ignore until their results are garbage. If you're running conversion experiments that go beyond gut-feel changes, this gives you the statistical rigor to actually know whether something worked.
npx skills add https://github.com/openclaudia/openclaudia-skills --skill ab-test-setup
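For a feel of the math involved, here is a minimal Python sketch of two of the checks described above: a per-variant sample size from a power analysis, and a sample ratio mismatch (SRM) check. It is not taken from the skill itself. The function names, the two-proportion z-test approximation, the defaults (alpha 0.05, power 0.80, a 50/50 split), and the 0.001 SRM threshold are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided two-proportion z-test).

    baseline     -- control conversion rate, e.g. 0.05 for 5%
    relative_mde -- minimum detectable effect as a relative lift, e.g. 0.20
    Hypothetical helper; the defaults are common conventions, not the skill's.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def srm_p_value(control_n, variant_n, expected_control_share=0.5):
    """P-value of a chi-square goodness-of-fit test (1 degree of freedom)
    comparing observed traffic counts against the intended split."""
    total = control_n + variant_n
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = (
        (control_n - expected_control) ** 2 / expected_control
        + (variant_n - expected_variant) ** 2 / expected_variant
    )
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    n = sample_size_per_variant(baseline=0.05, relative_mde=0.20)
    print(f"Visitors needed per variant: {n}")

    p = srm_p_value(control_n=5000, variant_n=5500)
    # 0.001 is an assumed alerting threshold, not a value from the skill.
    print(f"SRM p-value: {p:.2g} ({'investigate' if p < 0.001 else 'ok'})")
```

Small baselines and small lifts push the required sample size up quickly, which is why underpowered tests are so common. A very low SRM p-value suggests the traffic split itself is broken, so the conversion numbers should not be trusted until the assignment bug is found.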