This skill walks you through the mechanics of setting up A/B tests that actually reach statistical significance. It enforces a hypothesis structure (if we change X, then Y will improve because Z), helps you pick one primary metric instead of ten, and calculates the required sample size from your minimum detectable effect and target statistical power. The best part is the pitfalls section, which calls out the mistakes almost everyone makes, like peeking at results early or testing too many variants at once. It won't make you a statistician, but it will stop you from shipping changes based on noise. It's most useful when you have real traffic and need to justify design decisions with data instead of opinions.
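To make the sample-size step concrete, here is a minimal sketch of the standard two-proportion calculation. It is not the skill's own calculator. It assumes a conversion-rate metric, a two-sided test, an unpooled variance estimate, and a Bonferroni split of alpha when several variants are each compared against control. The function name and parameters are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, mde: float, alpha: float = 0.05,
                            power: float = 0.8, comparisons: int = 1) -> int:
    """Hypothetical helper: visitors needed in EACH arm of a two-sided test.

    baseline_rate: control conversion rate, e.g. 0.10 for 10%
    mde: minimum detectable effect as an absolute change, e.g. 0.02 for +2 points
    alpha: overall false-positive rate you are willing to accept
    power: probability of detecting the effect if it really exists
    comparisons: number of variant-vs-control comparisons; alpha is divided
                 across them (Bonferroni), which is why extra variants cost traffic
    """
    if mde == 0:
        raise ValueError("mde must be non-zero")
    p1 = baseline_rate
    p2 = baseline_rate + mde
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must stay strictly between 0 and 1")

    adjusted_alpha = alpha / comparisons
    z_alpha = NormalDist().inv_cdf(1 - adjusted_alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)                   # z-score for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)                # unpooled variance of the difference

    n = (z_alpha + z_beta) ** 2 * variance / mde ** 2
    return ceil(n)  # round up: you can't enroll a fraction of a visitor


if __name__ == "__main__":
    print(sample_size_per_variant(0.10, 0.02))                  # one variant vs control
    print(sample_size_per_variant(0.10, 0.02, comparisons=3))   # three variants vs control
```

With a 10% baseline and a +2 point minimum detectable effect at alpha 0.05 and 80% power, this gives 3,839 visitors per arm; splitting alpha across three variants pushes that number noticeably higher. The point of fixing the number in advance is the peeking pitfall: stopping the moment a result crosses p < 0.05, before reaching the planned sample, inflates the false-positive rate.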
npx skills add https://github.com/owl-listener/designer-skills --skill a-b-test-design