This is Ronny Kohavi's framework for running A/B tests that produce trustworthy results. It helps you design experiments with adequate sample sizes, validate results before shipping, and avoid the false positives that plague most testing programs. The core insight is sobering: 66-92% of experiments fail to improve the metrics they target, and at an 8% base success rate, a result that is statistically significant at p < 0.05 still carries about a 26% false positive risk, meaning roughly one in four such "wins" is not a real effect. Use this when you're setting up experimentation infrastructure, diagnosing suspicious results, or making ship/no-ship decisions you need to be confident in. Skip it if you have fewer than tens of thousands of users or if you need immediate answers.
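The 26% figure can be reproduced with Bayes' rule. The sketch below is a minimal illustration, not part of the skill itself. It assumes 80% statistical power and a two-sided test at alpha = 0.05 in which only the improvement direction counts as a win, so 0.025 of the alpha applies. Neither assumption is stated in the description above, and the function name `false_positive_risk` is hypothetical.

```python
def false_positive_risk(success_rate, alpha=0.05, power=0.80):
    """Estimate P(no real effect | statistically significant improvement).

    success_rate: prior fraction of experiments that truly improve the metric.
    alpha:        two-sided significance threshold (assumed 0.05).
    power:        probability of detecting a real improvement (assumed 0.80).
    """
    if not 0.0 < success_rate < 1.0:
        raise ValueError("success_rate must be strictly between 0 and 1")

    # Assumption: with a two-sided test, only half of alpha falls in the
    # "improvement" direction, so a null experiment looks like a win
    # with probability alpha / 2.
    alpha_one_sided = alpha / 2.0

    # Significant wins that come from experiments with no real effect.
    false_wins = alpha_one_sided * (1.0 - success_rate)
    # Significant wins that come from experiments with a real effect.
    true_wins = power * success_rate

    return false_wins / (false_wins + true_wins)


if __name__ == "__main__":
    # 8% base success rate, as quoted in the description.
    print(f"{false_positive_risk(0.08):.1%}")  # prints 26.4%
```

Under the same assumptions, a higher base success rate lowers the risk, which is why the base rate matters as much as the p-value when deciding whether to trust a significant result.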
npx skills add https://github.com/pmprompt/claude-plugin-product-management --skill trustworthy-experiments