Structures your ML experiments into four progressive stages borrowed from AI-Scientist-v2: get something working, tune the baseline, try creative ideas, then run ablations. It generates experiment plans with baselines, datasets, hyperparameter grids, and metrics in a JSON format you can actually execute against. The staged approach is smart because it stops you from tweaking architectures before you have stable training curves, which is how most research projects waste their first two weeks. Includes a Python script that outputs either JSON or markdown, and it enforces multi-seed runs for significance testing. Honestly most useful as a planning framework to keep experiments systematic rather than as automation, since research rarely follows a script perfectly.
npx skills add https://github.com/lingzhi227/agent-research-skills --skill experiment-design