When you've already identified which command to run in an AI research repo and just need clean execution evidence, this skill handles the boring parts. It runs your smoke test or documented inference command, captures the output, and writes standardized files to repro_outputs/ with proper success/failure classification. It also generates PATCHES.md when files are modified during execution. The narrow scope is intentional: it won't help you pick targets or set up environments, but it is reliable for the "just run it and document what happened" phase of reproduction work.
claude skill add lllllllama/ai-paper-reproduction-skill:minimal-run-and-audit
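The core loop the skill automates can be sketched in a few lines: run a command, capture stdout/stderr, classify success by exit code, and write the evidence to repro_outputs/. This is a minimal illustrative sketch only; the function name, file names, and report layout here are assumptions, not the skill's actual output format.

```python
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

def run_and_audit(cmd: list[str], out_dir: str = "repro_outputs") -> bool:
    """Run a command, capture its output, and write standardized evidence files.

    Hypothetical helper: filenames and report layout are illustrative,
    not the skill's documented format.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(cmd, capture_output=True, text=True)
    success = result.returncode == 0
    # Raw output goes to dedicated log files so the report stays readable.
    (out / "stdout.log").write_text(result.stdout)
    (out / "stderr.log").write_text(result.stderr)
    # Summary report with an explicit success/failure classification.
    (out / "RESULT.md").write_text(
        f"# Run result\n\n"
        f"- command: `{' '.join(cmd)}`\n"
        f"- started: {started}\n"
        f"- exit code: {result.returncode}\n"
        f"- status: {'SUCCESS' if success else 'FAILURE'}\n"
    )
    return success

# Example: run a trivial "smoke test" and record the evidence.
ok = run_and_audit([sys.executable, "-c", "print('ok')"])
```

The point of writing both raw logs and a classified summary is that the logs preserve everything while RESULT.md gives a one-glance verdict for the reproduction report.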