If you're training large language models at scale, this gives you access to NVIDIA's Megatron-Core framework right from Claude. It handles everything from 2B to 462B parameter models and claims up to 47% MFU on H100s, which is legitimately impressive if you're burning that kind of compute. The setup is straightforward with Docker or pip, and it includes distributed training examples to get you started with multi-GPU setups. Honestly, this is overkill unless you're actually training foundation models, but if you are, having the boilerplate and parallelism strategies templated out saves you from reinventing NVIDIA's wheel. The skill wraps their official tooling, so you're getting battle-tested infrastructure rather than someone's weekend project.
npx skills add https://github.com/davila7/claude-code-templates --skill training-llms-megatron