This is NVIDIA's Megatron-Core wrapped as a Claude skill for training large language models at scale. It covers models from 2B to 462B parameters and claims 47% Model FLOPs Utilization (MFU) on H100s by combining tensor, pipeline, and data parallelism. You'd reach for this if you're actually training foundation models on multi-GPU setups, not fine-tuning or running inference. The Docker container route is cleaner than pip, since the container bundles the full NVIDIA PyTorch stack. Fair warning: this is infrastructure for serious compute budgets, not something you spin up for quick experiments. The skill passes security audits and is distributed through orchestra-research's AI toolkit, though the original repo is ovachiever/droid-tings.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill training-llms-megatron
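
To make the parallelism and MFU figures in the description more concrete, here is a minimal Python sketch of the arithmetic behind them. It is not part of Megatron-Core or the skill: the function names `data_parallel_size` and `mfu` are hypothetical, and the 989 TFLOP/s peak used in the usage lines is an assumed dense BF16 figure for an H100 SXM that you should replace with the number for your own hardware and precision.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the data-parallel degree left over after tensor and pipeline parallelism.

    Assumes the usual 3D layout where world_size = TP * PP * DP.
    """
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP={model_parallel}"
        )
    return world_size // model_parallel


def mfu(achieved_tflops_per_gpu: float, peak_tflops_per_gpu: float) -> float:
    """Model FLOPs Utilization: achieved model throughput over theoretical peak."""
    if peak_tflops_per_gpu <= 0:
        raise ValueError("peak_tflops_per_gpu must be positive")
    return achieved_tflops_per_gpu / peak_tflops_per_gpu


# Hypothetical example: 64 GPUs, 8-way tensor parallel, 4-way pipeline parallel.
dp = data_parallel_size(world_size=64, tensor_parallel=8, pipeline_parallel=4)

# Assumed peak of 989 TFLOP/s per GPU; 465 TFLOP/s achieved is an illustrative value.
utilization = mfu(achieved_tflops_per_gpu=465.0, peak_tflops_per_gpu=989.0)
print(f"data-parallel replicas: {dp}, MFU: {utilization:.1%}")
```

Under these assumptions, a 47% MFU corresponds to roughly 465 TFLOP/s of useful model compute per GPU. The divisibility check reflects the constraint that the tensor and pipeline degrees must evenly partition the GPU count before any data-parallel replicas can be formed.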