This gives Claude access to DeepSpeed's DeepNVMe documentation for optimizing I/O between storage and tensors in deep learning workloads. You'll want this when moving large model checkpoints or training data between NVMe SSDs and GPU memory, especially if you're dealing with multi-gigabyte tensors. It covers both the standard libaio-backed handles and the faster GDS (GPUDirect Storage) handles, which transfer data directly between storage and GPU memory. The docs walk through blocking and non-blocking writes, parallelization options, and the usual gotchas, like data races when a tensor is modified while an async operation on it is still in flight. Honestly, it's most useful if you're already hitting I/O bottlenecks in your training pipelines; otherwise, the standard PyTorch torch.save and torch.load will do fine.
npx skills add https://github.com/davila7/claude-code-templates --skill deepspeed
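
To make the data-race gotcha concrete, here is a minimal sketch of the non-blocking write pattern using only the Python standard library. It is not the DeepNVMe API: the thread pool stands in for an async I/O handle, the bytearray stands in for a tensor's memory, and write_buffer is a hypothetical helper made up for this illustration. The point it shows is the ordering rule: wait for the pending write to finish before touching the source buffer again.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_buffer(buf, path):
    """Hypothetical helper: write the buffer's contents to path, return bytes written."""
    with open(path, "wb") as f:
        return f.write(buf)


# Stand-in for a tensor's backing memory (an assumption for illustration).
buf = bytearray(b"a" * 1024)
path = os.path.join(tempfile.mkdtemp(), "buf.bin")

with ThreadPoolExecutor(max_workers=1) as pool:
    # Non-blocking: submit() returns immediately while the write runs in the background.
    pending = pool.submit(write_buffer, buf, path)

    # Modifying buf at this point would race with the in-flight write.

    # Wait for the write to complete before reusing the buffer.
    written = pending.result()
    buf[:] = b"b" * 1024

# The file holds the original contents because the mutation happened after the wait.
with open(path, "rb") as f:
    on_disk = f.read()
```

The same discipline applies to any async I/O handle: treat the source buffer as read-only between submitting a non-blocking write and waiting on it.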