This is for writing, debugging, and optimizing CUDA kernels in Claude Code. It pushes a specific workflow: profile first with nsys or ncu before touching code, use printf in device code liberally when things break, and make small isolated changes instead of shotgun fixes. The compute-sanitizer patterns are solid for catching memory errors non-interactively, and the performance tables cut through the usual GPU optimization folklore. What's refreshing is the honesty about tooling limits: the skill explicitly says cuda-gdb often fails and sometimes you just need to stare at the diff between working and broken code. If you're doing GPU work and tired of guessing at bottlenecks, the measurement-driven approach here will save you time.
npx skills add https://github.com/technillogue/ptx-isa-markdown --skill cuda