When you're wrestling with AOTInductor segfaults or device mismatch errors, this skill walks you through debugging systematically. It starts with the critical first check (the compile device must match the load device, and input shapes and devices must line up with what the model expects), then routes you through common failure modes such as CUDA illegal memory access and Triton index issues. The environment variable reference table is genuinely useful: it lists concrete flags such as AOTI_RUNTIME_CHECK_INPUTS and PYTORCH_NO_CUDA_MEMORY_CACHING, with clear explanations of when each takes effect. This is the kind of reference you want open in a terminal while debugging a production AOTI crash at 2am, rather than general advice about compilation.
npx skills add https://github.com/pytorch/pytorch --skill aoti-debug
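
To make the "first check" concrete, here is a minimal sketch of comparing the inputs you are about to pass against what the model was compiled with. It is not taken from the skill itself: `InputSpec`, `device_type`, and `find_mismatches` are hypothetical names, and the helper only assumes each input exposes `.shape` and `.device` (as `torch.Tensor` does). It also assumes static shapes, so a model compiled with dynamic dimensions would need a looser comparison.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence, Tuple


@dataclass(frozen=True)
class InputSpec:
    """Hypothetical record of one input as it looked at compile time."""
    shape: Tuple[int, ...]
    device_type: str  # e.g. "cpu" or "cuda"


def device_type(device: Any) -> str:
    # str(torch.device("cuda:0")) is "cuda:0"; keep only the part before ":".
    return str(device).split(":")[0]


def find_mismatches(specs: Sequence[InputSpec], inputs: Sequence[Any]) -> List[str]:
    """Return human-readable problems; an empty list means the inputs line up."""
    problems: List[str] = []
    if len(specs) != len(inputs):
        problems.append(f"expected {len(specs)} inputs, got {len(inputs)}")
    for i, (spec, x) in enumerate(zip(specs, inputs)):
        if tuple(x.shape) != spec.shape:
            problems.append(
                f"input {i}: shape {tuple(x.shape)} != compiled shape {spec.shape}"
            )
        if device_type(x.device) != spec.device_type:
            problems.append(
                f"input {i}: device {x.device} != compiled device {spec.device_type}"
            )
    return problems
```

Running a check like this before calling the compiled model turns a potential segfault into a readable message, which is usually the faster way to rule out the most common cause before moving on to the other failure modes.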