Teaches Claude to read compiler vectorization reports, write SSE2/AVX2/NEON intrinsics, and debug loops that fail to auto-vectorize. Covers the full workflow, from checking GCC's -fopt-info-vec output to writing intrinsics by hand when the compiler can't vectorize a loop. Includes runtime CPU feature detection with __builtin_cpu_supports, common auto-vectorization blockers such as pointer aliasing, and concrete intrinsics examples for x86 and ARM. The guidance on intrinsics versus compiler output is pragmatic: try auto-vectorization first, and drop to intrinsics only for hand-tuned shuffles or when the compiler fails. Useful if you're profiling hot loops and need more performance without rewriting everything in assembly.
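To give a feel for that workflow, here is a minimal sketch in C. It is not taken from the skill itself: the function names (add_u32_scalar, add_u32_sse2, add_u32) are hypothetical, and it assumes GCC or Clang. The scalar loop uses restrict to remove the pointer-aliasing blocker so the compiler can auto-vectorize it. The SSE2 version shows the same loop written by hand, and the dispatcher picks between them at run time with __builtin_cpu_supports. On non-x86 targets the SSE2 path is compiled out and only the scalar loop remains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar loop. `restrict` promises the compiler that the three arrays do not
   overlap, which removes pointer aliasing as a vectorization blocker. */
void add_u32_scalar(uint32_t *restrict dst, const uint32_t *restrict a,
                    const uint32_t *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#if defined(__SSE2__)
/* Hand-written SSE2 version: four 32-bit lanes per iteration, using
   unaligned loads and stores so callers need no special alignment. */
void add_u32_sse2(uint32_t *dst, const uint32_t *a, const uint32_t *b,
                  size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi32(va, vb));
    }
    for (; i < n; i++) /* scalar tail for the last n % 4 elements */
        dst[i] = a[i] + b[i];
}
#endif

/* Hypothetical dispatcher: choose an implementation at run time.
   The arrays must not overlap, because the fallback uses `restrict`. */
void add_u32(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n) {
    assert(dst && a && b);
#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("sse2")) {
        add_u32_sse2(dst, a, b, n);
        return;
    }
#endif
    add_u32_scalar(dst, a, b, n);
}
```

Compiling the scalar function with gcc -O3 -fopt-info-vec should report whether the loop was vectorized. If it was, the hand-written version is probably unnecessary for a loop this simple. SSE2 is part of the x86-64 baseline, so the runtime check matters more for later extensions such as AVX2.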
npx skills add https://github.com/mohitmishra786/low-level-dev-skills --skill simd-intrinsics