Teaches Claude to diagnose and fix cache performance issues in C/C++ and Rust using perf counters, data layout transformations, and false sharing detection. Reach for it when a program shows high cache miss rates or when multithreaded code inexplicably runs slower than its single-threaded version. It walks through array-of-structs (AoS) to struct-of-arrays (SoA) conversions, cache line alignment, prefetching strategies, and loop blocking for matrix operations. The perf stat examples are immediately useful: they show which counters matter and what thresholds indicate trouble. Honest take: cache optimization is an area where measurement has to come first, and this skill front-loads the profiling commands before suggesting fixes, which is the right order.
npx skills add https://github.com/mohitmishra786/low-level-dev-skills --skill cpu-cache-opt
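
For a sense of what the AoS to SoA conversion and cache line alignment mentioned above look like in practice, here is a minimal Rust sketch. It is not taken from the skill itself: the `ParticleAos`, `ParticlesSoa`, and `PaddedCounter` types and the helper functions are hypothetical names made up for illustration, and the 64-byte cache line is an assumption that holds on most x86-64 parts but should be checked for your target.

```rust
use std::sync::atomic::AtomicU64;

/// Array-of-structs (hypothetical example type): all fields of one
/// particle sit next to each other in memory.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct ParticleAos {
    x: f32,
    y: f32,
    z: f32,
    mass: f32,
}

/// Struct-of-arrays (hypothetical example type): each field is stored
/// contiguously in its own Vec.
#[allow(dead_code)]
struct ParticlesSoa {
    x: Vec<f32>,
    y: Vec<f32>,
    z: Vec<f32>,
    mass: Vec<f32>,
}

/// Convert an AoS slice into the SoA layout.
fn to_soa(aos: &[ParticleAos]) -> ParticlesSoa {
    ParticlesSoa {
        x: aos.iter().map(|p| p.x).collect(),
        y: aos.iter().map(|p| p.y).collect(),
        z: aos.iter().map(|p| p.z).collect(),
        mass: aos.iter().map(|p| p.mass).collect(),
    }
}

/// Sum `x` over the AoS layout: each 16-byte struct is loaded into
/// cache even though only 4 of its bytes are used.
fn sum_x_aos(aos: &[ParticleAos]) -> f32 {
    aos.iter().map(|p| p.x).sum()
}

/// Sum `x` over the SoA layout: only the contiguous `x` array is read,
/// so every byte of each fetched cache line is useful.
fn sum_x_soa(soa: &ParticlesSoa) -> f32 {
    soa.x.iter().sum()
}

/// Per-thread counter padded to an assumed 64-byte cache line, so two
/// counters in an array never share a line (avoids false sharing).
#[allow(dead_code)]
#[repr(align(64))]
struct PaddedCounter {
    value: AtomicU64,
}
```

Both sum functions return the same result; the difference is how much memory traffic each one generates. Whether the SoA version is actually faster for a given workload is something to confirm with perf stat before and after, in keeping with the measure-first order the skill follows.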