This skill walks you through measuring what's actually happening inside your CPU: cache miss rates, branch mispredictions, IPC (instructions per cycle), and memory bandwidth. You'll use perf stat to collect PMU events, PAPI for portable counter access across architectures, and perf annotate to see which source lines are thrashing your cache. It includes raw event codes for Intel and AMD for when the generic aliases aren't enough, plus rough thresholds for what "good" looks like (under 1% branch misses, IPC above 2.0 on modern x86). It's most useful once you've already profiled the hot path and need to understand why it's slow at the microarchitecture level; it isn't meant for general optimization.
npx skills add https://github.com/mohitmishra786/low-level-dev-skills --skill hardware-counters
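
To make the thresholds in the description concrete, here is a minimal sketch of turning raw counter values into the derived metrics it mentions. It assumes you have already copied the counts for the generic perf events (cycles, instructions, branches, branch-misses, cache-references, cache-misses) out of perf stat by hand. The `CounterSample` container, the function names, and the sample numbers are hypothetical and are not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw event counts, e.g. copied by hand from `perf stat` output (hypothetical container)."""
    cycles: int
    instructions: int
    branches: int
    branch_misses: int
    cache_references: int
    cache_misses: int


# Rules of thumb quoted in the description above; they are rough guides, not hard limits.
MIN_GOOD_IPC = 2.0
MAX_GOOD_BRANCH_MISS_RATE = 0.01


def derive_metrics(sample: CounterSample) -> dict:
    """Compute IPC and miss rates from raw counts, guarding against zero denominators."""
    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "ipc": ratio(sample.instructions, sample.cycles),
        "branch_miss_rate": ratio(sample.branch_misses, sample.branches),
        "cache_miss_rate": ratio(sample.cache_misses, sample.cache_references),
    }


def flag_problems(metrics: dict) -> list:
    """Return the metrics that fall outside the rule-of-thumb thresholds."""
    problems = []
    if metrics["ipc"] <= MIN_GOOD_IPC:
        problems.append("low IPC")
    if metrics["branch_miss_rate"] >= MAX_GOOD_BRANCH_MISS_RATE:
        problems.append("high branch-miss rate")
    return problems


if __name__ == "__main__":
    # Made-up counts for illustration only; substitute your own measurements.
    sample = CounterSample(
        cycles=1_000_000,
        instructions=1_500_000,
        branches=200_000,
        branch_misses=4_000,
        cache_references=50_000,
        cache_misses=5_000,
    )
    metrics = derive_metrics(sample)
    print(metrics)
    print(flag_problems(metrics))
```

The cache miss rate is reported but not flagged, since the description gives no threshold for it. What counts as acceptable there depends heavily on the workload and on which cache level the events map to on your CPU.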