This skill brings SAELens into Claude Code for training sparse autoencoders (SAEs) on neural network activations. If you're doing mechanistic interpretability work, you know the problem: neurons are polysemantic, firing for seemingly unrelated concepts, because models pack more features than they have neurons into superposition. SAEs decompose those dense activations into a larger set of sparse features, each of which tends to track a single interpretable concept. The library builds on Anthropic's monosemanticity research and has solid traction (1,100+ stars). Honestly, this is pretty niche. You'd reach for it if you're actively researching how models represent concepts internally, not for typical ML work. The skill passes most security audits, though Snyk shows a warning worth checking before you install.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill sparse-autoencoder-training
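
To make the decomposition described above concrete, here is a minimal sketch of a plain ReLU sparse autoencoder forward pass in NumPy. It is not SAELens code and does not use the SAELens API; the function name `sae_forward`, the layer sizes, and the `l1_coeff` value are all hypothetical choices for illustration, and the input is random data rather than real model activations.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec, l1_coeff=1e-3):
    """Forward pass of a plain ReLU sparse autoencoder (illustrative only)."""
    # Encode: project the centered activations into a wider feature space
    # and clamp negatives to zero, so inactive features are exactly 0.
    features = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    # Decode: rebuild each activation as a weighted sum of decoder rows.
    x_hat = features @ W_dec + b_dec
    # Loss: reconstruction error plus an L1 penalty on feature activations.
    # During training, the L1 term is what pushes features toward sparsity.
    mse = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    l1 = np.mean(np.sum(np.abs(features), axis=-1))
    return features, x_hat, mse + l1_coeff * l1


# Hypothetical sizes: 16-dim activations expanded into 64 candidate features.
rng = np.random.default_rng(0)
d_model, d_sae, batch = 16, 64, 8
x = rng.normal(size=(batch, d_model))  # stand-in for real activations
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)

features, x_hat, loss = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
print(features.shape, x_hat.shape, float(loss))
```

The feature dimension is deliberately larger than the activation dimension: the idea is to give each concept packed into superposition its own direction, with the sparsity penalty ensuring only a few features are active for any one input. An untrained SAE like this one is not yet sparse or interpretable; that only emerges after optimizing the loss over many activations.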