If you're hitting transformer bottlenecks on long sequences, this gives you the Mamba SSM architecture with O(n) linear complexity instead of O(n²). It's genuinely 5× faster at inference with no KV cache, so you can handle million-token contexts without exploding memory. The skill covers both Mamba-1 (simpler, d_state=16) and Mamba-2 (multi-head, d_state=128), with pretrained models from 130M to 2.8B parameters ready to load from HuggingFace. The tradeoff is you need Linux and NVIDIA GPUs, and transformers still win on raw performance when you have the compute budget. Best for streaming applications, long context tasks, or anywhere linear scaling actually matters more than squeezing out another few points of accuracy.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill mamba-architecture