RWKV is a hybrid architecture that trains like a Transformer (parallel) but runs like an RNN (sequential), giving you O(n) complexity instead of O(n²). The big win is constant memory per token during inference, no KV cache bloat, which means you can actually handle 100K+ token contexts without exploding your VRAM. It's a Linux Foundation project already running in production at Microsoft (Windows, Office) and NVIDIA NeMo. If you're hitting memory walls with long context or building streaming apps, this is worth trying. The 14B models are available now, and RWKV-7 just dropped in March 2025. Performance is comparable to Transformers but inference gets dramatically cheaper as sequences grow.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill rwkv-architecture