This is your go-to reference when you need to push transformer models beyond their trained context limits. It covers four techniques used in production: RoPE (rotary position embeddings), YaRN for compute-efficient extension up to 128k tokens, ALiBi for extrapolating past the training length without further fine-tuning, and position interpolation for quickly adapting models like LLaMA from 2k to 32k contexts. The implementations are practical, and the comparison table makes it easy to pick the right approach. If you're fine-tuning an existing model for longer documents, position interpolation or YaRN will save you compute over retraining. If you're training from scratch and want a model that extrapolates beyond its training length, ALiBi is surprisingly simple and effective.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill long-context
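
To make the position interpolation idea from the description concrete, here is a minimal sketch in plain Python. It is not taken from the skill itself: the function names (`rope_angles`, `apply_rope`) and the constants are hypothetical, and the 2k to 32k lengths simply mirror the LLaMA example above. The sketch assumes the standard RoPE frequency schedule with base 10000 and shows that scaling positions by `trained_len / target_len` keeps every rotation angle inside the range the model saw during training.

```python
import math

# Hypothetical lengths, mirroring the 2k -> 32k example in the description.
TRAINED_LEN = 2048
TARGET_LEN = 32768
SCALE = TRAINED_LEN / TARGET_LEN  # 1/16: positions are compressed by this factor


def rope_angles(position, dim, base=10000.0, scale=1.0):
    """Return the RoPE rotation angle for each 2-D pair of a `dim`-sized vector.

    `scale` < 1 applies position interpolation: the position index is
    compressed before the angles are computed. `scale` = 1 is plain RoPE.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    pos = position * scale
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]


def apply_rope(vec, position, base=10000.0, scale=1.0):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by their RoPE angles."""
    angles = rope_angles(position, len(vec), base, scale)
    out = []
    for i, theta in enumerate(angles):
        x, y = vec[2 * i], vec[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])
    return out


if __name__ == "__main__":
    last = TARGET_LEN - 1
    # With interpolation, the last position of the long context maps to a
    # fractional position just below the trained limit.
    print(last * SCALE)
    print(apply_rope([1.0, 0.0, 1.0, 0.0], last, scale=SCALE))
```

In practice the interpolated model is usually fine-tuned briefly at the longer length. Which scaling options are available, and how they are configured, depends on the framework you use.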