SGLang is a serving framework worth knowing about if you're building agents or need structured outputs. Its headline feature is RadixAttention, which automatically caches the KV state of shared prefixes and reuses it across requests. In practice, if your agents all run with the same system prompt, only the new tokens are computed for each request; on workloads like this, SGLang is reported to deliver up to 5× higher throughput than vLLM. It handles constrained JSON/regex generation natively and is reported to be up to 3× faster at JSON decoding. It runs in production at companies including xAI and LinkedIn, reportedly on 300,000+ GPUs, so it's battle-tested at scale. Use it when you have repeated context or need guaranteed output formats. Stick with vLLM if you just need simple text generation and would not benefit from prefix caching.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill sglang
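
To make the prefix-caching idea above concrete, here is a toy sketch in Python. It is not SGLang's implementation or API: the `PrefixCache` and `Node` names are hypothetical, the cache is a plain token-level trie rather than a compressed radix tree, and it only counts tokens instead of storing KV tensors. It shows why a second request that shares a system prompt needs fresh compute only for its new tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Children keyed by token id. A real radix tree would compress
    # single-child chains into one edge; this toy version does not.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Toy trie that remembers which token prefixes were already processed."""

    def __init__(self):
        self.root = Node()

    def process(self, tokens):
        """Return (reused, computed) token counts for one request."""
        node = self.root
        reused = 0
        # Walk the longest prefix that is already in the cache.
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node = child
            reused += 1
        # Insert the remaining tokens; these are the ones needing fresh compute.
        for tok in tokens[reused:]:
            new_node = Node()
            node.children[tok] = new_node
            node = new_node
        return reused, len(tokens) - reused


# Stand-in for a 100-token system prompt shared by every agent request.
system_prompt = list(range(100))
request_a = system_prompt + [1001, 1002, 1003]
request_b = system_prompt + [2001, 2002]

cache = PrefixCache()
first = cache.process(request_a)    # cold cache: everything is computed
second = cache.process(request_b)   # shared prompt is reused
print(first, second)
```

The first request computes all 103 tokens, while the second reuses the 100 prompt tokens and computes only its 2 new ones. A production cache also has to handle eviction and memory limits, which this sketch leaves out.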