This is THUDM's production RL framework, the one behind the GLM-4 series, built for teams that want Megatron-LM as the native training backend and SGLang for fast rollout generation. If you're scaling post-training on GLM, Qwen3, or DeepSeek models and want tight control over the data buffer and generation workflow, this is the option to consider. It offers both synchronous and asynchronous modes, supports multi-turn agentic training through custom generate functions, and ships with ready-to-use Docker images. The tradeoff is stability: it is research-grade rather than hardened like enterprise alternatives such as miles. Even so, it has been battle-tested at production scale by Z.ai and handles the full parallelism stack.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill slime-rl-training