Meta's torchforge separates RL infrastructure from algorithms so you can experiment with GRPO, DAPO, or custom losses without rebuilding distributed plumbing. It wires together TorchTitan for training, vLLM for inference, and Monarch for actor coordination, so you define the reward functions and loss logic while it handles weight syncing and GPU orchestration. GRPO training needs at least 3 GPUs, and the library is explicitly experimental: its APIs are still changing. If you're prototyping agentic RL in PyTorch and want clean abstractions over the usual Ray-based stacks, this is worth trying. For production stability, Meta suggests looking at miles or verl instead.
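To give a feel for the "reward functions and loss logic" you would supply yourself, here is a minimal sketch of a reward function and the group-relative advantage step that GRPO is built around. It is plain Python and does not use torchforge's API: the names `exact_match_reward` and `group_relative_advantages` are hypothetical, and normalizing by the population standard deviation with a small `eps` is an assumption, since implementations differ on this detail.

```python
from statistics import mean, pstdev


def exact_match_reward(response: str, target: str) -> float:
    """Hypothetical reward: 1.0 if the response matches the target, else 0.0."""
    return 1.0 if response.strip() == target.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: score each sample against its own group.

    Every reward is centered on the group mean and scaled by the group
    standard deviation (population std is an assumption of this sketch).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, a group of four sampled responses (made-up illustration data).
responses = ["4", "5", " 4 ", "four"]
rewards = [exact_match_reward(r, "4") for r in responses]
advantages = group_relative_advantages(rewards)
```

Responses that beat their group's average get a positive advantage and the rest get a negative one, so no separate value network is needed. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.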
npx skills add https://github.com/orchestra-research/ai-research-skills --skill torchforge-rl-training