This is a fully asynchronous RL framework for training personalized AI agents from conversation feedback without blocking inference. It runs four independent loops: serving, rollout collection, judge evaluation, and policy training via GRPO or on-policy distillation (OPD). You get plugin APIs for custom loss functions and reward models, plus ready-to-run scripts for terminal, GUI, SWE, and tool-call agents. The combined method (binary RL plus OPD) is the recommended approach. Deployment works locally or on Tinker cloud via Ray. If you're trying to improve an agent through actual usage rather than from static datasets, this gives you the scaffolding to do continuous learning in the background.
npx skills add https://github.com/aradotso/trending-skills --skill openclaw-rl-training
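The plugin surface isn't documented in this blurb, so the sketch below is illustrative only: `Trajectory`, `RewardPlugin`, and `grpo_advantages` are hypothetical names, not this framework's actual API. The binary reward mirrors the binary-RL signal described above, and the advantage helper implements standard GRPO group normalization, which may differ from this framework's exact variant.

```python
# Hypothetical sketch -- RewardPlugin, Trajectory, and grpo_advantages
# are illustrative names, not this framework's actual API.
import math
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Trajectory:
    """One rollout: the conversation turns plus the judge loop's verdict."""
    messages: list[dict]   # chat-format turns from the serving loop
    judge_pass: bool       # binary verdict from the judge-evaluation loop


class RewardPlugin(Protocol):
    """Shape a custom reward-model plugin would plausibly take."""
    def score(self, traj: Trajectory) -> float: ...


class BinaryJudgeReward:
    """Maps the judge's pass/fail verdict to a {0.0, 1.0} reward --
    the kind of binary-RL signal the recommended combined method consumes."""
    def score(self, traj: Trajectory) -> float:
        return 1.0 if traj.judge_pass else 0.0


def grpo_advantages(rewards: list[float]) -> list[float]:
    """Standard GRPO group normalization: advantage = (r - mean) / std
    over a group of rollouts collected for the same prompt."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]
```

Because the loops are decoupled, a reward plugin like this would only need to score finished trajectories as they arrive from the judge loop; serving and rollout collection continue uninterrupted while the trainer consumes the normalized advantages.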