This skill handles the reward-signal plumbing for RLHF and RLAIF workflows using the OpenJudge library. It gives you a decision tree upfront: pointwise rewards for verifiable tasks like code or math, pairwise tournaments for subjective tasks like instruction following, and pairwise comparisons when you need preference pairs for DPO. The tournament approach for GRPO computes a net win rate across all rollouts in a group instead of scoring each one independently, so every reward is relative to the rollout's siblings. It also covers the boring but critical parts, like normalizing scores from different graders and choosing between voting strategies when you're dealing with noisy LLM judges. If you're building custom reward models or trying to replace a trained reward model with LLM-as-judge, this gives you the patterns without having to figure out the combinatorics yourself.
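To make the tournament idea concrete, here is a minimal sketch of a net-win-rate reward over one group of rollouts. It does not use the OpenJudge API: the `Judge` callable, `net_win_rate_rewards`, and the toy `length_judge` are hypothetical names chosen for illustration, and the exact normalization the skill applies may differ.

```python
from itertools import combinations
from typing import Callable, Sequence

# Hypothetical judge signature (an assumption, not OpenJudge's API):
# returns +1 if `a` beats `b`, -1 if `b` beats `a`, and 0 for a tie.
Judge = Callable[[str, str], int]


def net_win_rate_rewards(rollouts: Sequence[str], judge: Judge) -> list[float]:
    """Score each rollout by (wins - losses) / (n - 1) within its group."""
    n = len(rollouts)
    if n < 2:
        # A single rollout has no opponents, so it gets a neutral reward.
        return [0.0] * n

    net = [0] * n
    # Round-robin: every unordered pair is judged once, n * (n - 1) / 2 calls.
    for i, j in combinations(range(n), 2):
        outcome = judge(rollouts[i], rollouts[j])
        net[i] += outcome
        net[j] -= outcome

    # Each rollout plays n - 1 matches, so the reward lies in [-1, 1].
    return [score / (n - 1) for score in net]


def length_judge(a: str, b: str) -> int:
    """Toy stand-in for an LLM judge: the longer response wins."""
    return (len(a) > len(b)) - (len(a) < len(b))


rewards = net_win_rate_rewards(["a", "bbb", "cc"], length_judge)
print(rewards)  # [-1.0, 1.0, 0.0]
```

Because every win for one rollout is a loss for another, the rewards in a group sum to zero, which fits GRPO's use of group-relative advantages. The cost is quadratic in group size, so with a real LLM judge the number of comparisons grows quickly as the group gets larger.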
npx -y skills add agentscope-ai/openjudge --skill rl-reward --agent claude-code
Installs into .claude/skills of the current project.