RL Reward

657 stars · Apache-2.0

Summary

This skill handles the reward-signal plumbing for RLHF and RLAIF workflows using the OpenJudge library. You get a clear decision tree upfront: pointwise rewards for verifiable tasks like code or math, pairwise tournaments for subjective tasks like instruction following, and pairwise comparisons when you need preference pairs for DPO. The tournament approach for GRPO is neat because it computes each rollout's net win rate against the other rollouts in its group instead of scoring each one independently. It also covers the boring but critical parts, like normalizing scores from different graders and choosing between voting strategies when you're dealing with noisy LLM judges. If you're building custom reward models or trying to replace a trained reward model with LLM-as-judge, this gives you the patterns without having to work out the combinatorics yourself.
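To make the tournament idea concrete, here is a minimal sketch of a net win rate computed over one group of rollouts. It is plain Python and does not use the OpenJudge API. The function names `net_win_rates` and `longer_is_better` are hypothetical, the toy judge stands in for an LLM judge, and dividing by the number of opponents is an assumption about how the rate is scaled.

```python
from itertools import combinations
from typing import Callable, Sequence


def net_win_rates(
    rollouts: Sequence[str],
    prefer: Callable[[str, str], int],
) -> list[float]:
    """Return one reward per rollout: (wins - losses) / number of opponents.

    `prefer(a, b)` is a hypothetical pairwise judge that returns
    1 if `a` wins, -1 if `b` wins, and 0 for a tie.
    """
    n = len(rollouts)
    if n < 2:
        # A single rollout has no opponents, so there is no signal.
        return [0.0] * n

    net = [0] * n
    # Every unordered pair is judged once: n * (n - 1) / 2 comparisons.
    for i, j in combinations(range(n), 2):
        outcome = prefer(rollouts[i], rollouts[j])
        net[i] += outcome
        net[j] -= outcome

    # Each rollout met n - 1 opponents, so rewards fall in [-1, 1].
    return [score / (n - 1) for score in net]


def longer_is_better(a: str, b: str) -> int:
    """Toy stand-in for an LLM judge: the longer response wins."""
    return (len(a) > len(b)) - (len(a) < len(b))


rewards = net_win_rates(["a", "bbb", "cc"], longer_is_better)
print(rewards)  # [-1.0, 1.0, 0.0]
```

Because every comparison adds to one rollout and subtracts from the other, the rewards in a group always sum to zero, which fits GRPO's group-relative advantages. The cost is quadratic in group size, so a group of 8 rollouts needs 28 judge calls per prompt.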

Install to Claude Code

npx -y skills add agentscope-ai/openjudge --skill rl-reward --agent claude-code

Installs into .claude/skills of the current project.

Files

SKILL.md


First Seen: Jun 11, 2026

View on GitHub
