Ref Hallucination Arena

657 stars · Apache-2.0

Summary

This benchmark verifies whether LLMs are citing real academic papers or making them up. It hits Crossref, PubMed, arXiv, and DBLP to check every reference your model returns, then scores hallucination rate, per-field accuracy (title, author, year, DOI), and discipline breakdown. You can run it with or without tool augmentation (ReAct plus web search). The pipeline saves checkpoints, generates markdown reports with charts, and supports year constraints in queries. Use it when you need hard numbers on citation reliability instead of vibes. Honestly, the fact that this needs to exist tells you something about current LLM behavior with references, but at least now you can measure the damage.
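To make the metrics above concrete, here is a minimal sketch of how a hallucination rate and per-field accuracy could be computed once each cited reference has been looked up. It is not the benchmark's actual code: the `Reference` type, the `score` function, the exact-match rule after normalization, and the choice to count unverified references against per-field accuracy are all assumptions made for illustration. The sample records are placeholders, not real papers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Fields named in the summary: title, author, year, DOI.
FIELDS = ("title", "author", "year", "doi")


@dataclass
class Reference:
    """Hypothetical container for one citation (not the benchmark's own type)."""
    title: str
    author: str
    year: int
    doi: str


def normalize(value) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count as errors."""
    return " ".join(str(value).lower().split())


def score(cited: List[Reference], matches: List[Optional[Reference]]) -> Dict:
    """Score citations against lookup results.

    matches[i] is the record found for cited[i] in a bibliographic
    database, or None when no database returned a match (assumed here
    to mean the reference is hallucinated).
    """
    total = len(cited)
    hallucinated = sum(1 for m in matches if m is None)
    hits = {f: 0 for f in FIELDS}
    for ref, match in zip(cited, matches):
        if match is None:
            continue  # an unverified reference scores zero on every field
        for f in FIELDS:
            if normalize(getattr(ref, f)) == normalize(getattr(match, f)):
                hits[f] += 1
    return {
        "hallucination_rate": hallucinated / total if total else 0.0,
        "field_accuracy": {f: (hits[f] / total if total else 0.0) for f in FIELDS},
    }


# Placeholder data: four model-cited references and what a lookup returned.
cited = [
    Reference("Paper A", "Author One", 2020, "10.0000/example.a"),
    Reference("Paper B", "Author Two", 2021, "10.0000/example.b"),
    Reference("Paper C", "Author Three", 2019, "10.0000/example.c"),
    Reference("Paper D", "Author Four", 2022, "10.0000/example.d"),
]
matches = [
    cited[0],                                                        # fully correct
    Reference("Paper B", "Author Two", 2018, "10.0000/example.b"),   # wrong year cited
    cited[2],                                                        # fully correct
    None,                                                            # not found anywhere
]

result = score(cited, matches)
print(result["hallucination_rate"])  # 0.25
print(result["field_accuracy"])
```

A real pipeline would likely use fuzzy title matching and author-list comparison rather than exact string equality; exact matching is used here only to keep the sketch short.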

Install to Claude Code

npx -y skills add agentscope-ai/openjudge --skill ref-hallucination-arena --agent claude-code

Installs into .claude/skills of the current project.

Files

SKILL.md

First Seen: Jun 11, 2026
