Four LLM-as-judge metrics from the PaperOrchestra paper (arXiv:2604.05018) for scoring research paper quality: Citation F1 with P0/P1 partitioning, a six-axis literature review scorer with anti-inflation caps, and two side-by-side comparators (one for the full paper, one for the literature review alone). The literature review rubric is strict by design: scores default to the 45-70 range, anything above 85 requires strong evidence on all six axes, and overclaiming or citation dumping incurs hard penalties. It is useful if you're building paper-writing agents and want the same benchmarks the paper used to compare against AI-Scientist-v2, or if you just want a systematic way to score drafts without handwaving.
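Citation F1 is, at its core, set-based precision and recall between the references a draft cites and a gold reference list. The sketch below is a minimal illustration under stated assumptions, not the skill's implementation: it assumes P0 and P1 are two disjoint subsets of the gold list (P0 treated as the higher-priority partition), computes overall precision/recall/F1 against their union, and reports recall within each partition separately. The function names `prf1` and `citation_f1_partitioned` are hypothetical, and the paper's exact partitioning rule may differ.

```python
from typing import Dict, Iterable, Set


def prf1(predicted: Set[str], gold: Set[str]) -> Dict[str, float]:
    """Set-based precision, recall and F1 between predicted and gold citation keys."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def citation_f1_partitioned(
    cited: Iterable[str], p0: Iterable[str], p1: Iterable[str]
) -> Dict[str, float]:
    """Hypothetical P0/P1 scoring: overall P/R/F1 against P0 | P1, plus per-partition recall."""
    cited_set, p0_set, p1_set = set(cited), set(p0), set(p1)
    # Overall scores treat the union of both partitions as the gold reference list.
    scores = prf1(cited_set, p0_set | p1_set)
    # Per-partition recall (assumption) shows whether higher-priority P0 references were hit.
    scores["recall_p0"] = len(cited_set & p0_set) / len(p0_set) if p0_set else 0.0
    scores["recall_p1"] = len(cited_set & p1_set) / len(p1_set) if p1_set else 0.0
    return scores


if __name__ == "__main__":
    result = citation_f1_partitioned({"a", "b", "x"}, p0={"a", "c"}, p1={"b", "d"})
    print(result)
```

Keeping per-partition recall separate from the overall F1 makes it visible when a draft scores well overall while still missing the higher-priority references.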
npx -y skills add ar9av/paperorchestra --skill paper-autoraters --agent claude-code
Installs into .claude/skills of the current project.