This skill builds a browser-based annotation tool for reviewing LLM traces with pass/fail labels and freeform notes. The guidance is opinionated about display: render everything in its native format (markdown as HTML, code with syntax highlighting), collapse repetitive content such as shared system prompts, and surface buried metadata as badges or headers. You annotate at the trace level, not at individual spans, and the tool includes keyboard shortcuts for speed. The testing section uses Playwright to screenshot the interface and verify that the full workflow actually works. It is a good fit when you need human review of LLM outputs and the generic tools don't fit your data structure or domain.
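As a rough illustration of trace-level annotation driven by keyboard shortcuts, here is a minimal sketch of the data model such a tool might keep behind the UI. Everything in it is an assumption made for illustration: the names `Annotation`, `ReviewSession`, and `SHORTCUTS`, and the key bindings `p` and `f`, are hypothetical and not taken from the skill itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical key bindings; the skill's actual shortcuts are not documented here.
SHORTCUTS = {"p": "pass", "f": "fail"}


@dataclass
class Annotation:
    """One record per trace (not per span): a label plus freeform notes."""
    trace_id: str
    label: Optional[str] = None  # "pass", "fail", or None if unreviewed
    notes: str = ""


@dataclass
class ReviewSession:
    annotations: Dict[str, Annotation] = field(default_factory=dict)

    def _get(self, trace_id: str) -> Annotation:
        # Create the annotation lazily the first time a trace is touched.
        return self.annotations.setdefault(trace_id, Annotation(trace_id))

    def press(self, trace_id: str, key: str) -> Annotation:
        """Apply a shortcut key to a trace; unknown keys leave the label unchanged."""
        ann = self._get(trace_id)
        label = SHORTCUTS.get(key.lower())
        if label is not None:
            ann.label = label
        return ann

    def add_note(self, trace_id: str, text: str) -> Annotation:
        """Attach (or overwrite) the freeform note for a trace."""
        ann = self._get(trace_id)
        ann.notes = text.strip()
        return ann

    def reviewed_count(self) -> int:
        """Number of traces that have a pass/fail label."""
        return sum(1 for a in self.annotations.values() if a.label is not None)
```

In a browser tool the `press` logic would typically sit in a keydown handler; keeping one record per trace ID is what makes the annotation trace-level rather than span-level.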
npx -y skills add hamelsmu/evals-skills --skill build-review-interface --agent claude-code
Installs into .claude/skills of the current project.
sickn33/antigravity-awesome-skills
kubesphere/kubesphere
supercent-io/skills-template