This MCP server runs three inline checks that catch AI agents hallucinating facts, silently swallowing exceptions and returning mock data, or claiming to have finished tasks without proof. verify_response_grounding scores claims against the input context using stem-Jaccard overlap, find_swallowed_exceptions walks the AST for try/except blocks that return fake success responses, and review_transcript flags unverified completion claims and cross-turn contradictions. It is pure Python, needs no API key, and runs in under a second. It is designed to be called during the conversation in Claude Code or Cursor, not as a separate eval pipeline. The use case is catching quiet failure modes that look fine on standard dashboards but break production.
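To make the grounding idea concrete, here is a minimal sketch of stem-Jaccard overlap. It is not the tool's actual implementation: the suffix list, the tokenizer, and the names stem, stems, and grounding_score are all assumptions made for illustration.

```python
import re

# Hypothetical suffix list; the stemmer the tool really uses is not documented here.
_SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Crude suffix-stripping stemmer that keeps at least three characters."""
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text: str) -> set[str]:
    """Lowercase the text, split on non-alphanumerics, and stem each token."""
    return {stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())}


def grounding_score(claim: str, context: str) -> float:
    """Jaccard overlap |A & B| / |A | B| between the two stem sets."""
    a, b = stems(claim), stems(context)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(grounding_score("logs were rotated", "log rotating"))
    print(grounding_score("payments failed", "the weather is sunny"))
```

A score near 1 means the claim reuses the context's vocabulary, and a score near 0 means it shares almost none of it. Plain Jaccard penalizes long contexts because every extra context word grows the union, so a real checker would probably compare a claim against individual context sentences; that is a guess about the design, not a description of it.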
claude mcp add --transport stdio temurkhan13-openclaw-output-vetter-mcp uvx openclaw-output-vetter-mcp
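For a sense of what an AST check for swallowed exceptions involves, here is a rough sketch built on Python's standard ast module. The heuristic (an except handler that returns a literal and never re-raises) and the name find_swallowing_handlers are assumptions for illustration, not the logic find_swallowed_exceptions actually uses.

```python
import ast


def find_swallowing_handlers(source: str) -> list[int]:
    """Return line numbers of except handlers that return a literal and never re-raise."""
    literal_nodes = (ast.Constant, ast.Dict, ast.List, ast.Tuple)
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        # Collect every node nested anywhere inside the handler body.
        body_nodes = [n for stmt in node.body for n in ast.walk(stmt)]
        reraises = any(isinstance(n, ast.Raise) for n in body_nodes)
        returns_literal = any(
            isinstance(n, ast.Return) and isinstance(n.value, literal_nodes)
            for n in body_nodes
        )
        if returns_literal and not reraises:
            flagged.append(node.lineno)
    return sorted(flagged)


# Hypothetical agent-written code: the first handler hides the failure
# behind mock data, the second lets the error propagate.
SAMPLE = '''
def fetch_user(client, user_id):
    try:
        return client.get(user_id)
    except Exception:
        return {"status": "ok", "user": "demo"}

def fetch_strict(client, user_id):
    try:
        return client.get(user_id)
    except KeyError:
        raise
'''

if __name__ == "__main__":
    print(find_swallowing_handlers(SAMPLE))
```

This heuristic is deliberately blunt: it also flags handlers that return a legitimate default such as None, so a practical check would need extra signals before reporting a finding.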