Turns your existing prompt-response logs into local regression tests. It generates eval suites from JSONL exports or traces you already have, then replays changed prompts and diffs the outputs to catch structural breaks before deployment. The MCP server exposes suite creation, case inspection, eval runs, and diff operations as tools. It flags missing JSON keys, dropped URLs, format changes, refusals, and empty responses without requiring an LLM judge or cloud service. Reach for this when you're iterating on prompts in coding assistants and want a safety check that spots behavioral regressions between versions. The core workflow is suite generation from baseline logs, then eval or diff commands on every prompt change.
claude mcp add --transport stdio gowtham0992-redline uvx redline