This is a reference implementation for enterprise RAG that treats eval scores and observability as first-class CI artifacts. It exposes three MCP primitives: a query tool that hits pgvector with optional reranking, document resources keyed by source_id, and a cite_from_chunks prompt. The real value is the merge gate: every PR runs a 110-question golden set in CI and blocks if top-k recall regresses more than 5pp. Langfuse traces are public and shareable. The corpus is 238 VA education manuals already chunked and embedded in a fixture dump, so you can clone and immediately test prompt changes against known-good baselines. Reach for this when you need a working example of how to stop RAG quality from silently degrading between deploys.