VelociRAG gives Claude four retrieval methods in a single query: FAISS vector similarity, BM25 keyword matching, knowledge graph traversal, and metadata filtering. The results are merged with reciprocal rank fusion and then reranked by a cross-encoder. It runs entirely on ONNX Runtime, with no PyTorch or GPU dependencies, and targets searches under 200 ms once its daemon is warm. The MCP server exposes five tools: search, index, add_document, health, and list_sources. During indexing it automatically builds entity, temporal, and topic graph edges from markdown files, and incremental updates reprocess only the files that changed. Reach for it when you need multi-layer RAG without a heavyweight ML framework, or when you're indexing documentation that benefits from graph connections alongside semantic search.
claude mcp add --transport stdio haseebkhalid1507-velocirag uvx velocirag
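Reciprocal rank fusion, the step that merges the separate retrieval layers, can be illustrated with a short sketch. This is not VelociRAG's code. It shows the standard RRF formula, in which each document scores the sum of 1 / (k + rank) across every ranked list it appears in. The function name `reciprocal_rank_fusion`, the constant k = 60 (the value commonly used in the RRF literature), and the example result lists are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Hypothetical helper: fuse several ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k=60 is an assumed default, not a value
    documented by VelociRAG.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists, one per retrieval layer.
vector_hits = ["a.md", "b.md", "c.md"]
bm25_hits = ["b.md", "d.md", "a.md"]
graph_hits = ["c.md", "b.md", "a.md"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits, graph_hits])
print([doc for doc, _ in fused])  # ['b.md', 'a.md', 'c.md', 'd.md']
```

Because RRF uses only rank positions, it needs no calibration between the incompatible raw scores of a vector search, BM25, and graph traversal. That is why it is a common choice for hybrid retrieval. A cross-encoder would then rescore the top of this fused list against the query.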