Velocirag


Summary

VelociRAG gives Claude four retrieval methods in one query: FAISS vector similarity, BM25 keyword matching, knowledge graph traversal, and metadata filters, all fused through reciprocal rank fusion with cross-encoder reranking. It runs entirely on ONNX Runtime without PyTorch or GPU dependencies, targeting sub-200ms searches once the daemon is warm. The MCP server exposes five tools: search, index, add_document, health, and list_sources. It automatically builds entity, temporal, and topic graph edges from markdown during indexing, with incremental updates that only reprocess changed files. Reach for this when you need multi-layer RAG without heavyweight ML frameworks, or when you're indexing documentation that benefits from graph connections alongside semantic search.


🦖 VelociRAG

Lightning-fast RAG for AI agents.

Four-layer retrieval fusion powered by ONNX Runtime. No PyTorch. Sub-200ms warm search. Incremental graph updates. MCP-ready.

Most RAG solutions either drag in 2GB+ of PyTorch or limit you to single-layer vector search. VelociRAG gives you four retrieval methods — vector similarity, BM25 keyword matching, knowledge graph traversal, and metadata filtering — fused through reciprocal rank fusion with cross-encoder reranking. All running on ONNX Runtime, no GPU, no API keys. Comes with an MCP server for agent integration, a Unix socket daemon for warm queries, and a CLI that just works.

🚀 Quick Start

MCP Server (Claude, Cursor, Windsurf)

pip install "velocirag[mcp]"
velocirag index ./my-docs
velocirag mcp

Claude Code — add to .mcp.json in your project root:

{
  "mcpServers": {
    "velocirag": {
      "command": "velocirag",
      "args": ["mcp"],
      "env": { "VELOCIRAG_DB": "/path/to/data" }
    }
  }
}

Then open /mcp in Claude Code and enable the velocirag server. If using a virtualenv, use the full path to the binary (e.g. .venv/bin/velocirag).
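The virtualenv advice can be scripted. The sketch below uses only the standard library. It writes the same `.mcp.json` entry shown above, but with an absolute path to the binary, and merges that entry into any existing file so other configured servers are kept. `write_mcp_config` is a hypothetical helper, not a VelociRAG command. The `.venv/bin/velocirag` fallback assumes a POSIX virtualenv in the project root.

```python
import json
import shutil
from pathlib import Path


def build_mcp_config(binary: str, db_path: str) -> dict:
    """Return the .mcp.json structure for velocirag, using an absolute binary path."""
    return {
        "mcpServers": {
            "velocirag": {
                "command": binary,
                "args": ["mcp"],
                "env": {"VELOCIRAG_DB": db_path},
            }
        }
    }


def write_mcp_config(project_root: Path, db_path: str) -> Path:
    """Hypothetical helper: merge a velocirag entry into <project_root>/.mcp.json."""
    # Prefer the binary found on PATH; otherwise assume a POSIX virtualenv at .venv.
    found = shutil.which("velocirag")
    binary = found or str((project_root / ".venv" / "bin" / "velocirag").resolve())
    target = project_root / ".mcp.json"
    # Keep any other MCP servers already configured in the file.
    existing = json.loads(target.read_text()) if target.exists() else {}
    servers = existing.setdefault("mcpServers", {})
    servers.update(build_mcp_config(binary, db_path)["mcpServers"])
    target.write_text(json.dumps(existing, indent=2) + "\n")
    return target
```

Run it once from the project root, for example `write_mcp_config(Path.cwd(), "/path/to/data")`, then enable the server through /mcp as described above.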

Claude Desktop — add to claude_desktop_config.json:

{
  "mcpServers": {
    "velocirag": {
      "command": "velocirag",
      "args": ["mcp", "--db", "/path/to/data"]
    }
  }
}

Cursor — add to .cursor/mcp.json:

{
  "mcpServers": {
    "velocirag": {
      "command": "velocirag",
      "args": ["mcp", "--db", "/path/to/data"]
    }
  }
}

Python API

from velocirag import Embedder, VectorStore, Searcher

embedder = Embedder()
store = VectorStore('./my-db', embedder)
store.add_directory('./my-docs')
searcher = Searcher(store, embedder)
results = searcher.search('query', limit=5)
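`add_directory` is where files are split into chunks before embedding. The feature list describes the chunker as header-aware, preserving document structure and parent context, but its code is not shown here. What follows is only a sketch of that general technique: split markdown at headings and keep the heading trail with each chunk. `Chunk` and `chunk_markdown` are hypothetical names.

```python
import re
from dataclasses import dataclass

# A markdown ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    path: list[str]  # heading trail, e.g. ["Setup", "Linux"]
    text: str        # body text under the deepest heading


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split markdown at headings, recording each section's parent headings."""
    chunks: list[Chunk] = []
    trail: list[str] = []
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(path=list(trail), text=text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at this level or deeper, then push the new one.
            del trail[level - 1:]
            trail.append(match.group(2))
        else:
            body.append(line)
    flush()
    return chunks
```

Embedding a chunk as `" > ".join(chunk.path) + "\n" + chunk.text` lets a short section such as "apt install" carry its parent headings into the vector. This sketch does not skip fenced code blocks, where a line starting with `#` is a comment rather than a heading.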

CLI

pip install velocirag
velocirag index ./my-docs
velocirag search "your query here"

Search Daemon (warm engine for CLI users)

velocirag serve --db ./my-data        # start daemon (background)
velocirag search "query"              # auto-routes through daemon
velocirag status                      # check daemon health
velocirag stop                        # stop daemon

The daemon keeps the ONNX model and FAISS index warm behind a Unix socket. The first query loads the engine (~1s); subsequent queries return in ~180ms with full 4-layer fusion.
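The daemon pattern is simple: pay the model-loading cost once in a long-lived process, then answer each query over a local socket. The sketch below shows that pattern with Python's standard library. It is not VelociRAG's daemon. The newline-delimited JSON protocol, the `load_engine` stand-in (a substring matcher in place of real search), and the `Daemon`/`query` names are all assumptions for illustration.

```python
import json
import os
import socket
import socketserver


def load_engine():
    """Stand-in for the expensive step (loading ONNX models, a FAISS index, ...)."""
    docs = ["rrf fusion", "bm25 keyword search", "faiss vector search"]
    # A real engine would embed, search, fuse, and rerank; this one substring-matches.
    return lambda q: [d for d in docs if any(w in d for w in q.lower().split())]


class QueryHandler(socketserver.StreamRequestHandler):
    """Serve newline-delimited JSON requests with the engine held by the server."""

    def handle(self) -> None:
        for raw in self.rfile:  # one request per line until the client disconnects
            request = json.loads(raw)
            results = self.server.engine(request["query"])
            self.wfile.write(json.dumps({"results": results}).encode() + b"\n")


class Daemon(socketserver.ThreadingUnixStreamServer):
    """Long-lived server: the engine is loaded once here, not once per query."""

    def __init__(self, path: str):
        if os.path.exists(path):
            os.unlink(path)  # remove a stale socket file left by a previous run
        super().__init__(path, QueryHandler)
        self.engine = load_engine()


def query(path: str, text: str) -> list:
    """Client: connect, send one JSON request line, read one JSON reply line."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(json.dumps({"query": text}).encode() + b"\n")
        with sock.makefile("rb") as reply:
            return json.loads(reply.readline())["results"]
```

In practice `Daemon(path).serve_forever()` runs in one process and `query(path, "...")` is called from another; only the first process ever pays the load cost.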

🎯 Why VelociRAG?

4-layer search — vector + BM25 keyword + knowledge graph + metadata, fused with RRF
No LLM needed — search runs entirely on local models (MiniLM + TinyBERT, ~80MB total)
No GPU needed — pure ONNX inference, runs on any machine
~3ms warm search — daemon keeps models + indices warm over Unix socket
Incremental indexing — add files without rebuilding the whole index
MCP server — plug into Claude, Cursor, Windsurf, any MCP client
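Reciprocal rank fusion (Cormack et al., 2009, listed under References) merges ranked lists using only positions, so FAISS similarities, BM25 scores, and graph hits never need to be put on a common scale. Each document scores the sum of 1 / (k + rank) over the lists it appears in. The sketch below uses k = 60, the value from the original paper; whether VelociRAG uses the same constant is not stated here, and `rrf` is a hypothetical helper.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first rankings of doc IDs with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); a larger k flattens
            # the gap between top-ranked and lower-ranked hits.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector = ["a", "b", "c"]   # e.g. FAISS order
keyword = ["c", "a", "d"]  # e.g. BM25 order
graph = ["a", "d"]         # e.g. graph-traversal order
fused = rrf([vector, keyword, graph])
```

"a" wins because it ranks highly in all three lists, and "c" beats "b" despite ranking lower in the vector list, because the keyword layer also found it. In a pipeline like the one below, the fused list is what the cross-encoder would then rerank.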

Related Projects

Memkoshi — Agent memory system. Uses VelociRAG as its search engine.
Stelline — Session intelligence. Crafts memories from conversation logs.
Glyph — MCP security scanner and runtime protection.

🏗️ How It Works

The 4-layer pipeline:

Query → expand (acronyms, variants)
      → [Vector]   FAISS cosine similarity (384d, MiniLM-L6-v2 via ONNX)
      → [Keyword]  BM25 via SQLite FTS5
      → [Graph]    Knowledge graph traversal
      → [Metadata] Structured SQL filters (tags, status, project)
      → RRF Fusion → Cross-encoder rerank → Results
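The first stage expands the query before any layer runs. The README names three ingredients (an acronym registry, casing/spacing variants, and underscore-aware tokenization) without showing code, so the sketch below is an assumption about how such an expander could work. The `ACRONYMS` table and `expand_query` function are hypothetical.

```python
import re

# Hypothetical acronym registry; a real one would be configurable.
ACRONYMS = {"rag": "retrieval augmented generation", "rrf": "reciprocal rank fusion"}


def expand_query(query: str) -> list[str]:
    """Return the original query plus variants, deduplicated, in a stable order."""
    variants = [query]
    lowered = query.lower()
    variants.append(lowered)
    # Underscore-aware: ERR_CONNECTION_REFUSED also becomes "err connection refused".
    variants.append(re.sub(r"[_\-]+", " ", lowered))
    # Spacing variant: split camelCase at lower-to-upper boundaries ("useState" -> "use state").
    variants.append(re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", query).lower())
    # Acronym registry: append the long form for any known acronym token.
    for token in re.findall(r"[a-z0-9]+", lowered):
        if token in ACRONYMS:
            variants.append(f"{lowered} {ACRONYMS[token]}")
    # dict.fromkeys keeps first occurrences in order while dropping duplicates.
    return list(dict.fromkeys(v.strip() for v in variants if v.strip()))
```

Each variant can be sent to the vector and keyword layers, with their result lists fused alongside the rest.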

What each layer catches:

Query type	Vector	Keyword	Graph	Metadata
Conceptual ("improve error handling")	✅	—	—	—
Exact match ("ERR_CONNECTION_REFUSED")	—	✅	—	—
Connected concepts	—	—	✅	—
Filtered ("#python status:active")	—	—	—	✅
Combined ("React state management")	✅	✅	✅	✅
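The keyword column is what lets an exact identifier like `ERR_CONNECTION_REFUSED` match even when embeddings blur it. SQLite's FTS5 extension ranks matches with its built-in `bm25()` function. The sketch below uses Python's `sqlite3` module and a throwaway in-memory table, and assumes your SQLite build includes FTS5 (most do). VelociRAG's real table layout is not documented here, and `keyword_search` is a hypothetical helper.

```python
import sqlite3


def keyword_search(docs: dict[str, str], query: str, limit: int = 5) -> list[str]:
    """Rank chunks for an exact phrase with FTS5's BM25; return doc IDs best-first."""
    con = sqlite3.connect(":memory:")
    # doc_id is stored but not tokenized; only body is full-text indexed.
    con.execute("CREATE VIRTUAL TABLE chunks USING fts5(doc_id UNINDEXED, body)")
    con.executemany("INSERT INTO chunks (doc_id, body) VALUES (?, ?)", docs.items())
    # Wrap the query in double quotes so FTS5 parses it as one phrase, not as syntax
    # (the default unicode61 tokenizer splits ERR_CONNECTION_REFUSED at underscores).
    phrase = '"' + query.replace('"', '""') + '"'
    # bm25() is lower (more negative) for better matches, so sort ascending.
    rows = con.execute(
        "SELECT doc_id FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (phrase, limit),
    ).fetchall()
    con.close()
    return [doc_id for (doc_id,) in rows]
```

Because the phrase requires adjacent tokens, a chunk that merely mentions "connection" and "refused" separately does not match.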

✨ Features

ONNX Runtime — 184ms cold start, 3ms cached. No PyTorch, no GPU
Four-layer fusion — FAISS vector similarity + SQLite FTS5 (BM25) + knowledge graph + metadata filtering, merged via reciprocal rank fusion
Cross-encoder reranking — TinyBERT reranker via ONNX Runtime — included in base install, no PyTorch needed. Downloads ~17MB model on first use
Incremental graph updates — file-centric provenance tracking detects what changed and only rebuilds affected nodes/edges. Cascading deletes maintain consistency across all stores (vector, graph, metadata). Multi-source support with isolated provenance per source
MCP server — Five tools (search, index, add_document, health, list_sources) for Claude, Cursor, Windsurf
Search daemon — Unix socket server keeps ONNX model + FAISS index warm between queries
Knowledge graph — Analyzers build entity, temporal, topic, and explicit-link edges from markdown. Optional GLiNER NER. 418 files in 2.1s
Smart chunking — Header-aware splitting preserves document structure and parent context
Query expansion — Acronym registry, casing/spacing variants, underscore-aware tokenization
Runs anywhere — CPU-only, 8GB RAM, no API keys, no external services
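The graph layer answers "connected concepts" queries that neither vectors nor keywords catch. Among the edge types listed above are explicit links between markdown files. The sketch below covers only that one edge type plus traversal; entity, temporal, and topic analyzers are out of scope. It extracts `[[wikilinks]]` and inline `.md` links, then walks the graph breadth-first up to a hop limit. The link syntaxes handled and the function names are assumptions, not VelociRAG's analyzers. It also uses plain dicts rather than networkx to stay self-contained.

```python
import re
from collections import defaultdict, deque

# [[Note]], [[Note|alias]], [[Note#Section]], and [text](other.md)
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")
MDLINK = re.compile(r"\]\(([^)\s]+?)\.md\)")


def build_link_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each note name to the notes it links to or is linked from (undirected)."""
    graph: dict[str, set[str]] = defaultdict(set)
    for name, text in files.items():
        targets = [t.strip() for t in WIKILINK.findall(text)] + MDLINK.findall(text)
        for target in targets:
            if target != name:
                graph[name].add(target)
                graph[target].add(name)
    return graph


def neighbors_within(graph: dict[str, set[str]], start: str, max_hops: int = 2) -> dict[str, int]:
    """Breadth-first traversal: return every reachable note with its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in sorted(graph.get(node, ())):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]
    return seen
```

Hop distance is a natural ranking signal for this layer: one-hop neighbors of a matched note rank above two-hop ones before fusion.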

🤖 MCP Server

VelociRAG exposes a Model Context Protocol server for seamless agent integration:

Available tools:

search — 4-layer fusion search with reranking
index — Add documents to the knowledge base
add_document — Insert single document
health — System diagnostics
list_sources — Show indexed document sources

The MCP server process stays alive between queries, so models load once and every subsequent search is warm. Works with any MCP-compatible client.

🐍 Python API

Full 4-layer unified search:

from velocirag import (
    Embedder, VectorStore, Searcher,
    GraphStore, MetadataStore, UnifiedSearch,
    GraphPipeline
)

# Build the full stack
embedder = Embedder()
store = VectorStore('./search-db', embedder)
graph_store = GraphStore('./search-db/graph.db')
metadata_store = MetadataStore('./search-db/metadata.db')

# Index with graph + metadata
store.add_directory('./docs')
pipeline = GraphPipeline(graph_store, embedder, metadata_store)
pipeline.build('./docs', source_name='my-docs')

# Unified search across all layers
searcher = Searcher(store, embedder)
unified = UnifiedSearch(searcher, graph_store, metadata_store)
results = unified.search(
    'machine learning algorithms',
    limit=5,
    enrich_graph=True,
    filters={'tags': ['python'], 'status': 'active'}
)
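The `filters` argument above maps naturally onto SQL. The sketch below shows one way a `{'tags': [...], 'status': ...}` filter could become a parameterized query over a SQLite metadata table. The `docs`/`doc_tags` schema and `filter_docs` function are hypothetical, since `MetadataStore`'s actual schema is not documented here. Treating a tag list as "must have all of these tags" is also an assumption.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE docs (id INTEGER PRIMARY KEY, path TEXT, status TEXT, project TEXT);
CREATE TABLE doc_tags (doc_id INTEGER REFERENCES docs(id), tag TEXT);
"""


def filter_docs(con: sqlite3.Connection, filters: dict) -> list[str]:
    """Return paths of docs matching every filter; a tag list requires all its tags."""
    clauses, params = [], []
    for key in ("status", "project"):
        if key in filters:
            # Column names come from a fixed allow-list; values are bound parameters.
            clauses.append(f"d.{key} = ?")
            params.append(filters[key])
    for tag in filters.get("tags", []):
        clauses.append("EXISTS (SELECT 1 FROM doc_tags t WHERE t.doc_id = d.id AND t.tag = ?)")
        params.append(tag)
    where = " AND ".join(clauses) or "1"  # no filters: match everything
    sql = f"SELECT d.path FROM docs d WHERE {where} ORDER BY d.path"
    return [path for (path,) in con.execute(sql, params)]
```

The matching paths can then act as a whitelist for the other layers, or as their own ranked list in the fusion step.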

Quick semantic search:

from velocirag import Embedder, VectorStore, Searcher

embedder = Embedder()
store = VectorStore('./db', embedder)
store.add_directory('./docs')
searcher = Searcher(store, embedder)
results = searcher.search('neural networks', limit=10)
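The vector layer is described as FAISS cosine similarity over 384-dimensional MiniLM embeddings. Cosine similarity equals the inner product of L2-normalized vectors, which is why a flat inner-product index (such as FAISS's `IndexFlatIP`) over normalized embeddings returns cosine rankings. The NumPy sketch below computes the same exact ranking by brute force. Random vectors stand in for real embeddings, and `cosine_top_k` is a hypothetical helper, not part of `Searcher`.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis so inner product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def cosine_top_k(index: np.ndarray, query: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Exact top-k rows by cosine similarity; returns (row, score) best-first."""
    scores = normalize(index) @ normalize(query)
    k = min(k, len(scores))
    top = np.argpartition(-scores, k - 1)[:k]  # unordered top-k in linear time
    top = top[np.argsort(-scores[top])]        # then order just those k
    return [(int(i), float(scores[i])) for i in top]


rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 384)).astype(np.float32)  # stand-in embeddings
query = corpus[42] + 0.01 * rng.normal(size=384).astype(np.float32)
results = cosine_top_k(corpus, query, k=5)
```

A slightly perturbed copy of row 42 should come back first with a score near 1.0. Brute force is exact, and it is what a flat index does internally; approximate FAISS indexes trade some of that exactness for speed on larger corpora.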

Incremental graph updates:

from velocirag import Embedder, GraphStore, GraphPipeline

# First run — full build, populates provenance
gs = GraphStore('./db/graph.db')
pipeline = GraphPipeline(gs, embedder=Embedder())
pipeline.build('./docs', source_name='my-docs')  # full build

# Subsequent runs — only changed files get reprocessed
pipeline.build('./docs', source_name='my-docs')  # incremental (automatic)

# Force full rebuild
pipeline.build('./docs', source_name='my-docs', force_rebuild=True)

# Multi-source graphs
pipeline.build('./project-a', source_name='project-a')
pipeline.build('./project-b', source_name='project-b')  # isolated provenance

# Deleted files automatically cascade across all stores
# (vector, FTS5, graph, metadata) on next build
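Incremental builds hinge on knowing which files changed since the last run. One common approach, shown here as a sketch, is to store a content hash per file for each source and diff it against the current directory. This is an assumption about the mechanism, not VelociRAG's provenance code, and `scan` and `diff_provenance` are hypothetical.

```python
import hashlib
from pathlib import Path


def scan(root: Path) -> dict[str, str]:
    """Map each markdown file (relative path) to a SHA-256 of its contents."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*.md"))
    }


def diff_provenance(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, changed, or deleted relative to the last build."""
    shared = current.keys() & previous.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "changed": sorted(k for k in shared if current[k] != previous[k]),
        "deleted": sorted(previous.keys() - current.keys()),
    }
```

Only `added` and `changed` files need re-chunking and re-analysis; `deleted` entries are the ones a cascading delete would remove from the vector, FTS5, graph, and metadata stores. Keeping one `previous` map per `source_name` is what keeps provenance isolated between sources.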

💻 CLI Reference

# Index documents (graph + metadata built by default)
velocirag index <path> [--no-graph] [--no-metadata] [--gliner] [--full-graph] [--force]
                       [--source NAME] [--db PATH]

# Search across all layers (auto-routes through daemon if running)
velocirag search <query> [--limit N] [--threshold F] [--format text|json]

# Search daemon
velocirag serve [--db PATH] [-f]         # start daemon (-f for foreground)
velocirag stop                            # stop daemon
velocirag status                          # check daemon health

# Metadata queries
velocirag query [--tags TAG] [--status S] [--project P] [--recent N]

# System health and status
velocirag health [--format text|json]

# Start MCP server
velocirag mcp [--db PATH] [--transport stdio|sse]

Options:

--no-graph — Skip knowledge graph build
--no-metadata — Skip metadata extraction
--full-graph — Build graph WITH semantic similarity edges (~2GB extra RAM)
--source NAME — Label for multi-source provenance isolation
--force — Clear and rebuild from scratch
--gliner — Use GLiNER for entity extraction (requires pip install "velocirag[ner]")
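`--full-graph` adds semantic-similarity edges, and its extra RAM follows from comparing every chunk against every other: a dense all-pairs similarity matrix grows quadratically with the number of chunks. The sketch below builds threshold-based similarity edges with NumPy, processing rows in blocks so peak memory is `block x n` rather than `n x n`. The 0.8 threshold, the block size, and `similarity_edges` are assumptions; how VelociRAG actually selects these edges is not documented here.

```python
import numpy as np


def similarity_edges(emb: np.ndarray, threshold: float = 0.8,
                     block: int = 1024) -> list[tuple[int, int, float]]:
    """Return (i, j, cosine) for every pair i < j with cosine similarity >= threshold."""
    unit = emb / np.clip(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12, None)
    edges = []
    for start in range(0, len(unit), block):
        # Compare one block of rows against all rows: peak memory is block x n.
        sims = unit[start:start + block] @ unit.T
        rows, cols = np.nonzero(sims >= threshold)
        for r, c in zip(rows, cols):
            i = start + int(r)
            if i < c:  # keep each undirected pair once and skip self-similarity
                edges.append((i, int(c), float(sims[r, c])))
    return edges
```

Blocking bounds the similarity matrix, but the number of edges kept still depends on the threshold, so a low threshold on a large corpus can produce a very dense graph.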

📊 Performance

Real benchmarks on ByteByteGo/system-design-101 (418 files, 1,001 chunks):

Metric	Value
Index (418 files)	13.6s
Search (warm, 5 results)	35–90ms
Graph build (light)	2.1s → 2,397 nodes, 8,717 edges
Incremental update (1 file)	1.3s
Reranker	Cross-encoder TinyBERT via ONNX
Install size	~80MB (no PyTorch)
RAM usage	<1GB with all models loaded

Production deployment (6,300+ chunks, 3 sources, 950 files):

Metric	Value
Full search (warm)	16ms avg, 2ms min
Full search (first run)	22ms avg, 4ms min
Search P50 / P95	17ms / 55ms
Hit rate (100-query benchmark)	99/100
Graph	3,125 nodes, 132,320 edges
Reranker	Cross-encoder TinyBERT via ONNX
RAM	<1GB with all models loaded

⚙️ Configuration

Environment Variable	Default	Description
`VELOCIRAG_DB`	`./.velocirag`	Database directory
`VELOCIRAG_SOCKET`	`/tmp/velocirag-daemon.sock`	Daemon socket path
`NO_COLOR`	—	Disable colored output

Dependencies (all included in base install):

onnxruntime — ONNX inference (embedder + reranker)
tokenizers + huggingface-hub — model loading
faiss-cpu — vector similarity search
networkx + scikit-learn — knowledge graph + topic clustering
numpy, click, pyyaml, python-frontmatter

Optional extras:

pip install "velocirag[mcp]" — MCP server (adds fastmcp)
pip install "velocirag[ner]" — GLiNER entity extraction (adds gliner, requires PyTorch)

📚 References

VelociRAG builds on these foundational works:

Core Fusion & Retrieval

Reciprocal Rank Fusion: Cormack, G. V., Clarke, C. L. A., & Büttcher, S. (2009). "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods." SIGIR '09.
Core fusion algorithm for merging results across retrieval layers.

BM25: Robertson, S. E., Walker, S., Jones, S., Hancock-Beaulieu, M., & Gatford, M. (1994). "Okapi at TREC-3." TREC-3.
Keyword search foundation via SQLite FTS5.

Embeddings & Neural IR

Sentence-BERT: Reimers, N., & Gurevych, I. (2019). "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks." EMNLP 2019.
Dense embedding architecture behind all-MiniLM-L6-v2.

MiniLM: Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., & Zhou, M. (2020). "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers." NeurIPS 2020.
Efficient transformer distillation for production embedding models.

Reranking & Neural Models

Cross-Encoder Reranking: Nogueira, R., & Cho, K. (2019). "Passage Re-ranking with BERT." arXiv:1901.04085.
Cross-attention reranking, applied here with TinyBERT on MS MARCO.

TinyBERT: Jiao, X., et al. (2020). "TinyBERT: Distilling BERT for Natural Language Understanding." Findings of EMNLP 2020.
Compressed BERT for fast reranking inference.

Vector Search & Systems

FAISS: Johnson, J., Douze, M., & Jégou, H. (2019). "Billion-scale similarity search with GPUs." IEEE Transactions on Big Data.
High-performance vector similarity search engine.

GLiNER: Zaratiana, U., Tomeh, N., Holat, P., & Charnois, T. (2023). "GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer." arXiv:2311.08526.
Generalist NER for knowledge graph entity extraction (optional dependency).

📄 License

MIT — Use it anywhere, build anything.

Need agent integration help? Check AGENTS.md for machine-readable project context.

Built for agents who think fast and remember faster.
