A production-grade RAG implementation that handles the messy parts: multi-format ingestion (DOCX, PDF with OCR, audio via Whisper), Korean-optimized chunking, and hybrid search that fuses BM25 and vector results with Reciprocal Rank Fusion (RRF). The indexing pipeline uses AI-powered chunk contextualization to generate bilingual keywords and preserve document context across chunk boundaries. A two-commit model persists expensive OCR and transcription output before calling embedding APIs, so a rate limit or crash doesn't force you to re-process a 600-page scan. Supports Ollama, OpenAI, Gemini, and Anthropic for both embedding and contextualization. Exposes search tools over MCP stdio and runs headless indexing via CLI. Serves multiple knowledge bases from a single server process, using SQLite in WAL mode so reads can proceed concurrently with writes.
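
To make the RRF step concrete, here is a minimal sketch of how two ranked result lists could be fused. It is illustrative only and is not taken from this project's code: the function name `rrf_fuse`, the sample document ids, and the constant `k=60` (a commonly used default for RRF) are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. k=60 is an assumed default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the keyword and vector retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}\t{score:.5f}")
```

Because RRF uses only rank positions, it does not require BM25 scores and vector similarities to be on a comparable scale. Documents that appear in both lists, here `doc_a` and `doc_b`, rise above those found by a single retriever.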