Handles PDF extraction through five rule-based backends (PyMuPDF, OpenDataLoader, RapidOCR, Docling, Marker) plus an optional LLM fallback (Gemini, Claude, GPT-4o, Ollama). The router audits each page's output with confidence scoring, then re-extracts low-scoring pages with a stronger backend. Exposes convert, chunk, batch_extract, and stream operations over stdio transport. Reach for this when a RAG pipeline needs clean markdown from mixed PDF batches (digital text, scans, tables) without manual per-document tuning. The CLI includes a watch mode for folder monitoring, diff for comparing extractions, and estimate for predicting cost before a run. Three quality presets (economy, balanced, premium) with budget caps keep LLM costs predictable.
claude mcp add --transport stdio nameetp-pdfmux uvx pdfmux
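The audit-and-re-extract routing described above can be illustrated with a minimal sketch. This is not pdfmux's code or API: `PageResult`, `score_confidence`, `extract_page`, the 0.8 threshold, and the character-ratio heuristic are all hypothetical names and values chosen for illustration, and the backends are stand-in callables rather than the real extractors.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class PageResult:
    text: str
    backend: str
    confidence: float


def score_confidence(text: str) -> float:
    """Toy heuristic (an assumption, not pdfmux's scoring): the share of
    characters that are alphanumeric or whitespace; 0.0 for empty output."""
    stripped = text.strip()
    if not stripped:
        return 0.0
    readable = sum(ch.isalnum() or ch.isspace() for ch in stripped)
    return readable / len(stripped)


# A backend here is just (name, function from page bytes to text).
Backend = Tuple[str, Callable[[bytes], str]]


def extract_page(
    page: bytes, backends: Sequence[Backend], threshold: float = 0.8
) -> PageResult:
    """Try backends from cheapest to strongest, stopping at the first result
    whose confidence meets the threshold; otherwise return the best seen."""
    best: Optional[PageResult] = None
    for name, extract in backends:
        text = extract(page)
        result = PageResult(text, name, score_confidence(text))
        if best is None or result.confidence > best.confidence:
            best = result
        if result.confidence >= threshold:
            break
    if best is None:
        raise ValueError("no backends configured")
    return best


# Stand-in backends: the fast one returns nothing (as for a scanned page),
# so the router falls through to the stronger one.
fast = ("fast", lambda page: "")
ocr = ("ocr", lambda page: "Invoice total 42")
result = extract_page(b"page bytes", [fast, ocr])
print(result.backend, result.confidence)
```

Ordering the backends from cheapest to strongest means most digital-text pages never reach the slower extractors, which is the cost argument the description makes for per-page routing.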