Built by the Sunholo team behind AILANG, this parser extracts structured content from Office documents and PDFs. Its deterministic XML approach captures track changes, interleaved comments, headers, footers, and merged cells that most parsers miss. Office formats are parsed locally without any AI model; PDFs and images are delegated to whichever model you configure (Gemini, Claude, or local Ollama). It outputs JSON and Markdown and runs over stdio or HTTP. The team benchmarked it against Pandoc, Docling, and six other tools on 69 files across 11 formats and reported a 93.9% composite score. Reach for it when you need redlining metadata, speaker notes from PPTX, or multi-sheet XLSX data without fighting raw OOXML yourself.
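For context on what "deterministic XML" means for redlining: in a .docx file, tracked insertions and deletions live in `word/document.xml` as WordprocessingML `w:ins` and `w:del` elements, with deleted text stored in `w:delText` instead of `w:t`. The standard-library Python sketch below shows how such changes can be read without any model. It is a generic illustration, not this parser's actual code. The function name `tracked_changes` is hypothetical, and the sketch only covers the main document part. It ignores headers, footers, comments, and move markup (`w:moveFrom`/`w:moveTo`).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tracked_changes(docx_bytes: bytes) -> list[dict]:
    """Hypothetical helper: list tracked insertions/deletions in document order.

    Sketch only: reads the main document part and skips change marks that
    carry no text (e.g. an inserted or deleted paragraph mark inside w:rPr).
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    changes = []
    for el in root.iter():  # root.iter() walks elements in document order
        if el.tag == W + "ins":
            kind, text_tag = "insert", W + "t"  # inserted text uses w:t
        elif el.tag == W + "del":
            kind, text_tag = "delete", W + "delText"  # deleted text uses w:delText
        else:
            continue
        text = "".join(t.text or "" for t in el.iter(text_tag))
        if text:
            changes.append({
                "type": kind,
                "author": el.get(W + "author"),  # w:author is a namespaced attribute
                "text": text,
            })
    return changes
```

A full parser also has to handle the other parts this blurb mentions, such as comments in `word/comments.xml` and headers and footers in their own parts. That is the raw OOXML work the tool is meant to spare you.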
claude mcp add --transport stdio io.github.sunholo-data-parse uvx parse