PDF Reader

sylphlab/pdf-reader-mcp
Summary

PDF Reader MCP is a production-ready PDF processing server for AI agents that extracts text, images, and metadata from PDF files. It reports 5-10x faster parallel page processing than sequential approaches, orders content by Y-coordinate to preserve document layout, and accepts both absolute and relative file paths on Windows and Unix systems. It addresses the performance and reliability problems of traditional PDF processing through automatic parallelization, per-page error resilience, and strict type safety with 94%+ test coverage.


📄 @sylphx/pdf-reader-mcp

Production-ready PDF processing server for AI agents

PDF inspection • PDF search • Agent document map • Accessibility report • Visual evidence • Region crops • Configured OCR

🚀 Overview

PDF Reader MCP is a production-ready Model Context Protocol server that gives AI agents structured, local-first PDF processing. Agents can inspect a PDF before extraction, search text evidence with page and bbox provenance, render page-level visual evidence, crop bbox-grounded page regions, and run configured OCR for scanned-page text layers. From there they can extract a full agent document map, an accessibility report, text, Markdown, semantic citation chunks, images, tables, annotations, outlines, structure trees, form fields, attachment metadata, and agent-ready document elements.

The Problem:

// Traditional PDF processing
- Sequential page processing (slow)
- No natural content ordering
- Complex path handling
- Poor error isolation

The Solution:

// PDF Reader MCP
- Preflight PDF inspection for agent extraction planning 🔎
- MCP-native PDF search with snippets and bbox evidence 🔎
- Bounded page rendering for visual evidence and OCR routing 🖼️
- Bbox-grounded region crops for source evidence 🔍
- Configured local OCR provider for scanned-page text layers 🔡
- 5-10x faster parallel processing ⚡
- Full agent document map linking pages, elements, chunks, layout, safety, and geometry 🧭
- Semantic document AST for page/section/paragraph/list/table/image traversal 🌳
- PDF trust report for content safety, layout, table, and link-risk routing 🛡️
- Accessibility report for tagged-PDF coverage, headings, images, forms, links, and permissions ♿
- Structured element output for agent workflows 🧩
- Table quality diagnostics with inferred cell spans and continuation candidates 📊
- Markdown rendering for RAG and summarization 📝
- Citation-ready semantic/table/page chunks 🔗
- Layout diagnostics with reading-order confidence 📐
- Outlines, annotations, structure trees, forms, attachments, labels, and permission signals 🗂️
- Column-aware reading order 📐
- Flexible path support (absolute/relative) 🎯
- Per-page error resilience 🛡️
- CI-backed quality ✅

Result: Production-ready PDF processing that scales.


⚡ Key Features

Performance

  • 🚀 5-10x faster than sequential with automatic parallelization
  • ⚡ 12,933 ops/sec error handling, 5,575 ops/sec text extraction
  • 💨 Process 50-page PDFs in seconds with multi-core utilization
  • 📦 TypeScript-first with performance-bounded local execution

Developer Experience

  • 🎯 Path Flexibility - Absolute & relative paths, Windows/Unix support (v1.3.0)
  • 🔎 PDF Inspection - Profile PDFs before extraction and get recommended read_pdf arguments for agent workflows
  • 🔎 PDF Search Evidence - Search selected PDF pages with snippets, match offsets, text-item bounding boxes, and provenance
  • 🖼️ Visual Page Evidence - Render selected pages as bounded PNG image parts with JSON provenance and pixel budgets
  • 🔍 Region Crop Evidence - Crop PDF-coordinate regions as bounded PNG image parts for table, figure, chart, and citation verification
  • 🧠 Visual Region Analysis - Send focused crops to a configured local provider and normalize table, chart, formula, figure, and image-description results
  • 🔡 Configured OCR Text Layer - Route rendered pages through an env-configured local OCR command and return normalized text, confidence, words, and provenance
  • 🧾 PDF Text Layer - Optional line and word records with page-level character ranges, best-effort bounding boxes, and provenance
  • 🧭 Agent Document Map - Optional page map that links elements, chunks, layout confidence, safety findings, routing signals, and page geometry
  • 🌳 Document AST - Optional semantic tree with page, section, paragraph, list item, table, and image nodes linked back to evidence IDs
  • 🛡️ Trust Report - Optional consolidated report for prompt-injection text, hidden/off-page signals, layout uncertainty, sparse pages, table warnings, and external links
  • ♿ Accessibility Report - Optional deterministic report for tagged-PDF coverage, structure tree availability, heading roles, image alt-text verifiability, form labels, link labels, and accessibility permissions
  • 🧩 Structured Elements - Optional page-level elements with stable IDs, provenance, and best-effort bounding boxes
  • 📊 Table Intelligence - Optional table quality metrics, inferred header/span hints, sparse-cell warnings, and repeated-header continuation candidates
  • 📐 Layout Diagnostics - Optional page profiles, column signals, and reading-order confidence for agent routing
  • 📝 Markdown Rendering - Optional page-aware Markdown for RAG, summarization, and agent context
  • 🔗 Citation Chunks - Optional page, semantic, size, and table chunks with element IDs and best-effort bounding boxes
  • 🗂️ Document Signals - Optional outlines, page labels, annotations, structure trees, forms, attachments, permissions, and mark info
  • 🖼️ Smart Ordering - Column-aware content ordering improves natural reading flow
  • 🛡️ Type Safe - Full TypeScript with strict mode enabled
  • 📚 Battle-tested - Automated tests, strict TypeScript, and CI validation
  • 🎨 Simple API - inspect_pdf plans extraction, search_pdf finds text evidence, render_page returns visual evidence, extract_regions crops source evidence, analyze_regions enriches visual regions, ocr_pages runs configured OCR, read_pdf performs extraction

📊 Performance Benchmarks

Real-world performance from production testing:

| Operation | Ops/sec | Performance | Use Case |
|---|---|---|---|
| Error handling | 12,933 | ⚡⚡⚡⚡⚡ | Validation & safety |
| Extract full text | 5,575 | ⚡⚡⚡⚡ | Document analysis |
| Extract page | 5,329 | ⚡⚡⚡⚡ | Single page ops |
| Multiple pages | 5,242 | ⚡⚡⚡⚡ | Batch processing |
| Metadata only | 4,912 | ⚡⚡⚡ | Quick inspection |
Parallel Processing Speedup

| Document | Sequential | Parallel | Speedup |
|---|---|---|---|
| 10-page PDF | ~2s | ~0.3s | 5-8x faster |
| 50-page PDF | ~10s | ~1s | 10x faster |
| 100+ pages | ~20s | ~2s | Linear scaling with CPU cores |

Benchmarks vary based on PDF complexity and system resources.


📦 Installation

Claude Code

claude mcp add pdf-reader -- npx @sylphx/pdf-reader-mcp

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}
📍 Config file locations
  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json
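
The three locations above can also be resolved in code, for example by a setup script. This is a minimal sketch, assuming the caller passes in the platform name, the home directory, and (on Windows) the value of APPDATA; `claudeConfigPath` is a hypothetical helper and not part of the package.

```typescript
type Platform = "darwin" | "win32" | "linux";

// Hypothetical helper: maps a platform to the config file location listed above.
function claudeConfigPath(platform: Platform, home: string, appData?: string): string {
  const file = "claude_desktop_config.json";
  switch (platform) {
    case "darwin":
      return `${home}/Library/Application Support/Claude/${file}`;
    case "win32":
      if (!appData) throw new Error("APPDATA is required on Windows");
      return `${appData}\\Claude\\${file}`;
    case "linux":
      return `${home}/.config/Claude/${file}`;
  }
}
```

Taking the environment values as parameters keeps the function pure, so it can be checked without touching the real filesystem.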

VS Code

code --add-mcp '{"name":"pdf-reader","command":"npx","args":["@sylphx/pdf-reader-mcp"]}'

Cursor

  1. Open Settings → MCP → Add new MCP Server
  2. Select Command type
  3. Enter: npx @sylphx/pdf-reader-mcp

Windsurf

Add to your Windsurf MCP config:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}

Cline

Add to Cline's MCP settings:

{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}

Warp

  1. Go to Settings → AI → Manage MCP Servers → Add
  2. Command: npx, Args: @sylphx/pdf-reader-mcp

Ontheia

Add the server in Settings → MCP Servers → Add Server with command npx and args @sylphx/pdf-reader-mcp. See Ontheia's compatible MCP servers for the full list.

Smithery (One-click)

npx -y @smithery/cli install @sylphx/pdf-reader-mcp --client claude

Manual Installation

# Quick start - zero installation
npx @sylphx/pdf-reader-mcp

# Or install globally
npm install -g @sylphx/pdf-reader-mcp

🎯 Quick Start

Inspect Before Extraction

Use inspect_pdf when an agent needs to decide how to process an unfamiliar PDF. It samples a bounded number of pages, detects selectable-text versus image-like pages, surfaces document signals, and recommends useful read_pdf arguments without extracting image bytes.

{
  "sources": [{
    "path": "documents/report.pdf"
  }],
  "sample_pages": 5,
  "include_metadata": true
}

Result:

  • PDF profile such as digital_text, scanned_or_image_only, or mixed_text_and_scan
  • Page-level text density, token estimates, and image paint-operation counts
  • Signals for outlines, page labels, forms, attachments, permissions, and structure trees
  • Recommended read_pdf arguments for citation chunks, safety findings, tables, or OCR triage
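
An agent can turn the inspection result into a routing decision. The sketch below is one possible policy, not the server's own logic: the profile values and tool names come from this README, while `planNextStep`, the `NextStep` labels, and the routing rules are assumptions made for illustration.

```typescript
// Profile values as documented for inspect_pdf.
type PdfProfile =
  | "digital_text"
  | "scanned_or_image_only"
  | "mixed_text_and_scan"
  | "low_text_or_form"
  | "unknown";

// Hypothetical labels for what the agent does next.
type NextStep = "read_pdf" | "ocr_pages" | "read_pdf_then_ocr_pages" | "render_page";

// Assumed policy: OCR scanned documents, read digital ones directly,
// and look at a rendered page first when the profile is inconclusive.
function planNextStep(profile: PdfProfile, needsOcr: boolean): NextStep {
  if (profile === "scanned_or_image_only") return "ocr_pages";
  if (profile === "mixed_text_and_scan") return "read_pdf_then_ocr_pages";
  if (profile === "digital_text" && !needsOcr) return "read_pdf";
  if (needsOcr) return "ocr_pages";
  // low_text_or_form and unknown: inspect the page visually before committing.
  return "render_page";
}
```

In practice the recommended read_pdf arguments returned by inspect_pdf are likely a better starting point than a hand-written policy; the sketch only shows how the profile and OCR flag could feed a decision.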

Search PDF Evidence

Use search_pdf when an agent needs to locate text evidence before deciding whether to read a whole page, crop a region, or cite a result.

{
  "sources": [{
    "path": "documents/report.pdf",
    "pages": "1-20"
  }],
  "query": "risk controls",
  "whole_word": true,
  "max_matches_per_source": 10
}

Response includes:

  • A JSON summary with profile: "pdf_search_results" and effective search options
  • Page numbers, snippets, match offsets, and text-item indexes
  • Best-effort text-item bounding boxes when coordinates are available
  • Per-match provenance so agents can route hits into render_page or extract_regions
  • Bounded defaults: max_pages default 100 and max_matches_per_source default 50
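
To make the search semantics concrete, here is a small standalone sketch of bounded literal matching with whole-word and case options, returning offsets and snippets. It is not the server's implementation; `literalSearch`, the `context` option, and the ASCII-only definition of a word character are assumptions for illustration.

```typescript
interface Match {
  start: number;   // offset of the first matched character
  end: number;     // offset one past the last matched character
  snippet: string; // matched text plus surrounding context
}

interface SearchOptions {
  wholeWord?: boolean;
  caseSensitive?: boolean;
  maxMatches?: number;
  context?: number; // characters of context on each side (hypothetical option)
}

function literalSearch(text: string, query: string, opts: SearchOptions = {}): Match[] {
  const { wholeWord = false, caseSensitive = false, maxMatches = 50, context = 20 } = opts;
  if (query.length === 0) return [];
  // Assumes lowercasing preserves string length, which holds for ASCII text.
  const haystack = caseSensitive ? text : text.toLowerCase();
  const needle = caseSensitive ? query : query.toLowerCase();
  const isWordChar = (ch: string | undefined): boolean =>
    ch !== undefined && /[A-Za-z0-9_]/.test(ch);
  const matches: Match[] = [];
  let from = 0;
  while (matches.length < maxMatches) {
    const start = haystack.indexOf(needle, from);
    if (start === -1) break;
    const end = start + needle.length;
    from = start + 1;
    // Whole-word mode rejects hits that touch a word character on either side.
    if (wholeWord && (isWordChar(haystack[start - 1]) || isWordChar(haystack[end]))) continue;
    matches.push({ start, end, snippet: text.slice(Math.max(0, start - context), end + context) });
  }
  return matches;
}
```

The `maxMatches` bound mirrors the idea behind max_matches_per_source: the search stops early instead of returning an unbounded result set.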

Basic Usage

{
  "sources": [{
    "path": "documents/report.pdf"
  }],
  "include_full_text": true,
  "include_metadata": true,
  "include_page_count": true
}

Result:

  • ✅ Full text content extracted
  • ✅ PDF metadata (author, title, dates)
  • ✅ Total page count
  • ✅ Structured JSON summary for agent workflows

Extract Specific Pages

{
  "sources": [{
    "path": "documents/manual.pdf",
    "pages": "1-5,10,15-20"
  }],
  "include_full_text": true
}
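
A page specification such as "1-5,10,15-20" expands to a sorted list of page numbers. The following sketch shows one way to interpret that syntax on the client side; it is illustrative only, `parsePageSpec` is a hypothetical name, and the server's own parser may accept or reject inputs differently.

```typescript
// Illustrative parser for page specs like "1-5,10,15-20" (1-based, inclusive ranges).
function parsePageSpec(spec: string): number[] {
  const pages = new Set<number>();
  for (const part of spec.split(",")) {
    const m = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!m) throw new Error(`Invalid page range: "${part}"`);
    const first = Number(m[1]);
    const last = m[2] === undefined ? first : Number(m[2]);
    if (first < 1 || last < first) throw new Error(`Invalid page range: "${part}"`);
    for (let p = first; p <= last; p++) pages.add(p);
  }
  // Deduplicate overlapping ranges and return pages in ascending order.
  return Array.from(pages).sort((a, b) => a - b);
}
```

Validating a spec like this before calling the tool lets an agent catch reversed or malformed ranges early.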

Structured Elements for Agents

{
  "sources": [{
    "path": "documents/report.pdf",
    "pages": "1-3"
  }],
  "include_elements": true,
  "include_metadata": true,
  "include_page_count": true
}

Response includes:

  • Stable element IDs such as p1-text-1
  • Page numbers and provenance for each element
  • Best-effort bounding boxes when coordinates are available
  • Text, image metadata, and table elements without embedding image bytes in the JSON summary
  • Table elements include best-effort table and cell bounding boxes, quality metrics, header/span hints, and continuation candidates when coordinates are available

Agent Document Map

Use include_document_map when an agent needs one navigable PDF structure instead of separate page, element, chunk, layout, and safety outputs.

{
  "sources": [{
    "path": "documents/report.pdf",
    "pages": "1-5"
  }],
  "include_document_map": true,
  "include_full_text": false
}

Response includes:

  • Page records with element IDs, chunk IDs, safety finding indexes, text density, image count, table count, and page geometry
  • Semantic elements and citation chunks derived from the same stable IDs
  • Layout diagnostics and routing signals for low-confidence, sparse, and OCR-needed pages
  • Safety findings linked back to page and element evidence
  • No embedded image bytes inside the JSON document map

Document AST

Use include_document_ast when an agent needs a navigable semantic tree rather than reconstructing document structure from flat text items.

{
  "sources": [{
    "path": "documents/report.pdf",
    "pages": "1-5"
  }],
  "include_document_ast": true,
  "include_full_text": false
}

Response includes:

  • A document_ast root with page, section, paragraph, list item, table, and image nodes
  • Node-level element_ids, chunk_ids, bounding boxes, confidence, and semantic roles where available
  • Table nodes with rows, quality diagnostics, and continuation candidates when tables are detected
  • No forced top-level elements, chunks, or tables output unless those options are requested

Text Layer

Use include_text_layer when an agent needs deterministic line and word references instead of only full text. It exposes page text, line records, word records, page-level character ranges, best-effort bounding boxes, and provenance from the same extracted text-content pass.

{
  "sources": [{
    "path": "documents/report.pdf",
    "pages": "1-5"
  }],
  "include_text_layer": true,
  "include_full_text": false
}

Response includes:

  • A text_layer object with one page record per selected page
  • Line IDs, line text, page-level char_start/char_end, and line bounding boxes when available
  • Word text, page-level character ranges, and estimated word boxes when the line has geometry
  • Summary counts for pages, lines, words, characters, and bbox coverage
  • No forced full_text or raw page_contents output

Trust Report

Use include_trust_report when an agent needs one local risk summary before using extracted PDF content as instructions, evidence, or retrieval context.

{
  "sources": [{
    "path": "documents/report.pdf",
    "pages": "1-5"
  }],
  "include_trust_report": true,
  "include_full_text": false
}

Response includes:

  • Document and page-level risk scores
  • Content safety, layout uncertainty, sparse/scanned-page, table quality, and external-link signals
  • Guidance for when to verify with OCR, page rendering, or region crops
  • No forced top-level safety, layout, annotation, or table outputs unless those options are requested

Accessibility Report

Use include_accessibility_report when an agent needs a deterministic view of tagged-PDF and accessibility-relevant structure before relying on the document for navigation, form filling, summarization, or assisted reading workflows.

{
  "sources": [{
    "path": "documents/report.pdf",
    "pages": "1-5"
  }],
  "include_accessibility_report": true,
  "include_full_text": false
}

Response includes:

  • Document and page-level accessibility scores and grades
  • Tagged-page coverage, structure role counts, heading counts, image counts, link counts, and form field counts
  • Issues for missing mark info, untagged pages, suspect tags, image alt-text verifiability, weak form labels, weak link labels, and missing copy_for_accessibility
  • Guidance for when agents should verify semantics with source files, rendering, or region crops
  • No forced top-level permissions, mark info, annotations, form fields, or structure trees unless those options are requested

Render Page Evidence

Use render_page when an agent needs to inspect the original page image, prepare OCR routing, or verify visual layout without stuffing base64 into JSON.

{
  "sources": [{
    "path": "documents/report.pdf",
    "pages": "1-2"
  }],
  "scale": 2,
  "max_pages": 2
}

Response includes:

  • A JSON summary with page number, render scale, pixel count, byte length, evidence ID, and provenance
  • PNG pages as MCP image content parts when include_image is true
  • Bounded defaults: first page by default, max_pages default 5, and max_pixels_per_page default 16MP
  • No rendered page base64 duplicated inside the first JSON content part
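
The pixel budget interacts with scale and page size. As a rough sketch, assuming one PDF point maps to `scale` pixels and that pixel dimensions are rounded down, the helpers below estimate the rendered size and the largest scale that fits a budget. `renderedPixels` and `fitScale` are hypothetical names, and whether the server clamps or rejects an over-budget request is not stated here.

```typescript
interface PageSize {
  widthPt: number;  // page width in PDF points (1/72 inch)
  heightPt: number; // page height in PDF points
}

// Estimated pixel count for a page rendered at `scale` (assumes rounding down).
function renderedPixels(page: PageSize, scale: number): number {
  return Math.floor(page.widthPt * scale) * Math.floor(page.heightPt * scale);
}

// Largest scale not exceeding `requested` that keeps the page within `maxPixels`.
function fitScale(page: PageSize, requested: number, maxPixels = 16000000): number {
  const limit = Math.sqrt(maxPixels / (page.widthPt * page.heightPt));
  return Math.min(requested, limit);
}
```

For a US Letter page (612 x 792 points) at scale 2 this gives 1224 x 1584 pixels, about 1.9MP, well inside the 16MP default. Very large pages such as A0 posters are where the budget starts to matter.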

Extract Region Evidence

Use extract_regions when an agent has a table, figure, chart, formula, or citation bounding box and needs a focused crop from the original page.

{
  "sources": [{
    "path": "documents/report.pdf",
    "regions": [{
      "id": "table-1",
      "page": 1,
      "bounding_box": { "left": 72, "bottom": 420, "right": 540, "top": 620 },
      "padding": 8
    }]
  }],
  "scale": 2,
  "max_regions": 20
}

Response includes:

  • A JSON summary with region ID, source bounding box, crop pixel bounds, evidence ID, and provenance
  • PNG region crops as MCP image content parts when include_image is true
  • Bounded defaults: max_regions default 20 and max_pixels_per_page default 16MP
  • No cropped image base64 duplicated inside the first JSON content part
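
The relationship between a PDF-coordinate bounding box and the crop pixel bounds can be sketched as follows. This assumes the default PDF coordinate space (origin at the bottom-left, y growing upward), padding expressed in PDF points, and an unrotated page whose crop box starts at the origin. `toPixelRect` is a hypothetical helper and may not match the server's exact rounding.

```typescript
interface BBox { left: number; bottom: number; right: number; top: number }
interface PageSize { widthPt: number; heightPt: number }
interface PixelRect { x: number; y: number; width: number; height: number }

// Converts a padded PDF-space box into a top-left-origin pixel rectangle.
function toPixelRect(box: BBox, page: PageSize, scale: number, paddingPt = 0): PixelRect {
  // Grow the box by the padding, clamped to the page.
  const left = Math.max(0, box.left - paddingPt);
  const bottom = Math.max(0, box.bottom - paddingPt);
  const right = Math.min(page.widthPt, box.right + paddingPt);
  const top = Math.min(page.heightPt, box.top + paddingPt);
  return {
    x: Math.floor(left * scale),
    y: Math.floor((page.heightPt - top) * scale), // flip the y axis for image space
    width: Math.ceil((right - left) * scale),
    height: Math.ceil((top - bottom) * scale),
  };
}
```

With the request above (box 72, 420, 540, 620, padding 8, scale 2) on a 612 x 792 point page, this yields a 968 x 432 pixel crop at pixel offset (128, 328).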

Analyze Visual Regions

Use analyze_regions when an agent has a crop target for a table, chart, formula, figure, or image and wants a normalized local-provider result linked back to source pixels. The provider is configured by environment variables, not by request arguments.

{
  "sources": [{
    "path": "documents/report.pdf",
    "regions": [{
      "id": "chart-1",
      "page": 2,
      "bounding_box": { "left": 72, "bottom": 240, "right": 540, "top": 520 },
      "padding": 8
    }]
  }],
  "scale": 2,
  "max_regions": 10,
  "languages": ["eng"]
}

Response includes:

  • A JSON summary with profile: "region_analysis" and the effective analysis options
  • Region-level kind, description, text, Markdown, confidence, normalized table rows, formula fields, chart data points, warnings, and provenance when supplied by the provider
  • source_crop_evidence_id, source bounding box, crop pixel bounds, and scale for every analyzed region
  • Bounded defaults: max_regions default 20, max_pixels_per_page default 16MP, and timeout_ms default 60 seconds per region
  • No cropped image base64 duplicated inside the JSON response

OCR Selected Pages

Use ocr_pages after inspect_pdf flags scanned or sparse pages, or when an agent needs a text layer from pages that have little selectable text. The server renders bounded page images and passes each temporary PNG to the configured local OCR command.

{
  "sources": [{
    "path": "documents/scanned-report.pdf",
    "pages": "1-3"
  }],
  "scale": 2,
  "max_pages": 3,
  "languages": ["eng"]
}

Response includes:

  • A JSON summary with profile: "ocr_text_layer" and the effective OCR options
  • Page-level OCR text, confidence, optional word bounding boxes, language, and provenance
  • source_render_evidence_id linking each OCR page back to the page render used as OCR input
  • Bounded defaults: max_pages default 5, max_pixels_per_page default 16MP, and timeout_ms default 60 seconds per page
  • No rendered image base64 duplicated inside the JSON response

Markdown for RAG and Summaries

{
  "sources": [{
    "path": "documents/report.pdf",
    "pages": "1-5"
  }],
  "include_markdown": true,
  "include_full_text": false
}

Response includes:

  • Page-aware Markdown sections
  • Text blocks in extraction order
  • Image placeholders with dimensions when images are requested
  • Extracted tables appended as Markdown when include_tables is enabled

Citation-Ready Chunks

{
  "sources": [{
    "path": "documents/report.pdf",
    "pages": "1-5"
  }],
  "include_chunks": true,
  "include_semantic_hints": true,
  "include_tables": true,
  "include_full_text": false
}

Response includes:

  • Stable chunk IDs such as p1-chunk-1
  • Page ranges for each chunk
  • Chunk strategies such as page, semantic, size, and table
  • Semantic headings when heading boundaries are available
  • Element IDs that map back to structured elements
  • Best-effort bounding boxes for source highlighting

Outlines, Forms, Attachments, and Document Signals

{
  "sources": [{
    "path": "documents/spec.pdf",
    "pages": "1-5"
  }],
  "include_outline": true,
  "include_annotations": true,
  "include_page_labels": true,
  "include_permissions": true,
  "include_structure_tree": true,
  "include_form_fields": true,
  "include_attachments": true
}

Response includes, when available:

  • Bookmark/outline trees
  • Page labels such as roman numerals or section labels
  • Link and note annotation summaries with bounding boxes
  • Tagged PDF structure trees for selected pages when available
  • Form field summaries with values, field types, and bounding boxes when available
  • Embedded attachment metadata without returning attachment bytes
  • Permission labels and marking signals

Absolute Paths (v1.3.0+)

// Windows - both formats work (backslashes shown; "C:/Users/John/Documents/report.pdf" is equivalent)
{
  "sources": [{
    "path": "C:\\Users\\John\\Documents\\report.pdf"
  }],
  "include_full_text": true
}

// Unix/Mac
{
  "sources": [{
    "path": "/home/user/documents/contract.pdf"
  }],
  "include_full_text": true
}

No more "Absolute paths are not allowed" errors!

Extract Images with Natural Ordering

{
  "sources": [{
    "path": "presentation.pdf",
    "pages": [1, 2, 3]
  }],
  "include_images": true,
  "include_full_text": true
}

Response includes:

  • Text and images in Y-coordinate reading order
  • Base64-encoded images with metadata (width, height, format)
  • Natural reading flow preserved for AI comprehension
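
Y-coordinate ordering can be illustrated with a short sketch: sort items from the top of the page down, group items whose baselines are close into one line, then order each line left to right. This is a simplified illustration under the assumption that y grows upward as in PDF space; `orderByReadingFlow` and the tolerance value are hypothetical and not taken from the server's code.

```typescript
interface Item { text: string; x: number; y: number } // y grows upward, as in PDF space

function orderByReadingFlow(items: Item[], lineTolerance = 2): string[] {
  // Top of the page first: larger y values come earlier.
  const sorted = [...items].sort((a, b) => b.y - a.y);
  const lines: { y: number; items: Item[] }[] = [];
  for (const item of sorted) {
    const current = lines[lines.length - 1];
    if (current !== undefined && Math.abs(current.y - item.y) <= lineTolerance) {
      current.items.push(item); // same visual line
    } else {
      lines.push({ y: item.y, items: [item] });
    }
  }
  // Within each line, read left to right.
  return lines.map((line) =>
    line.items.sort((a, b) => a.x - b.x).map((i) => i.text).join(" "),
  );
}
```

Images can be ordered the same way by treating each image as an item positioned at its top edge.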

Batch Processing

{
  "sources": [
    { "path": "C:\\Reports\\Q1.pdf", "pages": "1-10" },
    { "path": "/home/user/Q2.pdf", "pages": "1-10" },
    { "url": "https://example.com/Q3.pdf" }
  ],
  "include_full_text": true
}

⚡ All PDFs processed in parallel automatically!


✨ Features

Core Capabilities

  • ✅ PDF Inspection - Profile PDFs before extraction, detect low-text/scanned pages, and recommend read_pdf options
  • ✅ Text Extraction - Full document or specific pages with intelligent parsing
  • ✅ PDF Search Evidence - Literal search with page numbers, snippets, match offsets, text-item bounding boxes, and provenance
  • ✅ Image Extraction - Base64-encoded with complete metadata (width, height, format)
  • ✅ Agent Document Map - Pages, elements, chunks, layout diagnostics, safety findings, routing signals, and geometry in one contract
  • ✅ Document AST - Semantic tree for page, section, paragraph, list item, table, and image traversal
  • ✅ Trust Report - Local risk routing for content safety, layout uncertainty, table quality, sparse pages, and external links
  • ✅ Accessibility Report - Tagged-PDF coverage, structure tree, heading, image, form, link, and permission signals
  • ✅ PDF Text Layer - Line records, word records, character ranges, best-effort bounding boxes, and provenance
  • ✅ Configured OCR Text Layer - Optional command-provider OCR over rendered pages, with normalized text, confidence, words, language, and provenance
  • ✅ Structured Elements - Agent-ready elements with stable IDs, provenance, and best-effort bounding boxes
  • ✅ Markdown Output - Page-aware Markdown for RAG, summaries, and context preparation
  • ✅ Citation Chunks - Page, semantic, size, and table chunks with source references for downstream retrieval
  • ✅ Document Signals - Outlines, annotations, structure trees, forms, attachments, page labels, permissions, and mark info when exposed by the PDF
  • ✅ Content Ordering - Column-aware layout preservation for natural reading flow
  • ✅ Metadata Extraction - Author, title, creation date, and custom properties
  • ✅ Page Counting - Fast enumeration without loading full content
  • ✅ Dual Sources - Local files (absolute or relative paths) and HTTP/HTTPS URLs
  • ✅ Batch Processing - Multiple PDFs processed concurrently

Advanced Features

  • ⚡ 5-10x Performance - Parallel page processing with Promise.all
  • 🎯 Smart Pagination - Extract ranges like "1-5,10-15,20"
  • 🖼️ Multi-Format Images - RGB, RGBA, Grayscale with automatic detection
  • 🛡️ Path Flexibility - Windows, Unix, and relative paths all supported (v1.3.0)
  • 🔍 Error Resilience - Per-page error isolation with detailed messages
  • 📏 Large File Support - Efficient streaming and memory management
  • 📝 Type Safe - Full TypeScript with strict mode enabled
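
Parallel page processing with Promise.all and per-page error isolation can be combined in one pattern: each page's work is wrapped so that a failure becomes a result value instead of rejecting the whole batch. The sketch below shows that pattern in isolation; `extractPages`, `PageResult`, and the `extractPage` callback are hypothetical stand-ins, not the server's internals.

```typescript
type PageResult =
  | { page: number; ok: true; text: string }
  | { page: number; ok: false; error: string };

// `extractPage` stands in for whatever per-page work is being done (hypothetical).
async function extractPages(
  pages: number[],
  extractPage: (page: number) => Promise<string>,
): Promise<PageResult[]> {
  return Promise.all(
    pages.map(async (page): Promise<PageResult> => {
      try {
        return { page, ok: true, text: await extractPage(page) };
      } catch (err) {
        // One bad page is reported in place; the other pages still resolve.
        return { page, ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
}
```

Because every mapped promise catches its own error, Promise.all never rejects here, and results come back in the same order as the requested pages.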

🆕 Latest Improvements

Agent Document Map

include_document_map returns a single agent-ready map that links pages, structured elements, citation chunks, layout diagnostics, content safety findings, routing signals, and page geometry. It is designed for agents that need to navigate the original PDF evidence without manually stitching together separate response fields.

The map is performance-bounded: it reuses the same extraction path, keeps image bytes out of JSON, and provides page-level routing signals such as low-confidence pages and pages that likely need OCR.

Accessibility Report

include_accessibility_report returns a deterministic report for tagged-PDF coverage, page structure trees, heading roles, image alt-text verifiability, form field labels, link labels, mark info, and copy_for_accessibility permissions. It gives agents routing guidance without claiming PDF/UA certification or forcing raw structure outputs into top-level JSON.

Configured OCR Text Layer

ocr_pages renders selected PDF pages and sends those temporary PNGs to a local OCR command configured by environment variables. This keeps the default TypeScript package private and dependency-bounded while giving teams a real scanned PDF path when they already run Tesseract, PaddleOCR, a local HTTP shim, or an internal OCR binary. MCP_PDF_OCR_PRESET=tesseract provides a built-in Tesseract command template without bundling an OCR model.

The OCR provider is env-only, not request-controlled. Tool responses normalize provider output into page text, confidence, optional word boxes, language, render evidence IDs, and provenance. Image bytes are not embedded in the JSON response.

Agent-Native PDF Inspection

inspect_pdf adds a bounded planning tool for agent workflows. It samples up to 20 pages per source, counts selectable text and image paint operations, surfaces document-level signals, and returns a recommendation with the next best read_pdf arguments.

Inspection is intentionally low overhead: it does not decode image bytes and it does not perform OCR. When sampled pages look scanned or image-only, the tool marks needs_ocr: true so agents do not mistake an image-based PDF for a text extraction failure. It also reports safe optional-provider readiness for ocr_pages and analyze_regions without exposing local command paths.

Layout Confidence for Agent Routing

include_layout_diagnostics adds deterministic page-level signals for layout profile, reading-order model, confidence, column count, positioned item ratio, and warnings. This helps agents decide when local extraction is safe for RAG and when a page should be routed to a heavier parser, OCR/vision workflow, or human review.

Agent-Ready Structured Output

include_elements adds structured document elements to the JSON response while keeping the existing text, metadata, image, and table outputs backward compatible.

{
  "sources": [{ "path": "report.pdf" }],
  "include_elements": true,
  "include_semantic_hints": true
}

Elements include stable IDs, page numbers, provenance, and best-effort bounding boxes where available. Image bytes stay out of the JSON summary so MCP clients can keep context payloads manageable.

include_semantic_hints adds deterministic heading/list/paragraph hints to text elements, with confidence and signals, without claiming a full semantic parser.

include_markdown adds page-aware Markdown for workflows that need clean text context without manually rebuilding sections from raw page text.

include_html adds an escaped HTML rendering for previews, export workflows, and downstream conversion.

The extraction pipeline also separates distant same-line text into independent segments before ordering, which improves multi-column PDFs without requiring any extra configuration.
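
The idea of separating distant same-line text can be sketched as a gap-based split: walk the spans of one line from left to right and start a new segment whenever the horizontal gap to the previous span is large. This is an illustration of the general technique only; `splitLineIntoSegments` and the gap threshold are assumptions, and the pipeline's actual heuristics are not documented here.

```typescript
interface Span { text: string; x: number; width: number }

// Splits one visual line into segments wherever the gap between neighbouring
// spans exceeds `maxGap` (the threshold value is an assumption).
function splitLineIntoSegments(spans: Span[], maxGap = 40): Span[][] {
  const ordered = [...spans].sort((a, b) => a.x - b.x);
  const segments: Span[][] = [];
  let current: Span[] = [];
  let prevEnd = 0;
  for (const span of ordered) {
    if (current.length > 0 && span.x - prevEnd > maxGap) {
      segments.push(current); // large gap: likely a different column
      current = [];
    }
    current.push(span);
    prevEnd = span.x + span.width;
  }
  if (current.length > 0) segments.push(current);
  return segments;
}
```

Treating each segment as its own unit before ordering keeps text from two columns that share a baseline from being merged into one sentence.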

include_chunks adds citation-ready chunks with stable IDs, strategy labels, element references, and best-effort bounding boxes for downstream retrieval and citation workflows. When include_semantic_hints is also enabled, chunks split on deterministic heading boundaries; table chunks are emitted when table extraction is requested.

include_outline, include_annotations, include_page_labels, include_page_geometry, include_permissions, include_structure_tree, include_form_fields, and include_attachments expose additional document signals without changing the default response shape.

include_safety_findings adds deterministic findings for common prompt-injection patterns, tiny text, and off-page text so agents can inspect risky document content before using it as instructions.

Absolute Paths Supported

// ✅ Windows
{ "path": "C:\\Users\\John\\Documents\\report.pdf" }
{ "path": "C:/Users/John/Documents/report.pdf" }

// ✅ Unix/Mac
{ "path": "/home/john/documents/report.pdf" }
{ "path": "/Users/john/Documents/report.pdf" }

// ✅ Relative (still works)
{ "path": "documents/report.pdf" }

Other Improvements:

  • 🛡️ Filesystem and HTTP access restrictions for safer deployments
  • 📊 Table extraction with Markdown output
  • 📦 Updated parser resources for CMaps, fonts, WASM decoders, and color profiles
📋 View Full Changelog

v1.2.0 - Content Ordering

  • Y-coordinate based text and image ordering
  • Natural reading flow for AI models
  • Intelligent line grouping

v1.1.0 - Image Extraction & Performance

  • Base64-encoded image extraction
  • 10x speedup with parallel processing
  • Comprehensive test coverage


📖 API Reference

inspect_pdf Tool

Plan PDF extraction before running a heavier read. This is useful for agents that need to choose between metadata review, citation-ready extraction, mixed PDF handling, or OCR-capable workflows.

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sources | Array | List of PDF sources to inspect | Required |
| sample_pages | number | Maximum pages to sample per source, capped at 20 | 5 |
| include_metadata | boolean | Include PDF metadata and info objects | true |

Response Fields

| Field | Description |
| --- | --- |
| profile | digital_text, scanned_or_image_only, mixed_text_and_scan, low_text_or_form, or unknown |
| sampled_pages | Pages used for the bounded inspection sample |
| page_signals | Text chars, text items, token estimate, image paint operations, and scan/low-text flags |
| document_signals | Outline, labels, permissions, forms, attachments, and structure-tree availability |
| recommendation | Suggested workflow, OCR need, reason, and ready-to-use read_pdf arguments |
| provider_status | Safe readiness metadata for optional ocr_pages and analyze_regions providers without command paths |
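
A minimal sketch of how an agent might route on the profile field when it has no other guidance. The profile values come from the table above; the NextStep type and the mapping itself are assumptions for illustration, and the server's own recommendation field should take precedence when present.

```typescript
// Profile values as listed in the inspect_pdf response fields.
type PdfProfile =
  | "digital_text"
  | "scanned_or_image_only"
  | "mixed_text_and_scan"
  | "low_text_or_form"
  | "unknown";

// Hypothetical routing result: the tool names are real, the mapping is an assumption.
interface NextStep {
  tool: "read_pdf" | "ocr_pages" | "render_page";
  reason: string;
}

function chooseNextStep(profile: PdfProfile): NextStep {
  switch (profile) {
    case "digital_text":
      return { tool: "read_pdf", reason: "a text layer is present" };
    case "scanned_or_image_only":
      return { tool: "ocr_pages", reason: "no usable text layer" };
    case "mixed_text_and_scan":
      return { tool: "read_pdf", reason: "read the text first, then OCR the sparse pages" };
    case "low_text_or_form":
      return { tool: "read_pdf", reason: "little text; form fields may carry the content" };
    case "unknown":
      return { tool: "render_page", reason: "look at a rendered page before deciding" };
  }
}
```

The exhaustive switch means the compiler flags any profile value that is added later without a matching branch.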

render_page Tool

Render selected pages as PNG visual evidence. This gives agents a page image they can inspect or route to OCR/vision workflows while keeping binary content out of the JSON summary.

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sources | Array | List of PDF sources to render | Required |
| scale | number | Render scale relative to PDF points, from 0.25 to 4 | 2 |
| max_pages | number | Maximum pages to render per source, capped at 20 | 5 |
| max_pixels_per_page | number | Maximum rendered pixels per page, capped at 64MP | 16000000 |
| include_image | boolean | Return PNG pages as MCP image parts | true |

Example

{
  "sources": [{ "path": "report.pdf", "pages": "1-2" }],
  "scale": 2,
  "max_pages": 2
}

The first content part is JSON metadata with profile: "page_render_evidence". Rendered PNG data is returned as subsequent MCP image parts and referenced by image_content_index.
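
The documented limits can also be applied on the client before a request is sent. The sketch below is one way to do that; buildRenderArgs and clamp are hypothetical helpers, the lower bound of 1 for max_pages is an assumption, and 64MP is assumed to mean 64,000,000 pixels. Whether the server clamps or rejects out-of-range values is not stated here.

```typescript
interface RenderSource {
  path: string;
  pages?: string;
}

interface RenderPageArgs {
  sources: RenderSource[];
  scale: number;
  max_pages: number;
  max_pixels_per_page: number;
  include_image: boolean;
}

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Hypothetical helper: fills in the documented defaults and keeps values inside the documented caps.
function buildRenderArgs(
  sources: RenderSource[],
  overrides: Partial<Omit<RenderPageArgs, "sources">> = {},
): RenderPageArgs {
  return {
    sources,
    scale: clamp(overrides.scale ?? 2, 0.25, 4),
    max_pages: clamp(Math.floor(overrides.max_pages ?? 5), 1, 20),
    // Assumption: "64MP" is read as 64,000,000 pixels.
    max_pixels_per_page: clamp(Math.floor(overrides.max_pixels_per_page ?? 16_000_000), 1, 64_000_000),
    include_image: overrides.include_image ?? true,
  };
}
```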

search_pdf Tool

Search extracted PDF text using bounded literal matching and return evidence that agents can cite or route into visual tools.

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sources | Array | List of PDF sources to search | Required |
| query | string | Literal text query to search for | Required |
| case_sensitive | boolean | Use case-sensitive matching | false |
| whole_word | boolean | Match only whole words using ASCII word boundaries | false |
| max_pages | number | Maximum pages to search per source, capped at 1000 | 100 |
| max_matches_per_source | number | Maximum matches returned per source, capped at 500 | 50 |
| context_chars | number | Context characters around each match, capped at 1000 | 120 |

Example

{
  "sources": [{ "path": "report.pdf", "pages": "1-20" }],
  "query": "risk controls",
  "whole_word": true,
  "max_matches_per_source": 10
}

The first content part is JSON metadata with profile: "pdf_search_results". Matches include page number, matched text, snippet, match offsets, text-item index, optional text-item bounding box, and provenance. Search uses literal matching only; request payloads do not accept arbitrary regular expressions.
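
To make "bounded literal matching with ASCII word boundaries" concrete, here is a small sketch of the technique. It is not the server's implementation: findLiteralMatches, MatchOptions, and TextMatch are hypothetical names, and only the option defaults are taken from the table above.

```typescript
interface MatchOptions {
  caseSensitive?: boolean;
  wholeWord?: boolean;
  contextChars?: number;
  maxMatches?: number;
}

interface TextMatch {
  start: number;
  end: number;
  matched: string;
  snippet: string;
}

// ASCII word characters only: letters, digits, underscore.
const isAsciiWordChar = (ch: string | undefined): boolean =>
  ch !== undefined && /^[A-Za-z0-9_]$/.test(ch);

function findLiteralMatches(text: string, query: string, options: MatchOptions = {}): TextMatch[] {
  const { caseSensitive = false, wholeWord = false, contextChars = 120, maxMatches = 50 } = options;
  if (query.length === 0) return [];

  // Limitation of this sketch: lowercasing can change string length for a few
  // non-ASCII characters, which would shift the reported offsets.
  const haystack = caseSensitive ? text : text.toLowerCase();
  const needle = caseSensitive ? query : query.toLowerCase();

  const matches: TextMatch[] = [];
  let from = 0;
  while (matches.length < maxMatches) {
    const start = haystack.indexOf(needle, from);
    if (start === -1) break;
    const end = start + needle.length;
    const boundaryOk =
      !wholeWord || (!isAsciiWordChar(text[start - 1]) && !isAsciiWordChar(text[end]));
    if (boundaryOk) {
      matches.push({
        start,
        end,
        matched: text.slice(start, end),
        snippet: text.slice(Math.max(0, start - contextChars), end + contextChars),
      });
    }
    from = start + 1;
  }
  return matches;
}
```

Because the query is never compiled into a regular expression, characters such as `.` or `*` in the query are matched as plain text.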

extract_regions Tool

Crop selected PDF-coordinate page regions as PNG visual evidence. This is useful when an agent has bounding boxes from the document map, table detector, or downstream layout workflow and needs focused source evidence.

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sources | Array | List of PDF sources with regions to crop | Required |
| scale | number | Render scale used before cropping, from 0.25 to 4 | 2 |
| max_regions | number | Maximum regions to crop per source, capped at 100 | 20 |
| max_pixels_per_page | number | Maximum rendered pixels per page before cropping, capped at 64MP | 16000000 |
| include_image | boolean | Return cropped regions as MCP image parts | true |

Each region uses PDF coordinates:

{
  "id": "figure-1",
  "page": 1,
  "bounding_box": { "left": 72, "bottom": 420, "right": 540, "top": 620 },
  "padding": 8
}

The first content part is JSON metadata with profile: "region_crop_evidence". Cropped PNG data is returned as subsequent MCP image parts and referenced by image_content_index.
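
PDF coordinates grow upward from the bottom-left corner, while rendered images grow downward from the top-left, so a crop has to flip the vertical axis. The sketch below shows that conversion under stated assumptions: the page origin is at (0, 0), the page is not rotated, and padding is in PDF points. pdfBoxToPixelRect is a hypothetical helper, not part of the server's API.

```typescript
interface PdfBox {
  left: number;
  bottom: number;
  right: number;
  top: number;
}

interface PixelRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

function pdfBoxToPixelRect(
  box: PdfBox,
  pageWidthPt: number,
  pageHeightPt: number,
  scale: number,
  padding = 0,
): PixelRect {
  // Grow the box by the padding, then keep it inside the page.
  const left = Math.max(0, box.left - padding);
  const bottom = Math.max(0, box.bottom - padding);
  const right = Math.min(pageWidthPt, box.right + padding);
  const top = Math.min(pageHeightPt, box.top + padding);

  // Flip the y axis: the top edge of the PDF box becomes the pixel rectangle's y origin.
  const x = Math.floor(left * scale);
  const y = Math.floor((pageHeightPt - top) * scale);
  return {
    x,
    y,
    width: Math.ceil(right * scale) - x,
    height: Math.ceil((pageHeightPt - bottom) * scale) - y,
  };
}

// The region from the example above on a US Letter page (612 x 792 points) at scale 2.
const crop = pdfBoxToPixelRect({ left: 72, bottom: 420, right: 540, top: 620 }, 612, 792, 2, 8);
console.log(crop); // { x: 128, y: 328, width: 968, height: 432 }
```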

analyze_regions Tool

Analyze selected PDF-coordinate page regions with a configured local provider. This is useful for visual table recognition, chart-to-data enrichment, formula recognition, figure descriptions, and image captions while keeping every result linked to a crop evidence ID.

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sources | Array | List of PDF sources with regions to analyze | Required |
| scale | number | Render scale used before cropping and analysis, from 0.25 to 4 | 2 |
| max_regions | number | Maximum regions to analyze per source, capped at 100 | 20 |
| max_pixels_per_page | number | Maximum rendered pixels per page before cropping, capped at 64MP | 16000000 |
| timeout_ms | number | Timeout per analyzed region in milliseconds, capped at 300000 | 60000 |
| max_output_chars | number | Maximum provider output characters returned per region | 200000 |
| languages | string[] | Optional language tags passed to the configured provider | - |

Provider Configuration

| Variable | Description |
| --- | --- |
| MCP_PDF_REGION_ANALYSIS_COMMAND | Absolute or PATH-resolved command used for visual region analysis. Required to enable analyze_regions. |
| MCP_PDF_REGION_ANALYSIS_ARGS_JSON | Optional JSON string array of command arguments. Must include {input} and may also use {page}, {source}, {region_id}, {evidence_id}, {left}, {bottom}, {right}, {top}, {language}, and {languages} placeholders. Defaults to ["{input}"]. |

Provider stdout may be plain text or JSON:

{
  "kind": "table",
  "description": "Quarterly revenue table",
  "text": "Q1 revenue...",
  "markdown": "| Quarter | Revenue |",
  "confidence": 0.91,
  "table": {
    "rows": [["Quarter", "Revenue"], ["Q1", "$1.2M"]],
    "confidence": 0.9
  },
  "formula": {
    "latex": "E = mc^2",
    "confidence": 0.82
  },
  "chart": {
    "title": "Revenue by quarter",
    "summary": "Revenue rises across the period.",
    "data_points": [{ "label": "Q1", "value": 1.2 }],
    "confidence": 0.78
  },
  "warnings": ["Low contrast axis labels"]
}

The first content part is JSON metadata with profile: "region_analysis". Each analysis includes source_crop_evidence_id, source bounding box, crop pixel bounds, scale, provider, provenance, and normalized fields supplied by the local provider. The request cannot select an executable.
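
The placeholder rules for the ARGS_JSON variables can be sketched as a small template expander. This is an illustration of the documented contract (a JSON string array that must contain {input}), not the server's code; expandArgs and PlaceholderValues are hypothetical names, and leaving unknown placeholders untouched is an assumption.

```typescript
type PlaceholderValues = Record<string, string | number>;

function expandArgs(argsJson: string | undefined, values: PlaceholderValues): string[] {
  // The documented default template is ["{input}"].
  const template: unknown = argsJson === undefined ? ["{input}"] : JSON.parse(argsJson);
  if (!Array.isArray(template) || !template.every((arg): arg is string => typeof arg === "string")) {
    throw new Error("args must be a JSON array of strings");
  }
  if (!template.some((arg) => arg.includes("{input}"))) {
    throw new Error("args must include the {input} placeholder");
  }
  return template.map((arg) =>
    arg.replace(/\{([a-z_]+)\}/g, (whole: string, name: string) =>
      // Assumption: placeholders without a supplied value are left as written.
      Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : whole,
    ),
  );
}

const argv = expandArgs('["--lang", "{language}", "{input}"]', {
  input: "/tmp/region.png",
  language: "eng",
});
console.log(argv); // [ '--lang', 'eng', '/tmp/region.png' ]
```

Each expanded value stays a single argv entry, so a file name containing spaces or shell metacharacters cannot split into extra arguments as long as the command is spawned without a shell.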

ocr_pages Tool

Run selected rendered pages through a configured local OCR provider and return a normalized OCR text layer. The provider is configured through environment variables so an MCP request cannot choose arbitrary commands.

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sources | Array | List of PDF sources to OCR | Required |
| scale | number | Render scale used before OCR, from 0.25 to 4 | 2 |
| max_pages | number | Maximum pages to OCR per source, capped at 20 | 5 |
| max_pixels_per_page | number | Maximum rendered pixels per page before OCR, capped at 64MP | 16000000 |
| timeout_ms | number | Timeout per OCR page in milliseconds, capped at 300000 | 60000 |
| max_output_chars | number | Maximum OCR text characters returned per page | 200000 |
| languages | string[] | Optional OCR language tags passed to the configured provider | - |

Provider Configuration

| Variable | Description |
| --- | --- |
| MCP_PDF_OCR_PRESET | Optional built-in command template. Supported value: tesseract. |
| MCP_PDF_OCR_COMMAND | Absolute or PATH-resolved command used for OCR. Required unless MCP_PDF_OCR_PRESET is set. Overrides the preset command when both are set. |
| MCP_PDF_OCR_ARGS_JSON | Optional JSON string array of command arguments. Must include {input} and may also use {page}, {source}, {language}, {languages}, and {languages_tesseract} placeholders. Defaults to the preset template or ["{input}"]. |

Provider stdout may be plain text or JSON:

{
  "text": "Recognized text",
  "confidence": 0.93,
  "language": "eng",
  "words": [{
    "text": "Recognized",
    "confidence": 0.95,
    "bounding_box": { "left": 10, "bottom": 20, "right": 90, "top": 40 }
  }]
}

The first content part is JSON metadata with profile: "ocr_text_layer". OCR results reference the render evidence ID used to create each temporary page image. The default package does not bundle an OCR model or call a cloud OCR service.
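
Since provider stdout may be plain text or JSON, a wrapper has to accept both. The following is a rough sketch of that normalization, assuming a JSON object is only honored when it carries a string text field; parseProviderStdout and ProviderResult are hypothetical names, and the 200000-character default mirrors max_output_chars above.

```typescript
interface ProviderResult {
  text: string;
  confidence?: number;
  language?: string;
}

function parseProviderStdout(stdout: string, maxOutputChars = 200_000): ProviderResult {
  const trimmed = stdout.trim();
  const result: ProviderResult = { text: trimmed };

  if (trimmed.startsWith("{")) {
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (typeof parsed === "object" && parsed !== null) {
        const record = parsed as Record<string, unknown>;
        const text = record["text"];
        const confidence = record["confidence"];
        const language = record["language"];
        // Assumption: a JSON payload counts only if it has a string "text" field.
        if (typeof text === "string") {
          result.text = text;
          if (typeof confidence === "number") result.confidence = confidence;
          if (typeof language === "string") result.language = language;
        }
      }
    } catch {
      // Not valid JSON: keep treating the output as plain text.
    }
  }

  result.text = result.text.slice(0, maxOutputChars);
  return result;
}
```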

read_pdf Tool

The extraction tool that handles PDF content, structure, citations, images, tables, and document signals.

Parameters

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| sources | Array | List of PDF sources to process | Required |
| include_full_text | boolean | Extract full text content | false |
| include_metadata | boolean | Extract PDF metadata | true |
| include_page_count | boolean | Include total page count | true |
| include_images | boolean | Extract embedded images | false |
| include_tables | boolean | Detect tables with rows, cell metadata, confidence, quality diagnostics, inferred spans, continuation candidates, and best-effort geometry | false |
| include_document_map | boolean | Include an agent document map that links pages, elements, chunks, layout diagnostics, safety findings, routing signals, and page geometry | false |
| include_document_ast | boolean | Include a semantic document AST with page, section, paragraph, list item, table, and image nodes linked to element/chunk evidence | false |
| include_trust_report | boolean | Include a consolidated trust report for content safety, layout uncertainty, sparse/scanned pages, table quality, and external links | false |
| include_accessibility_report | boolean | Include a deterministic accessibility report for tagged-PDF coverage, structure trees, headings, images, forms, links, and accessibility permissions | false |
| include_elements | boolean | Include structured document elements for agent workflows | false |
| include_semantic_hints | boolean | Include deterministic heading/list/paragraph hints on text elements | false |
| include_markdown | boolean | Include page-aware Markdown for RAG and summarization | false |
| include_html | boolean | Include escaped page-aware HTML for preview/export workflows | false |
| include_chunks | boolean | Include page, semantic, size, and table chunks with source references | false |
| include_text_layer | boolean | Include line and word records with page-level character ranges, best-effort bounding boxes, and provenance | false |
| include_layout_diagnostics | boolean | Include page layout profiles, reading-order confidence, column signals, and warnings | false |
| include_outline | boolean | Include PDF outline/bookmarks when available | false |
| include_annotations | boolean | Include safe annotation summaries for selected pages | false |
| include_page_labels | boolean | Include PDF page labels when available | false |
| include_page_geometry | boolean | Include page viewport geometry and PDF view boxes | false |
| include_permissions | boolean | Include permission labels and mark info when available | false |
| include_structure_tree | boolean | Include tagged PDF structure trees for selected pages when available | false |
| include_form_fields | boolean | Include PDF form field summaries when available | false |
| include_attachments | boolean | Include embedded attachment metadata without attachment bytes | false |
| include_safety_findings | boolean | Include deterministic content safety findings for agent workflows | false |

Source Object

{
  path?: string;        // Local file path (absolute or relative)
  url?: string;         // HTTP/HTTPS URL to PDF
  pages?: string | number[];  // Pages to extract: "1-5,10" or [1,2,3]
}

Examples

Metadata only (fast):

{
  "sources": [{ "path": "large.pdf" }],
  "include_metadata": true,
  "include_page_count": true,
  "include_full_text": false
}

From URL:

{
  "sources": [{
    "url": "https://arxiv.org/pdf/2301.00001.pdf"
  }],
  "include_full_text": true
}

Page ranges (this selects pages 1-5, 10-15, and 20):

{
  "sources": [{
    "path": "manual.pdf",
    "pages": "1-5,10-15,20"
  }]
}
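
For clients that want to validate or preview a pages value before sending it, the range syntax can be expanded locally. This is a sketch under assumptions: parsePageRanges is a hypothetical helper, pages are 1-based, and pages beyond an optional pageCount are silently dropped rather than rejected. The server's own parsing rules may differ.

```typescript
function parsePageRanges(spec: string | number[], pageCount?: number): number[] {
  const pages = new Set<number>();
  const add = (page: number): void => {
    if (!Number.isInteger(page) || page < 1) {
      throw new Error(`Invalid page number: ${page}`);
    }
    // Assumption: pages past the end of the document are dropped, not rejected.
    if (pageCount === undefined || page <= pageCount) pages.add(page);
  };

  if (Array.isArray(spec)) {
    for (const page of spec) add(page);
  } else {
    for (const part of spec.split(",")) {
      const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
      if (!match) throw new Error(`Invalid page range: "${part}"`);
      const start = Number(match[1]);
      const requestedEnd = match[2] === undefined ? start : Number(match[2]);
      if (requestedEnd < start) throw new Error(`Descending page range: "${part}"`);
      // Clamp the loop so a range like "1-999999999" cannot run away.
      const end = pageCount === undefined ? requestedEnd : Math.min(requestedEnd, pageCount);
      for (let page = start; page <= end; page++) add(page);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}

console.log(parsePageRanges("1-5,10-15,20").join(","));
// 1,2,3,4,5,10,11,12,13,14,15,20
```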

Structured elements:

{
  "sources": [{ "path": "report.pdf", "pages": "1-3" }],
  "include_elements": true,
  "include_metadata": true
}

Elements are designed for agent workflows that need stable page references, provenance, and best-effort coordinates for citation-ready downstream processing.

Agent document map:

{
  "sources": [{ "path": "report.pdf", "pages": "1-5" }],
  "include_document_map": true,
  "include_full_text": false
}

The document map is designed for agents that need one navigable structure for pages, elements, chunks, layout confidence, safety findings, routing signals, and page geometry without embedding image bytes in JSON.


🔧 Advanced Usage

📐 Column-Aware Content Ordering

Content is returned in natural reading order using Y-coordinates plus deterministic column segmentation:

Document Layout:
┌─────────────────────┐
│ [Title]       Y:100 │
│ [Image]       Y:150 │
│ [Text]        Y:400 │
│ [Photo A]     Y:500 │
│ [Photo B]     Y:550 │
└─────────────────────┘

Response Order:
[
  { type: "text", text: "Title..." },
  { type: "image", data: "..." },
  { type: "text", text: "..." },
  { type: "image", data: "..." },
  { type: "image", data: "..." }
]

Benefits:

  • AI understands spatial relationships
  • Natural document comprehension
  • Perfect for vision-enabled models
  • Automatic multi-line text grouping
  • Better ordering for common two-column PDFs
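
The core of Y-coordinate ordering with line grouping can be sketched in a few lines. This is an illustration only: PageItem, orderItems, and the 2-unit line tolerance are assumptions, y is measured from the top of the page as in the diagram above, and the column segmentation step is left out.

```typescript
interface PageItem {
  type: "text" | "image";
  x: number;
  y: number; // Distance from the top of the page, as in the layout diagram.
  text?: string;
}

// Returns rows in reading order; text items that share a line end up in the same row.
function orderItems(items: PageItem[], lineTolerance = 2): PageItem[][] {
  const sorted = items.slice().sort((a, b) => a.y - b.y || a.x - b.x);
  const rows: PageItem[][] = [];

  for (const item of sorted) {
    const current = rows[rows.length - 1];
    const first = current === undefined ? undefined : current[0];
    const sameLine =
      current !== undefined &&
      first !== undefined &&
      item.type === "text" &&
      first.type === "text" &&
      Math.abs(item.y - first.y) <= lineTolerance;
    if (sameLine) {
      current.push(item);
    } else {
      rows.push([item]);
    }
  }

  // Within a line, read left to right.
  for (const row of rows) row.sort((a, b) => a.x - b.x);
  return rows;
}
```

Images always start a new row here, which keeps them between the text blocks that precede and follow them vertically.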

🖼️ Image Extraction

Enable extraction:

{
  "sources": [{ "path": "manual.pdf" }],
  "include_images": true
}

Response format:

{
  "images": [{
    "page": 1,
    "index": 0,
    "width": 1920,
    "height": 1080,
    "format": "rgb",
    "data": "base64-encoded-png..."
  }]
}

Supported formats: RGB, RGBA, Grayscale
Auto-detected: JPEG, PNG, and other embedded formats

📂 Path Configuration

Absolute paths (v1.3.0+) - Direct file access:

{ "path": "C:\\Users\\John\\file.pdf" }
{ "path": "/home/user/file.pdf" }

Relative paths - Workspace files:

{ "path": "docs/report.pdf" }
{ "path": "./2024/Q1.pdf" }

Configure working directory:

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"],
      "cwd": "/path/to/documents"
    }
  }
}

📊 Large PDF Strategies

Strategy 1: Page ranges

{ "sources": [{ "path": "big.pdf", "pages": "1-20" }] }

Strategy 2: Progressive loading

// Step 1: Get page count
{ "sources": [{ "path": "big.pdf" }], "include_full_text": false }

// Step 2: Extract sections
{ "sources": [{ "path": "big.pdf", "pages": "50-75" }] }

Strategy 3: Parallel batching

{
  "sources": [
    { "path": "big.pdf", "pages": "1-50" },
    { "path": "big.pdf", "pages": "51-100" }
  ]
}
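
Once the page count is known, the batches for Strategy 3 can be generated rather than written by hand. A minimal sketch, assuming 1-based inclusive ranges; buildPageBatches and BatchSource are hypothetical names.

```typescript
interface BatchSource {
  path: string;
  pages: string;
}

function buildPageBatches(path: string, pageCount: number, batchSize: number): BatchSource[] {
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new Error("pageCount must be a positive integer");
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }

  const sources: BatchSource[] = [];
  for (let start = 1; start <= pageCount; start += batchSize) {
    const end = Math.min(start + batchSize - 1, pageCount);
    sources.push({ path, pages: start === end ? `${start}` : `${start}-${end}` });
  }
  return sources;
}

console.log(buildPageBatches("big.pdf", 120, 50).map((s) => s.pages).join(" "));
// 1-50 51-100 101-120
```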

🔒 Security & Sandboxing

By default the server can read any local file the host process can access and fetch any HTTP(S) URL. When running outside a sandbox you should restrict it to a specific working set.

Restricting filesystem access

Use --allow-dir (repeatable) or the MCP_PDF_ALLOWED_DIRS env var (: or , separated). Once set, all path sources must resolve inside one of the allowed directories — relative paths, absolute paths, and .. traversal are all checked after resolution.

# CLI flags
npx @sylphx/pdf-reader-mcp --allow-dir=/srv/pdfs --allow-dir=/data/reports

# Environment
MCP_PDF_ALLOWED_DIRS="/srv/pdfs:/data/reports" npx @sylphx/pdf-reader-mcp

{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp", "--allow-dir=/srv/pdfs"]
    }
  }
}
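
The "checked after resolution" rule is what defeats .. traversal: the path is normalized first and only then compared with the allowed roots. The sketch below shows that idea for POSIX-style paths only. It is a simplified illustration, not the server's implementation; resolvePosix and isPathAllowed are hypothetical helpers, and symlinks and Windows drive paths are deliberately not handled.

```typescript
// Resolve "." and ".." segments of a POSIX-style path against a base directory.
function resolvePosix(baseDir: string, requested: string): string {
  const combined = requested.startsWith("/") ? requested : `${baseDir}/${requested}`;
  const segments: string[] = [];
  for (const segment of combined.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") segments.pop();
    else segments.push(segment);
  }
  return `/${segments.join("/")}`;
}

function isPathAllowed(requested: string, allowedDirs: string[], cwd: string): boolean {
  // No allowlist configured means unrestricted, matching the documented default.
  if (allowedDirs.length === 0) return true;
  const resolved = resolvePosix(cwd, requested);
  return allowedDirs.some((dir) => {
    const root = resolvePosix("/", dir);
    // Compare whole path segments so "/srv/pdfs-evil" does not pass for "/srv/pdfs".
    return resolved === root || resolved.startsWith(root === "/" ? "/" : `${root}/`);
  });
}
```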

Disabling or restricting HTTP

# Block all URL sources
npx @sylphx/pdf-reader-mcp --no-http
MCP_PDF_ALLOW_HTTP=false npx @sylphx/pdf-reader-mcp

# Allowlist hosts (everything else rejected)
npx @sylphx/pdf-reader-mcp --allow-host=cdn.example.com --allow-host=files.internal
MCP_PDF_ALLOWED_HOSTS="cdn.example.com,files.internal" npx @sylphx/pdf-reader-mcp

| Setting | CLI flag | Environment variable | Default |
| --- | --- | --- | --- |
| Filesystem allowlist | --allow-dir=<path> (repeatable) | MCP_PDF_ALLOWED_DIRS (: or , separated) | unrestricted |
| Disable HTTP | --no-http | MCP_PDF_ALLOW_HTTP=false | enabled |
| HTTP host allowlist | --allow-host=<host> (repeatable) | MCP_PDF_ALLOWED_HOSTS (, separated) | any host |

Denied requests fail fast with an Access denied error before any disk read or network call.
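
The HTTP settings combine into a simple decision: HTTP off rejects everything, an empty host list allows any host, and otherwise the host must be listed. A minimal sketch of that check follows; HttpPolicy, parseHostList, and isUrlAllowed are hypothetical names, and exact hostname matching (no wildcard or subdomain matching) is an assumption.

```typescript
interface HttpPolicy {
  allowHttp: boolean;
  allowedHosts: string[];
}

// Parse a comma-separated host list such as MCP_PDF_ALLOWED_HOSTS.
function parseHostList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((host) => host.trim().toLowerCase())
    .filter((host) => host.length > 0);
}

function isUrlAllowed(rawUrl: string, policy: HttpPolicy): boolean {
  if (!policy.allowHttp) return false;

  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // Not a parseable URL.
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  // Assumption: hosts are compared exactly; subdomains are not matched implicitly.
  const host = url.hostname.toLowerCase();
  return policy.allowedHosts.length === 0 || policy.allowedHosts.some((allowed) => allowed === host);
}
```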


🔧 Troubleshooting

"Absolute paths are not allowed"

Solution: Upgrade to v1.3.0+

npm update @sylphx/pdf-reader-mcp

Restart your MCP client completely.


"File not found"

Causes:

  • File doesn't exist at path
  • Wrong working directory
  • Permission issues

Solutions:

Use absolute path:

{ "path": "C:\\Full\\Path\\file.pdf" }

Or configure cwd:

{
  "mcpServers": {
    "pdf-reader-mcp": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"],
      "cwd": "/path/to/docs"
    }
  }
}

"No tools showing up"

Solution:

npm cache clean --force
rm -rf node_modules package-lock.json
npm install @sylphx/pdf-reader-mcp@latest

Restart MCP client completely.


🌐 HTTP Transport (Remote Access)

By default, PDF Reader MCP uses stdio transport for local use. You can also run it as an HTTP server for remote access from multiple machines.

Quick Start

# Run as HTTP server on port 8080
MCP_TRANSPORT=http npx @sylphx/pdf-reader-mcp

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| MCP_TRANSPORT | stdio | Transport type: stdio or http |
| MCP_HTTP_PORT | 8080 | HTTP server port |
| MCP_HTTP_HOST | 0.0.0.0 | HTTP server hostname |
| MCP_API_KEY | - | Optional API key for authentication |
| MCP_PDF_OCR_PRESET | - | Optional OCR preset. Supported value: tesseract |
| MCP_PDF_OCR_COMMAND | - | Optional local OCR command used by ocr_pages |
| MCP_PDF_OCR_ARGS_JSON | ["{input}"] | Optional JSON string array of OCR command arguments. Must include {input}. |
| MCP_PDF_REGION_ANALYSIS_COMMAND | - | Optional local visual-region analysis command used by analyze_regions |
| MCP_PDF_REGION_ANALYSIS_ARGS_JSON | ["{input}"] | Optional JSON string array of region analysis command arguments. Must include {input}. |

Docker Deployment

FROM oven/bun:1
WORKDIR /app
RUN bun add @sylphx/pdf-reader-mcp
ENV MCP_TRANSPORT=http
ENV MCP_HTTP_PORT=8080
EXPOSE 8080
CMD ["bun", "node_modules/@sylphx/pdf-reader-mcp/dist/index.js"]

MCP Client Configuration (HTTP)

{
  "servers": {
    "pdf-reader": {
      "type": "http",
      "url": "https://your-server.com/mcp",
      "headers": {
        "X-API-Key": "your-api-key"
      }
    }
  }
}

Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /mcp | POST | JSON-RPC endpoint |
| /mcp/health | GET | Health check |
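
For clients that talk to the /mcp endpoint directly, a tool invocation is a JSON-RPC 2.0 request using the MCP tools/call method. The sketch below only builds the request; buildToolCall and ToolCallRequest are hypothetical names, and whether this server expects an initialize handshake or extra headers before tool calls is not covered here and should be checked against the server's behavior.

```typescript
interface ToolCallRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildToolCall(
  baseUrl: string,
  tool: string,
  args: Record<string, unknown>,
  options: { apiKey?: string; id?: number } = {},
): ToolCallRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // The API key travels in the X-API-Key header, as in the client configuration above.
  if (options.apiKey !== undefined) headers["X-API-Key"] = options.apiKey;

  return {
    url: `${baseUrl.replace(/\/+$/, "")}/mcp`,
    method: "POST",
    headers,
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: options.id ?? 1,
      method: "tools/call",
      params: { name: tool, arguments: args },
    }),
  };
}
```

The returned object can be passed to fetch as `fetch(request.url, request)`.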

🏗️ Architecture

Tech Stack

| Component | Technology |
| --- | --- |
| Runtime | Node.js 22+ ESM |
| PDF Engine | PDF.js (Mozilla) |
| Validation | Vex + JSON Schema |
| Protocol | MCP SDK |
| Language | TypeScript (strict) |
| Testing | Bun test suite |
| Quality | Biome (50x faster) |
| CI/CD | GitHub Actions |

Design Principles

  • 🔒 Security First - Flexible paths with secure defaults
  • 🎯 Simple Interface - One tool, all operations
  • ⚡ Performance - Parallel processing, efficient memory
  • 🛡️ Reliability - Per-page isolation, detailed errors
  • 🧪 Quality - Automated tests, strict TypeScript, and CI validation
  • 📝 Type Safety - No any types, strict mode
  • 🔄 Backward Compatible - Smooth upgrades always

🧪 Development

Setup & Scripts

Prerequisites:

  • Node.js >= 22.13.0 (required by pdfjs-dist v6)
  • Bun (this repo uses bun@1.3.1)

Setup:

git clone https://github.com/SylphxAI/pdf-reader-mcp.git
cd pdf-reader-mcp
bun install && bun run build

Scripts:

bun run build        # Build with bunup
bun test             # Run the test suite
bun run test:cov     # Run coverage
bun run check        # Lint + format
bun run check:fix    # Auto-fix
bun run benchmark    # Reproducible local performance benchmark

Quality:

  • ✅ Automated tests
  • ✅ Coverage reporting
  • ✅ Strict TypeScript
  • ✅ Zero lint errors

Contributing

Quick Start:

  1. Fork repository
  2. Create branch: git checkout -b feature/awesome
  3. Make changes and run the tests: bun test
  4. Format: bun run check:fix
  5. Commit: Use Conventional Commits
  6. Open PR

Commit Format:

feat(images): add WebP support
fix(paths): handle UNC paths
docs(readme): update examples

See CONTRIBUTING.md


📚 Documentation

  • 📖 Full Docs - Complete guides
  • 🚀 Getting Started - Quick start
  • 📘 API Reference - Detailed API
  • 🏗️ Design - Architecture
  • ⚡ Performance - Benchmarks
  • 🔍 Comparison - vs. alternatives

🗺️ Roadmap

✅ Completed

  • Image extraction (v1.1.0)
  • 5-10x parallel speedup (v1.1.0)
  • Y-coordinate ordering (v1.2.0)
  • Absolute paths (v1.3.0)
  • Table extraction
  • Structured element output
  • Semantic document AST
  • PDF trust report
  • PDF accessibility report
  • Table quality diagnostics, inferred cell spans, and continuation candidates
  • Markdown rendering
  • Citation-ready page, semantic, size, and table chunks
  • MCP-native PDF search with snippets and bbox provenance
  • Outlines, annotations, structure trees, form fields, attachment metadata, page labels, and permission signals
  • Column-aware ordering for common multi-column PDFs
  • Layout diagnostics with reading-order confidence
  • Configured local OCR provider for scanned-page text layers
  • Tesseract OCR provider preset without bundling OCR model assets
  • Configured local visual region analysis provider for table, chart, formula, figure, and image-description enrichment
  • Quality evals for semantic chunks, table ordering, renderers, and safety findings
  • Filesystem and HTTP access restrictions

🚀 Next

  • Richer semantic layout detection
  • Fixture-backed OCR and visual-region accuracy benchmarks
  • Engine-specific visual region provider presets
  • Optional advanced parser engines
  • 100+ MB streaming
  • Advanced caching

Vote at Discussions


🏆 Recognition

Featured on:

  • Smithery - MCP directory
  • Glama - AI marketplace
  • MseeP.ai - Security validated

Local-first • Agent-ready • Battle-tested


🤝 Support

  • 🐛 Bug Reports
  • 💬 Discussions
  • 📖 Documentation
  • 📧 Email

Show Your Support: ⭐ Star • 👀 Watch • 🐛 Report bugs • 💡 Suggest features • 🔀 Contribute


📊 Stats

CI-backed quality • Structured extraction • Production ready


📄 License

MIT © Sylphx


🙏 Credits

Built with:

  • PDF.js - Mozilla PDF engine
  • Bun - Fast JavaScript runtime

Special thanks to the open source community ❤️

Powered by Sylphx

This project uses the following @sylphx packages:

  • @sylphx/mcp-server-sdk - MCP server framework
  • @sylphx/vex - Schema validation
  • @sylphx/biome-config - Biome configuration
  • @sylphx/tsconfig - TypeScript configuration


Built with ❤️ by Sylphx

Categories: Documents & Knowledge, Search & Web Crawling
Updated: Dec 15, 2025