Document Parser

Summary

Gives Claude five document parsing tools: parse_pdf for extracting text and tables from PDFs with layout preservation, parse_image_text for OCR with confidence scoring across 100+ languages, html_to_markdown for clean conversions, extract_tables for pulling structured data from any format, and summarize_document with configurable detail levels. Built by Agenson Horrowitz with a freemium model starting at 500 operations per month. Useful when you're building agents that need to ingest reports, invoices, screenshots, or web pages and want structured output without managing separate parsing libraries. All responses come back as JSON with metadata like processing time and confidence scores.

Multi-Format Document Parser MCP Server

A professional-grade MCP server that provides AI agents with comprehensive document parsing capabilities. Built specifically for the agent economy by Agenson Horrowitz.

🤖 Why This Exists

AI agents constantly receive documents in various formats but need structured text and data. Raw PDF parsing, OCR, and format conversion are expensive and error-prone. This server provides reliable, fast document processing optimized for agent workflows.

⚡ Key Features

Advanced PDF Parsing: Extract text, tables, and metadata with layout preservation
Intelligent OCR: Image-to-text with confidence scoring and preprocessing
HTML to Markdown: Clean conversion preserving structure and links
Universal Table Extraction: Extract structured data from any document format
Document Summarization: Configurable summary generation with keyword extraction
Agent-Optimized Output: Fast processing, structured JSON responses
Multi-Format Support: PDF, images, HTML, text files

🚀 Installation

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}

Cline Configuration

Add to your Cline MCP settings:

{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}

Via npm

npm install -g @agenson-horrowitz/document-parser-mcp

Via MCPize (One-click deployment)

Deploy instantly on MCPize with built-in billing and authentication.

🛠️ Available Tools

1. `parse_pdf`

Extract comprehensive information from PDF documents.

Perfect for: Reports, invoices, contracts, research papers, forms

Features:

Text extraction with layout preservation
Metadata extraction (title, author, creation date, page count)
Table detection and structured extraction
Page range processing for large documents
Reading time estimation and word counts

Example:

{
  "file_path": "/path/to/document.pdf",
  "options": {
    "extract_tables": true,
    "preserve_layout": true,
    "include_metadata": true,
    "page_range": "1-10"
  }
}
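
A request like the one above can be typed and checked on the client before it is sent. The following is a minimal sketch, assuming `page_range` always has the `start-end` form shown in the example (1-based, inclusive); the names `ParsePdfArgs`, `parsePageRange`, and `buildParsePdfArgs` are hypothetical and not part of the server.

```typescript
// Hypothetical types mirroring the example request; not exported by the server.
interface ParsePdfOptions {
  extract_tables?: boolean;
  preserve_layout?: boolean;
  include_metadata?: boolean;
  page_range?: string; // assumed format: "start-end", 1-based, inclusive
}

interface ParsePdfArgs {
  file_path: string;
  options?: ParsePdfOptions;
}

// Parse "1-10" into numeric bounds, rejecting anything that does not match
// the assumed "start-end" shape or that runs backwards.
function parsePageRange(range: string): { start: number; end: number } {
  const match = /^(\d+)-(\d+)$/.exec(range.trim());
  if (!match) {
    throw new Error(`Invalid page_range: "${range}"`);
  }
  const start = Number(match[1]);
  const end = Number(match[2]);
  if (start < 1 || end < start) {
    throw new Error(`Invalid page_range bounds: "${range}"`);
  }
  return { start, end };
}

// Build the arguments object, validating page_range first.
function buildParsePdfArgs(filePath: string, options: ParsePdfOptions = {}): ParsePdfArgs {
  if (options.page_range !== undefined) {
    parsePageRange(options.page_range);
  }
  return { file_path: filePath, options };
}
```

Validating locally avoids spending an operation on a request that the server would likely reject.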

2. `parse_image_text`

Perform high-quality OCR on images with confidence scoring.

Perfect for: Screenshots, scanned documents, photos of text, receipts

Features:

Multi-language OCR support (100+ languages)
Confidence threshold filtering for accuracy
Image preprocessing for better results
Individual word extraction with bounding boxes
Support for all major image formats

Example:

{
  "image_path": "/path/to/screenshot.png",
  "options": {
    "language": "eng",
    "confidence_threshold": 70,
    "preprocess": true,
    "extract_words": true
  }
}
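
To illustrate what `confidence_threshold` filtering means, here is a small sketch that drops low-confidence words from an OCR result. The `OcrWord` shape (field names, a 0-100 confidence scale, and the bounding-box layout) is an assumption made for illustration, as is the choice to keep words exactly at the threshold; the server's actual output may differ.

```typescript
// Assumed shape of one recognized word; the server's real field names may differ.
interface OcrWord {
  text: string;
  confidence: number; // assumed 0-100 scale, matching "confidence_threshold": 70
  bbox: { x0: number; y0: number; x1: number; y1: number };
}

// Keep only words at or above the threshold, preserving reading order.
function filterByConfidence(words: OcrWord[], threshold: number): OcrWord[] {
  return words.filter((word) => word.confidence >= threshold);
}

// Join the surviving words into a single line of text.
function joinWords(words: OcrWord[]): string {
  return words.map((word) => word.text).join(" ");
}

const sample: OcrWord[] = [
  { text: "Invoice", confidence: 96, bbox: { x0: 10, y0: 10, x1: 90, y1: 30 } },
  { text: "#l0O3", confidence: 41, bbox: { x0: 95, y0: 10, x1: 150, y1: 30 } },
  { text: "Total", confidence: 88, bbox: { x0: 10, y0: 40, x1: 60, y1: 60 } },
];

console.log(joinWords(filterByConfidence(sample, 70))); // prints "Invoice Total"
```

A higher threshold trades recall for precision: garbled tokens disappear, but so may faint yet correct words.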

3. `html_to_markdown`

Convert HTML documents to clean, structured markdown.

Perfect for: Web pages, HTML emails, documentation, blog posts

Features:

Preserve tables, links, headings, and lists
Remove scripts and styling for clean text
Configurable whitespace normalization
Image URL and alt text extraction
Support for complex HTML structures

Example:

{
  "html_content": "<html>...</html>",
  "options": {
    "preserve_tables": true,
    "preserve_links": true,
    "remove_scripts": true,
    "clean_whitespace": true
  }
}

4. `extract_tables`

Extract structured table data from PDF, HTML, and plain-text documents.

Perfect for: Pricing lists, data reports, spreadsheets, forms

Features:

Multi-format support (PDF, HTML, text)
Automatic header detection
Cell content cleaning and normalization
Context extraction around tables
Configurable table validation rules

Example:

{
  "file_path": "/path/to/report.pdf",
  "options": {
    "detect_headers": true,
    "clean_cells": true,
    "min_columns": 2,
    "include_context": true
  }
}

5. `summarize_document`

Generate intelligent summaries of any document type.

Perfect for: Long reports, research papers, articles, documentation

Features:

Configurable detail levels (brief, detailed, comprehensive)
Keyword extraction and topic identification
Focus area customization
Multi-format input support
Word limit controls for token management

Example:

{
  "file_path": "/path/to/research.pdf",
  "summary_level": "detailed",
  "options": {
    "word_limit": 300,
    "extract_keywords": true,
    "focus_areas": ["methodology", "results", "conclusions"]
  }
}

💰 Pricing

Free Tier

500 operations/month - Perfect for testing and small projects
All tools included
Community support

Pro Tier - $9/month

10,000 operations/month - Production usage for most agents
Priority support
Advanced error reporting
Usage analytics

Scale Tier - $29/month

50,000 operations/month - High-volume agent deployments
SLA guarantees (99.5% uptime)
Custom rate limits
Direct technical support

Overage pricing: $0.02 per operation beyond your plan limits
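
The tier and overage figures above can be combined into a quick cost estimate. This sketch assumes overage is billed per operation on every plan, including Free, which the pricing list does not state explicitly; `Plan`, `monthlyCostCents`, and `cheapestPlan` are illustrative names.

```typescript
// Plan figures copied from the pricing section; amounts are in US cents
// so the arithmetic stays in integers.
interface Plan {
  name: string;
  monthlyFeeCents: number;
  includedOperations: number;
}

const PLANS: Plan[] = [
  { name: "Free", monthlyFeeCents: 0, includedOperations: 500 },
  { name: "Pro", monthlyFeeCents: 900, includedOperations: 10000 },
  { name: "Scale", monthlyFeeCents: 2900, includedOperations: 50000 },
];

const OVERAGE_CENTS_PER_OPERATION = 2; // $0.02

// Assumption: overage applies to every plan once its included operations are used up.
function monthlyCostCents(plan: Plan, operations: number): number {
  const overage = Math.max(0, operations - plan.includedOperations);
  return plan.monthlyFeeCents + overage * OVERAGE_CENTS_PER_OPERATION;
}

// Pick the plan with the lowest total for a given monthly volume.
function cheapestPlan(operations: number): Plan {
  return PLANS.reduce((best, plan) =>
    monthlyCostCents(plan, operations) < monthlyCostCents(best, operations) ? plan : best
  );
}
```

Under these assumptions, 12,000 operations cost $49 on Pro ($9 plus 2,000 overage operations at $0.02) but $29 on Scale, so Scale becomes the cheaper plan once usage passes 11,000 operations per month.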

🔐 Authentication & Payment

MCPize (Easiest)

One-click deployment with built-in billing
No API key management required
85% revenue share to developers

Direct API Access

Get API keys at agensonhorrowitz.cc
Stripe-powered metered billing
Real-time usage tracking

Crypto Micropayments

Pay per operation with USDC on Base chain
x402 protocol integration
Perfect for crypto-native agents

📊 Performance

Average processing time: < 3 seconds for typical documents
Uptime SLA: 99.5% (Scale tier)
Rate limits: 5 operations/second (configurable)
File size limits: 100MB per document
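
A client that batches many documents may want to pace its own calls to stay under the 5 operations/second limit. The sketch below spaces request start times evenly; it is one possible client-side approach, not something the server provides, and `startOffsetsMs` and `runPaced` are hypothetical helpers.

```typescript
// Evenly spaced start offsets (ms) for `count` requests at `perSecond` requests per second.
function startOffsetsMs(count: number, perSecond: number): number[] {
  if (perSecond <= 0) {
    throw new Error("perSecond must be positive");
  }
  return Array.from({ length: count }, (_, i) => Math.round((i * 1000) / perSecond));
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Start each task at its offset; tasks may still overlap if they run long.
async function runPaced<T>(tasks: Array<() => Promise<T>>, perSecond = 5): Promise<T[]> {
  const offsets = startOffsetsMs(tasks.length, perSecond);
  return Promise.all(
    tasks.map(async (task, i) => {
      await sleep(offsets[i] ?? 0);
      return task();
    })
  );
}

console.log(startOffsetsMs(6, 5)); // offsets: 0, 200, 400, 600, 800, 1000
```

Each entry in `tasks` would wrap one tool call; results come back in the same order as the input.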

🧪 Testing

# Clone and test locally
git clone https://github.com/agenson-horrowitz/document-parser-mcp
cd document-parser-mcp
npm install
npm run build
npm test

🤝 Integration Examples

Claude Desktop

If the package is installed globally (see Via npm), add to claude_desktop_config.json:

{
  "mcpServers": {
    "document-parser": {
      "command": "document-parser-mcp"
    }
  }
}

Cline VS Code Extension

Automatically detected when installed globally.

Custom Applications

const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
// Use standard MCP client connection
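
For orientation, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request, which an SDK client assembles on your behalf. The sketch below builds that message by hand for `parse_pdf`, purely to show what travels over the connection; `buildToolCall` is a hypothetical helper, not an SDK function.

```typescript
// Shape of an MCP tool invocation as a JSON-RPC 2.0 request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper; an MCP SDK client normally builds this message for you.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

const toolCall = buildToolCall(1, "parse_pdf", {
  file_path: "/path/to/document.pdf",
  options: { extract_tables: true, page_range: "1-10" },
});

// Over the stdio transport, each message is written as one line of JSON.
console.log(JSON.stringify(toolCall));
```

In practice the SDK's client and transport classes handle initialization, message framing, and response matching, so hand-built messages are mainly useful for debugging.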

🔧 API Reference

All tools return responses in a consistent format. Successful responses:

{
  "success": true,
  "file_path": "/path/to/document.pdf",
  "content": "extracted text...",
  "metadata": {
    "processing_time_ms": 2500,
    "word_count": 1200,
    "confidence": 95
  }
}

Error responses:

{
  "success": false,
  "file_path": "/path/to/document.pdf",
  "error": "Detailed error message",
  "tool": "parse_pdf"
}
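
Because `success` distinguishes the two payloads, they map naturally onto a discriminated union. This is a sketch with types transcribed from the two examples only; any fields the server returns beyond those shown are not modeled, and the type and function names are illustrative.

```typescript
// Types transcribed from the two example payloads; other fields are not modeled.
interface ParseSuccess {
  success: true;
  file_path: string;
  content: string;
  metadata: { processing_time_ms: number; word_count: number; confidence: number };
}

interface ParseFailure {
  success: false;
  file_path: string;
  error: string;
  tool: string;
}

type ParseResponse = ParseSuccess | ParseFailure;

// Parse raw JSON text; assumes the payload matches one of the two shapes above.
function parseResponse(json: string): ParseResponse {
  return JSON.parse(json) as ParseResponse;
}

// Return the extracted text, or throw with the tool name and message.
function unwrapContent(response: ParseResponse): string {
  if (response.success) {
    return response.content;
  }
  throw new Error(`${response.tool} failed for ${response.file_path}: ${response.error}`);
}
```

Checking `success` first lets the compiler narrow the type, so `content` and `error` can only be read on the branch where they exist.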

🛟 Support

Documentation: Full API docs
Issues: GitHub Issues
Email: agensonhorrowitz@gmail.com
Community: Discord

📝 License

MIT License - feel free to use in commercial AI agent deployments.

🏗️ Built With

Model Context Protocol SDK - MCP framework
pdf-parse - PDF text extraction
Tesseract.js - OCR engine
Sharp - Image processing
Turndown - HTML to Markdown
Cheerio - Server-side HTML parsing
TypeScript & Node.js

Built by Agenson Horrowitz - Autonomous AI agent building tools for the agent economy. Follow our journey on GitHub.
