Webscrape Mcp

HTTP · registry · active

Summary

Connects your LLM to the web through a small set of tools: webscrape_search queries DuckDuckGo and returns the scraped results as Markdown, webscrape_fetch_url fetches a single page with an optional Readability mode that strips navigation and ads, and webscrape_batch_fetch handles up to five URLs in parallel. As of v2.0.0 there is also webscrape_screenshot for page captures. PDFs are detected automatically (a URL ending in .pdf or an application/pdf content type) and their text is extracted page by page via PyMuPDF. Repeated requests are served from a built-in cache, which v2.0.0 raises from 200 to 500 entries with a 15-minute TTL. Reach for this when you need clean, LLM-ready content from arbitrary URLs without managing BeautifulSoup pipelines yourself. A hosted instance on Render is available for quick testing, or you can run the server locally from the Python source.
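As a rough illustration of the PDF auto-detection mentioned above, the check can be sketched as a URL-suffix test combined with a content-type test. The function name `looks_like_pdf` and its parameters are hypothetical and are not taken from the server's source; the page-by-page text extraction with PyMuPDF is not shown.

```python
from typing import Optional
from urllib.parse import urlparse


def looks_like_pdf(url: str, content_type: Optional[str] = None) -> bool:
    """Hypothetical helper: True if the URL path ends in .pdf
    or the response content type is application/pdf."""
    # Only the path matters, so query strings like ?download=1 are ignored.
    path = urlparse(url).path.lower()
    if path.endswith(".pdf"):
        return True
    if content_type:
        # Drop parameters such as "; charset=binary" before comparing.
        media_type = content_type.split(";", 1)[0].strip().lower()
        return media_type == "application/pdf"
    return False
```

Checking the content type as well as the suffix covers download endpoints whose URLs carry no file extension.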

WebScrape MCP Server

English

MCP server that lets AI agents search the web and extract clean Markdown content — no ads, no clutter, just the text your LLM needs.

What's New in v2.0.0

New features:

- JS rendering: Render JavaScript-heavy sites with Playwright (headless Chromium). Falls back to it automatically when httpx gets a 403 or empty content
- Structured data extraction: Extract JSON-LD, Open Graph, Twitter Cards, meta tags, canonical URLs, and hreflang links with extract_schema=True
- Screenshots: The new webscrape_screenshot tool captures page screenshots with a configurable viewport, full-page mode, and PNG/JPEG output
- Multi-engine search: DuckDuckGo is the primary engine, with automatic fallback to Google and Bing if DDGS is unavailable
- Smart truncation: Content is truncated at paragraph or sentence boundaries instead of mid-word
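To make the smart-truncation idea concrete, here is a minimal sketch of boundary-aware truncation. It is not the server's implementation: the function name `truncate_at_boundary`, the character-based `limit`, and the exact order of fallbacks (paragraph, then sentence, then word) are assumptions chosen for illustration.

```python
def truncate_at_boundary(text: str, limit: int) -> str:
    """Hypothetical helper: cut text to at most `limit` characters,
    preferring paragraph, then sentence, then word boundaries."""
    if len(text) <= limit:
        return text
    window = text[:limit]
    # 1. Prefer the last paragraph break inside the window.
    cut = window.rfind("\n\n")
    if cut > 0:
        return window[:cut].rstrip()
    # 2. Otherwise fall back to the last sentence end followed by a space.
    cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
    if cut > 0:
        return window[: cut + 1]
    # 3. Last resort: the last space, so that no word is split.
    cut = window.rfind(" ")
    return window[:cut] if cut > 0 else window
```

The result may be noticeably shorter than `limit`; the trade-off is that the model never sees a half-finished sentence or word.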

Improvements:

- Enhanced cache: 500 entries with a 15-minute TTL (previously 200 entries with no expiry)
- Better error handling, with specific messages for 403, 404, and 429 responses and for timeouts
- Updated Dockerfile with the Chromium dependencies Playwright needs
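The cache behaviour described above (a bounded number of entries, each expiring after a fixed TTL) can be sketched as follows. The class name `TTLCache`, its methods, and the policy of dropping the oldest entry when the cache is full are assumptions for illustration; the source only states the 500-entry size and the 15-minute TTL.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class TTLCache:
    """Hypothetical bounded cache whose entries expire after ttl_seconds."""

    def __init__(self, max_entries: int = 500, ttl_seconds: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items = OrderedDict()  # key -> (stored_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self.ttl_seconds:
            del self._items[key]  # entry is past its TTL
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._items.pop(key, None)  # re-inserting refreshes the position
        self._items[key] = (self._clock(), value)
        while len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # assumed policy: drop the oldest entry
```

Keyed by URL, a cache like this lets a repeated fetch within the TTL return immediately instead of hitting the network again.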

Tools

| Tool | Description |
| --- | --- |
| `webscrape_fetch_url` | Fetch a single URL and return clean Markdown. Supports `use_readability`, `js_render`, and `extract_schema`, and auto-detects PDFs |
| `webscrape_batch_fetch` | Fetch up to 5 URLs in parallel. Supports PDF auto-detection, JS rendering, and structured data |
| `webscrape_search` | Search the web (DuckDuckGo → Google → Bing fallback) and scrape results into Markdown |
| `webscrape_screenshot` | Capture a screenshot of any web page with headless Chromium. Supports PNG/JPEG, viewport sizing, and full-page capture |
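MCP clients invoke tools like these with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds such request bodies; it does not contact the server. The tool names and the `use_readability` flag come from the table above, while the argument names `url` and `urls` and both helper functions are assumptions, so check the schemas the server reports via `tools/list` before relying on them.

```python
import json
from typing import Any, Dict, List

MAX_BATCH_URLS = 5  # limit stated for webscrape_batch_fetch


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request for the MCP tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def build_batch_fetch_call(urls: List[str], request_id: int = 1) -> str:
    """Build a webscrape_batch_fetch call, rejecting more than five URLs up front."""
    if not 1 <= len(urls) <= MAX_BATCH_URLS:
        raise ValueError(f"expected 1 to {MAX_BATCH_URLS} URLs, got {len(urls)}")
    # "urls" is an assumed argument name, not confirmed by the source.
    return build_tool_call("webscrape_batch_fetch", {"urls": urls}, request_id)


if __name__ == "__main__":
    # "url" is likewise an assumed argument name.
    print(build_tool_call(
        "webscrape_fetch_url",
        {"url": "https://example.com", "use_readability": True},
    ))
```

In practice an MCP client library sends these requests for you; the sketch is mainly useful for seeing what travels over the wire.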

Features

- PDF support: URLs that end in .pdf or return an application/pdf content type are auto-detected, and their text is extracted page by page
- Readability mode: Pass use_readability=True to webscrape_fetch_url for cleaner article extraction using Mozilla Readability (removes nav, sidebars, ads, comments)
- JS rendering: Pass js_render=True to render JavaScript-heavy sites with Playwright (headless Chromium). Falls back to it automatically when httpx gets a 403 or empty content
- Structured data extraction: Pass extract_schema=True to extract JSON-LD, Open Graph, Twitter Cards, meta tags, canonical URLs, and hreflang links
- Multi-engine search: DuckDuckGo is the primary engine, with automatic fallback to Google and Bing if DDGS is unavailable
- Screenshots: Capture page screenshots with a configurable viewport, full-page mode, and PNG/JPEG output
- Built-in cache: 500-entry cache with TTL-based eviction (15 min) for repeated URLs
- Batch fetching: Up to 5 URLs in parallel
- Smart truncation: Content is truncated at paragraph or sentence boundaries instead of mid-word
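To show what structured data extraction involves, here is a small sketch that pulls JSON-LD blocks, Open Graph tags, and the canonical URL out of an HTML string. It uses the standard library's `html.parser` so that it runs without dependencies; the server itself lists beautifulsoup4 among its dependencies, so this is a swapped-in technique and not the project's code. The names `SchemaExtractor` and `extract_structured_data` are hypothetical, and Twitter Cards and hreflang links are left out for brevity.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict


class SchemaExtractor(HTMLParser):
    """Hypothetical collector for JSON-LD, Open Graph tags, and the canonical URL."""

    def __init__(self) -> None:
        super().__init__()
        self.json_ld = []
        self.open_graph = {}
        self.canonical = None
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and (attr.get("property") or "").startswith("og:"):
            self.open_graph[attr["property"]] = attr.get("content") or ""
        elif tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)  # script bodies can arrive in chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD instead of failing the whole page


def extract_structured_data(html: str) -> Dict[str, Any]:
    parser = SchemaExtractor()
    parser.feed(html)
    parser.close()
    return {"json_ld": parser.json_ld, "open_graph": parser.open_graph,
            "canonical": parser.canonical}
```

Calling `extract_structured_data(page_html)` returns a plain dictionary that can be attached to the Markdown output as metadata.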

How to use

Option 1 — MCPize (recommended)

1. Go to https://mcpize.com/marketplace
2. Search for Web Scrape and click Start Free
3. Copy the API key you receive
4. Add the server to your AI client's configuration:

{
  "mcpServers": {
    "webscrape": {
      "url": "https://webscrape.mcpize.run",
      "headers": {
        "Authorization": "Bearer your-api-key"
      }
    }
  }
}
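If you would rather not paste the API key into a file by hand, a short script can generate the configuration above. This is a sketch under stated assumptions: the helper `build_client_config` and the environment variable `WEBSCRAPE_API_KEY` are hypothetical names, and where the resulting JSON belongs depends on your AI client.

```python
import json
import os
from typing import Any, Dict


def build_client_config(api_key: str,
                        url: str = "https://webscrape.mcpize.run") -> Dict[str, Any]:
    """Return the mcpServers entry shown above with the given API key filled in."""
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "mcpServers": {
            "webscrape": {
                "url": url,
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


if __name__ == "__main__":
    # WEBSCRAPE_API_KEY is a hypothetical variable name chosen for this example.
    key = os.environ.get("WEBSCRAPE_API_KEY", "your-api-key")
    print(json.dumps(build_client_config(key), indent=2))
```

Reading the key from the environment keeps it out of version control when the generated file is written at deploy time.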

Option 2 — Render (dev)

{
  "mcpServers": {
    "webscrape": {
      "url": "https://webscrape-mcp.onrender.com"
    }
  }
}

Option 3 — Local

git clone https://github.com/carrasquelalex1/webscrape-mcp.git
cd webscrape-mcp
pip install -r requirements.txt
playwright install chromium
python webscrape_mcp.py

Official Registry

io.github.carrasquelalex1/webscrape-mcp

Dependencies

mcp, httpx, beautifulsoup4, markdownify, pydantic, ddgs, readability-lxml, PyMuPDF, playwright

License

MIT
