This MCP server connects your LLM to the web through three tools: `webscrape_search` queries DuckDuckGo and returns the scraped results as Markdown, `webscrape_fetch_url` fetches a single page with an optional Readability mode that strips navigation and ads, and `webscrape_batch_fetch` fetches up to five URLs in parallel. PDF detection is automatic: URLs ending in `.pdf` have their text extracted page by page with PyMuPDF. A built-in 200-entry cache serves repeated requests. Use it when you need clean, LLM-ready content from arbitrary URLs without maintaining your own BeautifulSoup pipeline. A hosted instance on Render is available for quick testing, or you can run the server locally from the Python source.
MCP server that lets AI agents search the web and extract clean Markdown content: no ads, no clutter, just the text your LLM needs.
| Tool | Description |
|---|---|
| `webscrape_search` | Search the web (DuckDuckGo) and scrape results into Markdown |
| `webscrape_fetch_url` | Fetch a single URL and return clean Markdown. Supports `use_readability` and auto-detects PDFs |
| `webscrape_batch_fetch` | Fetch up to 5 URLs in parallel. Supports PDF auto-detection |
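
To illustrate what "up to 5 URLs in parallel" can look like, here is a minimal sketch using `asyncio.gather`. It is not the server's actual implementation: the names `batch_fetch`, `fake_fetch`, and `MAX_BATCH` are hypothetical, the fetcher is a stand-in for a real HTTP request, and rejecting oversized batches and reporting per-URL errors as strings are assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable

MAX_BATCH = 5  # batch limit stated in the tool table


async def batch_fetch(
    urls: list[str],
    fetch_one: Callable[[str], Awaitable[str]],
) -> list[tuple[str, str]]:
    """Fetch all URLs concurrently and return (url, result) pairs in input order."""
    if len(urls) > MAX_BATCH:
        # Assumption: oversized batches are rejected rather than truncated.
        raise ValueError(f"at most {MAX_BATCH} URLs per batch, got {len(urls)}")
    # return_exceptions=True keeps one failing URL from cancelling the others.
    results = await asyncio.gather(
        *(fetch_one(url) for url in urls), return_exceptions=True
    )
    return [
        (url, f"ERROR: {res}" if isinstance(res, BaseException) else res)
        for url, res in zip(urls, results)
    ]


async def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for an HTTP request plus HTML-to-Markdown conversion."""
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise RuntimeError("unreachable")
    return f"# Content of {url}"


if __name__ == "__main__":
    pages = asyncio.run(
        batch_fetch(["https://example.com/a", "https://example.com/bad"], fake_fetch)
    )
    for url, body in pages:
        print(url, "->", body)
```

Because the requests run concurrently, the total wait is roughly that of the slowest URL rather than the sum of all of them.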
- PDF support: URLs ending in `.pdf` or served with an `application/pdf` content type are auto-detected, and text is extracted page by page
- Readability mode: Pass `use_readability=True` to `webscrape_fetch_url` for cleaner article extraction using Mozilla Readability (removes nav, sidebars, ads, comments)
- DuckDuckGo search: No API key required, just a search query
- Built-in cache: 200-entry cache with automatic eviction for repeated URLs
- Batch fetching: Up to 5 URLs in parallel
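
Two of the features above, PDF detection and the bounded cache, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: `looks_like_pdf` and `BoundedCache` are hypothetical names, the check ignores query strings when testing for a `.pdf` suffix, and least-recently-used eviction is assumed because the source only says "automatic eviction".

```python
from collections import OrderedDict
from typing import Optional
from urllib.parse import urlparse


def looks_like_pdf(url: str, content_type: Optional[str] = None) -> bool:
    """True if the URL path ends in .pdf or the response declares application/pdf."""
    if urlparse(url).path.lower().endswith(".pdf"):
        return True
    if content_type is None:
        return False
    # Drop parameters such as "; charset=binary" before comparing.
    return content_type.split(";", 1)[0].strip().lower() == "application/pdf"


class BoundedCache:
    """Holds at most max_entries items; evicts the least recently used (assumed policy)."""

    def __init__(self, max_entries: int = 200) -> None:
        self.max_entries = max_entries
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, url: str) -> Optional[str]:
        if url not in self._items:
            return None
        self._items.move_to_end(url)  # mark as most recently used
        return self._items[url]

    def put(self, url: str, markdown: str) -> None:
        self._items[url] = markdown
        self._items.move_to_end(url)
        if len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # drop the oldest entry

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    cache = BoundedCache(max_entries=200)
    cache.put("https://example.com/report.pdf", "# Report")
    print(looks_like_pdf("https://example.com/report.pdf"))
    print(cache.get("https://example.com/report.pdf"))
```

Checking the content type as well as the suffix matters for download links that serve a PDF from a URL with no file extension.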
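Hosted endpoint at `webscrape.mcpize.run` (API key required):

```json
{
  "mcpServers": {
    "webscrape": {
      "url": "https://webscrape.mcpize.run",
      "headers": {
        "Authorization": "Bearer your-api-key"
      }
    }
  }
}
```

Hosted endpoint on Render (no API key in the config):

```json
{
  "mcpServers": {
    "webscrape": {
      "url": "https://webscrape-mcp.onrender.com"
    }
  }
}
```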
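
The two configurations above differ only in the URL and the optional `Authorization` header. If you already have an `mcpServers` config with other entries, a small helper can merge the `webscrape` entry in without overwriting them. This is a sketch: `add_webscrape_server` is a hypothetical helper, and where your MCP client stores its config file depends on the client, so reading and writing the file is left out.

```python
import json
from typing import Optional


def add_webscrape_server(
    config: dict, url: str, api_key: Optional[str] = None
) -> dict:
    """Return a copy of config with a "webscrape" entry under "mcpServers"."""
    entry: dict = {"url": url}
    if api_key:
        # Only the API-key variant of the config carries a headers block.
        entry["headers"] = {"Authorization": f"Bearer {api_key}"}
    servers = {**config.get("mcpServers", {}), "webscrape": entry}
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    # Hypothetical pre-existing client config with one other server.
    existing = {"mcpServers": {"other": {"url": "https://example.com/mcp"}}}
    updated = add_webscrape_server(existing, "https://webscrape-mcp.onrender.com")
    print(json.dumps(updated, indent=2))
```

The helper returns a new dictionary instead of mutating its input, so the original config stays available if the result needs to be discarded.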
Run locally from the Python source:

```shell
git clone https://github.com/carrasquelalex1/webscrape-mcp.git
cd webscrape-mcp
pip install -r requirements.txt
python webscrape_mcp.py
```
Registry name: `io.github.carrasquelalex1/webscrape-mcp`
Dependencies: mcp, httpx, beautifulsoup4, markdownify, pydantic, ddgs, readability-lxml, PyMuPDF

License: MIT