This server wraps Mozilla Readability and Puppeteer to turn messy web pages into clean markdown and JSON that won't burn your context window. You get five tools: extract_article for blog posts and docs, extract_structured_data for tables and forms, extract_links with smart categorization (internal, external, social, downloads), screenshot_to_markdown for visual layout analysis, and batch_extract for processing multiple URLs with rate limiting. All responses include timing metrics and token counts. The article extractor can handle JavaScript-heavy SPAs and lets you cap output length. Runs via stdio transport, installs through npx, and processes most pages in under two seconds. Built for agents that need to read the web without choking on raw HTML.
claude mcp add --transport stdio io.github.agenson-horrowitz-web-content-extractor uvx web-content-extractor
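The listing names the four link categories but not the rules behind them. The TypeScript below is a minimal sketch of how that kind of categorization could work. It resolves each href against the page URL, then sorts it into internal, external, social, or downloads. Several parts are assumptions made for illustration and are not the server's API or logic: the `SOCIAL_HOSTS` and `DOWNLOAD_EXTENSIONS` lists, the order in which the rules are checked, and the helper names `categorizeLink` and `categorizeLinks`.

```typescript
// Hypothetical sketch: category rules are illustrative assumptions,
// not the actual logic of extract_links.
type LinkCategory = "internal" | "external" | "social" | "downloads";

// Assumed lists; the real server may use different ones.
const SOCIAL_HOSTS = ["twitter.com", "x.com", "facebook.com", "linkedin.com", "instagram.com", "youtube.com"];
const DOWNLOAD_EXTENSIONS = [".pdf", ".zip", ".tar.gz", ".dmg", ".exe", ".csv", ".docx"];

// Treat "www.example.com" and "example.com" as the same host.
const stripWww = (host: string): string => host.replace(/^www\./, "");

function categorizeLink(href: string, pageUrl: string): LinkCategory {
  // Resolve relative hrefs ("/docs/intro") against the page they appear on.
  const url = new URL(href, pageUrl);
  const page = new URL(pageUrl);

  // Assumed precedence: file downloads win over every other category.
  const path = url.pathname.toLowerCase();
  if (DOWNLOAD_EXTENSIONS.some((ext) => path.endsWith(ext))) return "downloads";

  // Match social hosts exactly or as a parent domain (e.g. m.facebook.com).
  const host = stripWww(url.hostname);
  if (SOCIAL_HOSTS.some((s) => host === s || host.endsWith("." + s))) return "social";

  return host === stripWww(page.hostname) ? "internal" : "external";
}

// Group a page's hrefs by category, storing absolute URLs.
function categorizeLinks(hrefs: string[], pageUrl: string): Record<LinkCategory, string[]> {
  const groups: Record<LinkCategory, string[]> = { internal: [], external: [], social: [], downloads: [] };
  for (const href of hrefs) {
    groups[categorizeLink(href, pageUrl)].push(new URL(href, pageUrl).toString());
  }
  return groups;
}

const groups = categorizeLinks(
  ["/docs/intro", "https://www.twitter.com/example", "/files/report.pdf", "https://example.org/"],
  "https://example.com/blog/post",
);
console.log(groups);
```

The sketch checks downloads first, so a PDF hosted on the same site lands in downloads rather than internal. A real implementation could reasonably order these rules differently.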