Huoshui Fetch

STDIO · registry active

Summary

This is a web scraping toolkit that turns messy HTML into structured data. You get 11 tools split across fetching (with custom headers and auth support), conversion (HTML to Markdown, JSON to Markdown, text extraction), and extraction (article content via readability, metadata, links, images, JSON-LD structured data). Runs over stdio, so it plugs into Claude Desktop or any MCP client. Useful when you need to pull clean content from web pages without writing your own scraper, whether that's grabbing article text, converting documentation to Markdown, or extracting SEO metadata. Built on Python 3.11+ and ships with automated publishing scripts for PyPI.

huoshui-fetch

A dedicated web content fetching and conversion MCP (Model Context Protocol) server that provides tools for fetching, converting, and extracting data from web pages.

Features

Fetching Tools

fetch_url: Fetch content from URLs with customizable timeout, redirect handling, and user-agent
fetch_with_headers: Fetch URLs with custom headers for authenticated requests
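As a rough illustration of what a header-aware fetch involves, the sketch below builds a request with custom headers using only Python's standard library. It is not the server's implementation; `build_request`, `fetch`, the default user-agent string, and the parameter names are hypothetical and chosen only for this example.

```python
import urllib.request


def build_request(url, headers=None, user_agent="example-fetcher/0.1"):
    """Build a GET request carrying a user-agent and optional extra headers."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("User-Agent", user_agent)
    for name, value in (headers or {}).items():
        # Extra headers, e.g. Authorization for authenticated endpoints.
        request.add_header(name, value)
    return request


def fetch(url, headers=None, timeout=10.0):
    """Send the request and return the body as text (hypothetical helper)."""
    request = build_request(url, headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Keeping request construction separate from the network call makes the header logic easy to check without touching the network.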

Conversion Tools

html_to_markdown_tool: Convert HTML to clean Markdown format
html_to_text_tool: Extract plain text from HTML
clean_html_tool: Remove scripts/styles and sanitize HTML
json_to_markdown_tool: Convert JSON data to readable Markdown
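To give a feel for JSON-to-Markdown conversion, here is a minimal sketch that renders nested JSON as a nested bullet list. The output layout and the function name `json_to_markdown` are assumptions for illustration; the actual tool may format its output differently.

```python
def _render(data, indent):
    """Return Markdown lines for one JSON value at the given nesting depth."""
    pad = "  " * indent
    lines = []
    if isinstance(data, dict):
        for key, value in data.items():
            if isinstance(value, (dict, list)):
                lines.append(f"{pad}- **{key}**:")
                lines.extend(_render(value, indent + 1))
            else:
                lines.append(f"{pad}- **{key}**: {value}")
    elif isinstance(data, list):
        for item in data:
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}-")
                lines.extend(_render(item, indent + 1))
            else:
                lines.append(f"{pad}- {item}")
    else:
        # Scalars at the top level are emitted as plain text.
        lines.append(f"{pad}{data}")
    return lines


def json_to_markdown(data):
    """Render parsed JSON (dicts, lists, scalars) as a Markdown bullet list."""
    return "\n".join(_render(data, 0))
```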

Extraction Tools

extract_article_tool: Extract main article content using readability
extract_links_tool: Extract all links with filtering options
extract_metadata_tool: Extract page metadata (title, description, OG tags)
extract_images_tool: Extract images with size filtering
extract_structured_data_tool: Extract JSON-LD and microdata
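Metadata extraction of the kind listed above can be approximated with the standard library alone. The following sketch collects the page title, the `description` meta tag, and any Open Graph (`og:`) tags; the class and function names are hypothetical, and the real tool may return a different structure.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect <title>, meta description, and og:* properties."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attributes.get("property") or attributes.get("name")
            content = attributes.get("content")
            if key and content is not None:
                if key == "description" or key.startswith("og:"):
                    self.metadata[key] = content

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.metadata["title"] = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_metadata(html):
    """Return a dict of title, description, and Open Graph values."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata
```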

Installation

From MCP Registry (Recommended)

This server is available in the Model Context Protocol Registry. Install it using your MCP client.

mcp-name: io.github.huoshuiai42/huoshui-fetch

# From a local clone of the repository, using uv (recommended)
uv sync

# Or install from GitHub (replace yourusername with the repository owner)
pip install git+https://github.com/yourusername/huoshui-fetch.git

Usage

Run with uvx (recommended for one-time use)

# From the repository
uvx --from . huoshui-fetch

# From GitHub (once published)
uvx --from git+https://github.com/yourusername/huoshui-fetch.git huoshui-fetch

Run directly

# Using uv
uv run python -m huoshui_fetch

# Or if installed
python -m huoshui_fetch

The server communicates over standard input/output (stdio), so it can be launched directly by Claude Desktop and other MCP-compatible clients.
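For readers curious what travels over stdio, MCP messages are JSON-RPC 2.0 objects, one per line. The sketch below serializes a `tools/call` request; the helper name `make_tool_call` is hypothetical, and the `url` argument name for `fetch_url` is an assumption that should be checked against the tool's published schema.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Serialize a JSON-RPC 2.0 `tools/call` request as one newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps emits no embedded newlines by default, so one message is one line.
    return json.dumps(message) + "\n"


# The "url" argument name is an assumption made for this example.
line = make_tool_call(1, "fetch_url", {"url": "https://example.com"})
print(line, end="")
```

A real client would first complete the MCP `initialize` handshake before sending tool calls; MCP client libraries normally handle that step.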

Configuration for Claude Desktop

Add to your Claude Desktop configuration:

{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": ["--no-cache", "--from", ".", "huoshui-fetch"],
      "cwd": "/path/to/huoshui-fetch"
    }
  }
}

Or if installed from GitHub:

{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/yourusername/huoshui-fetch.git",
        "huoshui-fetch"
      ]
    }
  }
}
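If the configuration file already lists other servers, the new entry has to be merged in rather than pasted over them. A minimal sketch of such a merge, assuming the `mcpServers` layout shown above; `add_server` is a hypothetical helper and the `other` server entry is made up for the example.

```python
import copy
import json


def add_server(config, name, command, args):
    """Return a copy of the config with one more entry under mcpServers."""
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = new_config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": list(args)}
    return new_config


# A made-up existing configuration with one unrelated server.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
updated = add_server(
    existing,
    "huoshui-fetch",
    "uvx",
    ["--from", "git+https://github.com/yourusername/huoshui-fetch.git", "huoshui-fetch"],
)
print(json.dumps(updated, indent=2))
```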

Example Usage

Once configured, you can use the tools in Claude Desktop:

// Fetch a webpage
fetch_url("https://example.com")

// Convert HTML to Markdown
html_to_markdown_tool("<h1>Hello</h1><p>World</p>")

// Extract article content (html_content is HTML you already have, e.g. from an earlier fetch)
extract_article_tool(html_content, "https://example.com/article")

Requirements

Python 3.11+
Dependencies listed in pyproject.toml

Development & Publishing

This project includes scripts that automate version management, building, testing, and publishing to PyPI.

Automated Publishing Workflow

# Complete automated workflow (TestPyPI + PyPI)
uv run python scripts/publish.py --include-pypi

# TestPyPI only (recommended for testing)
uv run python scripts/publish.py

# Bump version and publish
uv run python scripts/publish.py --version-bump patch --include-pypi

Individual Commands

# Version management
uv run python scripts/version_manager.py --check
uv run python scripts/version_manager.py --bump patch

# Setup PyPI credentials (first time)
uv run python scripts/credentials_setup.py

# Build package
uv run python scripts/build.py

# Run comprehensive tests
uv run python scripts/test.py

# Upload to PyPI
uv run python scripts/upload.py
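The `--bump patch` option above presumably follows semantic versioning. As a sketch of what such a bump means, assuming plain `MAJOR.MINOR.PATCH` version strings (the project's `version_manager.py` may handle more cases), with `bump_version` being a hypothetical name:

```python
def bump_version(version, part):
    """Increment one component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

Bumping a higher component resets the lower ones to zero, which is the usual semantic-versioning convention.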

Features

✅ Version Management: Automatic synchronization across all files
✅ Quality Checks: Ruff linting and MyPy type checking
✅ Build Automation: Clean builds with validation
✅ Testing Suite: Comprehensive package and functionality tests
✅ Publishing Workflow: TestPyPI → PyPI using uv publish (supports .pypirc files)
✅ Error Recovery: Built-in error handling and recovery options

See PUBLISHING.md for detailed documentation.

DXT Extension

This project supports DXT (Desktop Extensions) format for easy distribution and installation.

To build the DXT extension:

python build_dxt.py

This will create a huoshui-fetch-{version}.dxt file that can be installed in compatible AI desktop applications.

License

MIT
