
Internet Archive MCP Server

cyanheads/internet-archive-mcp-server
STDIO, HTTP · registry active
Summary

Connects Claude to the Internet Archive's Wayback Machine and 40M+ item library through five tools covering snapshot discovery, content retrieval, metadata queries, and OCR text extraction. You can search the CDX API for capture histories with date and MIME filters, fetch archived page content stripped of banner injections, run filtered searches across books and media, pull complete file manifests with download URLs, and page through long OCR documents. Built on the public Availability, CDX, Solr, and Metadata APIs with no authentication required. Useful when you need to verify historical content, trace how pages changed over time, or retrieve public domain texts and documents programmatically.


Configuration

MCP_LOG_LEVEL (default: info)

Sets the minimum log level for output (e.g., 'debug', 'info', 'warn').

MCP_HTTP_HOST (default: 127.0.0.1)

The hostname for the HTTP server.

MCP_HTTP_PORT (default: 3010)

The port to run the HTTP server on.

MCP_HTTP_ENDPOINT_PATH (default: /mcp)

The endpoint path for the MCP server.

MCP_AUTH_MODE (default: none)

Authentication mode to use: 'none', 'jwt', or 'oauth'.

Categories: Search & Web Crawling
Registry: active
Package: @cyanheads/internet-archive-mcp-server
Transport: STDIO, HTTP
Updated: Jun 7, 2026

@cyanheads/internet-archive-mcp-server

Search the Wayback Machine and IA library (40M+ items), fetch archived snapshots, retrieve item metadata and full text via MCP. STDIO or Streamable HTTP.

5 Tools • 1 Resource


Tools

Five tools covering two Internet Archive pillars — Wayback Machine snapshot discovery and retrieval, and IA library search and content access:

| Tool | Description |
| --- | --- |
| ia_find_snapshots | Find Wayback Machine snapshots of a URL. Mode closest returns the nearest capture to a given timestamp. Mode history returns the full capture list via CDX with date range, status, and MIME filters, collapsed by default to one capture per day. Supports resume-key pagination for large histories. |
| ia_get_snapshot | Fetch the archived content of a URL at a specific Wayback timestamp. Strips HTML to readable text and returns the canonical replay URL. |
| ia_search_items | Search the IA library (40M+ items). Filter by media type, collection, creator, date range, and language. Sort by relevance, date, or downloads. Returns identifiers, titles, types, and pagination context (total_found, page, rows). |
| ia_get_item | Retrieve full metadata and the file manifest for an Archive item by identifier: title, creator, description, subjects, collections, license, and every file with its format, size, and direct download URL. |
| ia_get_text | Retrieve readable OCR text (DjVuTXT or plain text) from a text item. Length-aware truncation with a continuation pointer (char_offset) for paging through large documents. |

ia_find_snapshots

Discover what the Wayback Machine has captured for any URL.

  • closest mode: single fast lookup via the Availability API — returns the nearest capture to a given timestamp
  • history mode: full capture list via the CDX API, filterable by date range (from/to), HTTP status (status_filter), and MIME type
  • Default collapse of timestamp:8 (one capture per day) keeps responses tractable for popular URLs; adjust with the collapse parameter (timestamp:N, N=1–14)
  • Resume-key pagination (resume_key) for stepping through large CDX histories without re-scanning
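
As a rough illustration of the history-mode parameters above, the sketch below builds a CDX query URL and mimics what collapse=timestamp:N does to a capture list. The HistoryQuery shape and both function names are hypothetical and are not taken from the server's code. Only the query parameters themselves (url, from, to, filter, collapse, limit, output, showResumeKey, resumeKey) are part of the public Wayback CDX API.

```typescript
// Hypothetical option names for illustration; only the query parameters
// themselves belong to the public Wayback CDX API.
interface HistoryQuery {
  url: string;
  from?: string;           // e.g. "2015" or "20150101"
  to?: string;
  statusFilter?: string;   // e.g. "200"
  mimeFilter?: string;     // e.g. "text/html"
  collapseDigits?: number; // N in timestamp:N, 1 to 14
  limit?: number;
  resumeKey?: string;
}

function buildCdxUrl(q: HistoryQuery): string {
  // Default of 8 digits (YYYYMMDD) matches the one-capture-per-day default.
  const digits = q.collapseDigits === undefined ? 8 : q.collapseDigits;
  if (digits % 1 !== 0 || digits < 1 || digits > 14) {
    throw new RangeError("collapseDigits must be an integer from 1 to 14");
  }
  const params: Array<[string, string]> = [
    ["url", q.url],
    ["output", "json"],
    ["collapse", "timestamp:" + digits],
    ["showResumeKey", "true"],
  ];
  if (q.from) params.push(["from", q.from]);
  if (q.to) params.push(["to", q.to]);
  if (q.statusFilter) params.push(["filter", "statuscode:" + q.statusFilter]);
  if (q.mimeFilter) params.push(["filter", "mimetype:" + q.mimeFilter]);
  if (q.limit !== undefined) params.push(["limit", String(q.limit)]);
  if (q.resumeKey) params.push(["resumeKey", q.resumeKey]);
  const query = params
    .map(([key, value]) => key + "=" + encodeURIComponent(value))
    .join("&");
  return "https://web.archive.org/cdx/search/cdx?" + query;
}

// Mimics collapse=timestamp:N locally: a capture is kept only when the first
// N digits of its 14-digit timestamp differ from the previous capture's.
function collapseByPrefix(timestamps: string[], n: number): string[] {
  const kept: string[] = [];
  let previous: string | undefined;
  for (const ts of timestamps) {
    const prefix = ts.slice(0, n);
    if (prefix !== previous) kept.push(ts);
    previous = prefix;
  }
  return kept;
}
```

With N=8 the prefix is the calendar day, so two captures from the same day collapse to the first one. N=10 would keep one capture per hour.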

ia_get_snapshot

Retrieve what a page actually said at a point in time.

  • Resolves to the nearest available capture when the exact timestamp has no snapshot
  • Strips Wayback banner injections and extracts readable text — returns clean content alongside the canonical replay URL for browser access
  • Useful for fact-checking, citation verification, and tracing how content changed over time
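
To make the timestamp and replay URL concrete, here is a minimal sketch of the 14-digit Wayback timestamp format and the replay URL layout. Both function names are hypothetical. The id_ modifier is a Wayback feature that returns the capture without the injected banner and link rewriting. Whether this server relies on it to strip banners is an assumption, not something the README states.

```typescript
function pad(value: number, width: number): string {
  return ("0000" + value).slice(-width);
}

// Format a Date as the UTC timestamp YYYYMMDDhhmmss used in Wayback URLs.
function toWaybackTimestamp(date: Date): string {
  return (
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1, 2) +
    pad(date.getUTCDate(), 2) +
    pad(date.getUTCHours(), 2) +
    pad(date.getUTCMinutes(), 2) +
    pad(date.getUTCSeconds(), 2)
  );
}

// Build a replay URL. With raw=true the "id_" modifier asks Wayback for the
// original capture body without the banner (assumed here, see lead-in).
function replayUrl(timestamp: string, originalUrl: string, raw: boolean = false): string {
  if (!/^\d{1,14}$/.test(timestamp)) {
    throw new RangeError("timestamp must be 1 to 14 digits");
  }
  const modifier = raw ? "id_" : "";
  return "https://web.archive.org/web/" + timestamp + modifier + "/" + originalUrl;
}
```

Wayback redirects a replay URL to the nearest capture when the exact timestamp has none, which is consistent with the nearest-capture behavior described above.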

ia_search_items

Search across 40M+ Archive items by keyword and metadata filters.

  • Full-text Solr query syntax plus structured filters: mediatype (texts, audio, video, software, image), collection, creator, language, and date range
  • Sort by relevance, date added, or download count
  • Pagination via page and rows; output includes total_found and current page/rows so agents can paginate correctly without guessing
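
The following sketch shows how a caller might combine structured filters into one query string and use the pagination context to decide whether another page exists. The field names total_found, page, and rows come from the tool description. The SearchFilters shape, the function names, and the assumption that page is 1-based are illustrative only.

```typescript
// Hypothetical filter shape; the field names inside the query (mediatype,
// collection, creator, language) are the metadata fields listed above.
interface SearchFilters {
  query: string;
  mediatype?: string;
  collection?: string;
  creator?: string;
  language?: string;
}

// Join the keyword query and each structured filter with AND.
function buildQuery(f: SearchFilters): string {
  const clauses: string[] = ["(" + f.query + ")"];
  if (f.mediatype) clauses.push("mediatype:" + f.mediatype);
  if (f.collection) clauses.push("collection:" + f.collection);
  if (f.creator) clauses.push('creator:"' + f.creator.replace(/"/g, '\\"') + '"');
  if (f.language) clauses.push("language:" + f.language);
  return clauses.join(" AND ");
}

interface PageContext {
  total_found: number;
  page: number; // assumed 1-based
  rows: number;
}

// Returns the next page number, or undefined once every result has been seen.
function nextPage(ctx: PageContext): number | undefined {
  const seen = ctx.page * ctx.rows;
  return seen < ctx.total_found ? ctx.page + 1 : undefined;
}
```

Because total_found is returned with every response, the stop condition is plain arithmetic and does not require fetching an empty trailing page.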

ia_get_item

Fetch the complete metadata and file manifest for any Archive item.

  • Returns structured fields: title, creator, description, subjects, collections, date, license, and more
  • files[] includes every file in the item with its format, size, and direct download URL — the primary way to act on a search result
  • An unknown identifier, for which the Metadata API returns an empty {} response, surfaces as a typed item_not_found error
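
A minimal sketch of the manifest handling described above, under stated assumptions: the raw shape follows the public Metadata API (files entries carry name, format, and a size given as a string), while ArchiveError, ItemFile, and parseItem are hypothetical names and not the server's own types.

```typescript
// Hypothetical typed error carrying a machine-readable reason.
class ArchiveError extends Error {
  readonly reason: string;
  constructor(reason: string, message: string) {
    super(message);
    this.reason = reason;
  }
}

interface RawFile { name?: string; format?: string; size?: string; }
interface ItemFile { name: string; format?: string; sizeBytes?: number; downloadUrl: string; }

// Direct download URLs follow https://archive.org/download/{identifier}/{file}.
function downloadUrl(identifier: string, fileName: string): string {
  const path = fileName.split("/").map((part) => encodeURIComponent(part)).join("/");
  return "https://archive.org/download/" + encodeURIComponent(identifier) + "/" + path;
}

function parseItem(identifier: string, raw: { files?: RawFile[] }): ItemFile[] {
  // The Metadata API answers {} for an identifier it does not know.
  if (Object.keys(raw).length === 0) {
    throw new ArchiveError("item_not_found", 'No Archive item named "' + identifier + '"');
  }
  const files: ItemFile[] = [];
  for (const f of raw.files || []) {
    if (!f.name) continue; // skip nameless entries rather than inventing a name
    const size = f.size !== undefined && /^\d+$/.test(f.size) ? Number(f.size) : undefined;
    files.push({
      name: f.name,
      format: f.format,
      sizeBytes: size, // left undefined when the API gave no usable size
      downloadUrl: downloadUrl(identifier, f.name),
    });
  }
  return files;
}
```

Missing or malformed sizes stay undefined instead of defaulting to 0, so a caller can tell "unknown" apart from "empty file".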

ia_get_text

Read the OCR text of public-domain books, documents, and transcripts.

  • Locates the best available text file in the item's manifest (DjVuTXT preferred, falls back to plain text)
  • max_chars and char_offset enable efficient paging through long documents without re-fetching
  • Surfaces download_forbidden (HTTP 403) as a typed error for restricted collections rather than failing silently
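
The sketch below illustrates the two ideas in this list: choosing a text file from a manifest and paging through it with max_chars and char_offset. The names pickTextFile, pageText, TextPage, and next_char_offset are hypothetical. The "DjVuTXT" format label and the 50000-character default come from this README. The .txt fallback rule is an assumption about what "plain text" means here.

```typescript
interface FileEntry { name: string; format?: string; }

// Prefer a DjVuTXT file; otherwise fall back to the first .txt file (assumed rule).
function pickTextFile(files: FileEntry[]): FileEntry | undefined {
  for (const f of files) if (f.format === "DjVuTXT") return f;
  for (const f of files) if (/\.txt$/i.test(f.name)) return f;
  return undefined;
}

interface TextPage {
  text: string;
  char_offset: number;       // where this page started
  next_char_offset?: number; // hypothetical field: set only when more text remains
  total_chars: number;
}

// Return at most maxChars characters starting at charOffset.
function pageText(full: string, charOffset: number = 0, maxChars: number = 50000): TextPage {
  if (charOffset < 0 || maxChars <= 0) {
    throw new RangeError("charOffset must be >= 0 and maxChars must be > 0");
  }
  const end = Math.min(full.length, charOffset + maxChars);
  const page: TextPage = {
    text: full.slice(charOffset, end),
    char_offset: charOffset,
    total_chars: full.length,
  };
  if (end < full.length) page.next_char_offset = end;
  return page;
}
```

A caller keeps requesting pages with the returned offset until no continuation offset comes back, at which point the concatenated pages equal the full document.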

Resource

| Type | Name | Description |
| --- | --- | --- |
| Resource | ia://item/{identifier} | Metadata snapshot for an Archive item: title, creator, mediatype, description, subjects, collections, date, license, and file count. Stable URIs for injectable context. |

All resource data is also reachable via ia_get_item. The resource provides a stable, injectable URI for referencing a specific item across workflows.
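
For clients that store or exchange these URIs, a small helper pair like the one below can build and parse them. This is only a sketch of the ia://item/{identifier} template shown above. The function names are hypothetical and the percent-encoding of identifiers is an assumption.

```typescript
// Build a resource URI from an Archive identifier.
function itemUri(identifier: string): string {
  return "ia://item/" + encodeURIComponent(identifier);
}

// Extract the identifier from a resource URI, or undefined when the URI
// does not match the ia://item/{identifier} template.
function parseItemUri(uri: string): string | undefined {
  const match = /^ia:\/\/item\/([^\/?#]+)$/.exec(uri);
  if (!match || match[1] === undefined) return undefined;
  return decodeURIComponent(match[1]);
}
```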

Features

Built on @cyanheads/mcp-ts-core:

  • Declarative tool, resource, and prompt definitions — single file per primitive, framework handles registration and validation
  • Unified error handling — handlers throw, framework catches, classifies, and formats
  • Pluggable auth: none, jwt, oauth
  • Swappable storage backends: in-memory, filesystem, Supabase, Cloudflare KV/R2/D1
  • Structured logging with optional OpenTelemetry tracing
  • STDIO and Streamable HTTP transports

Internet Archive-specific:

  • No credentials required — all four APIs are public
  • Three service layers: WaybackService (Availability + CDX), ArchiveSearchService (Solr), ArchiveMetadataService (Metadata + downloads)
  • CDX collapse-by-day default and configurable limit keep responses tractable for high-capture URLs
  • Identifies User-Agent on every request as required by IA's terms; configurable via IA_USER_AGENT

Agent-friendly output:

  • Pagination context on every list response — total_found, page, rows (search) and resume_key (CDX history) so agents never have to guess whether results are complete
  • Typed error reasons (no_snapshots, no_snapshot_available, item_not_found, no_text_file, download_forbidden) with recovery hints so callers can retry or explain to users without parsing text
  • Structured file manifests — every ia_get_item response includes file-level metadata (format, size, URL) enabling agents to select the right file without a follow-up call
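
One way a caller could take advantage of the typed error reasons is a discriminated union with an exhaustive switch, sketched below. The five reason strings are the ones listed above. The hint wording and the function names are illustrative guesses, not the server's actual recovery hints.

```typescript
type ErrorReason =
  | "no_snapshots"
  | "no_snapshot_available"
  | "item_not_found"
  | "no_text_file"
  | "download_forbidden";

const ALL_REASONS: ErrorReason[] = [
  "no_snapshots",
  "no_snapshot_available",
  "item_not_found",
  "no_text_file",
  "download_forbidden",
];

// Narrow an arbitrary string to a known reason.
function isErrorReason(value: string): value is ErrorReason {
  return ALL_REASONS.indexOf(value as ErrorReason) >= 0;
}

// Hypothetical hint text; the compiler flags this switch if a reason is
// added to the union without a matching case.
function recoveryHint(reason: ErrorReason): string {
  switch (reason) {
    case "no_snapshots":
      return "Try a wider date range or a different form of the URL.";
    case "no_snapshot_available":
      return "Try a different timestamp for this URL.";
    case "item_not_found":
      return "Check the identifier, or search for the item first.";
    case "no_text_file":
      return "Inspect the file manifest for another readable format.";
    case "download_forbidden":
      return "The item is access-restricted; report this instead of retrying.";
    default: {
      const unreachable: never = reason;
      throw new Error("Unhandled reason: " + String(unreachable));
    }
  }
}
```

Branching on a fixed set of strings means the caller never has to pattern-match free-form error messages.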

Getting started

No API key required — the Internet Archive's APIs are fully public.

Add the following to your MCP client configuration file:

{
  "mcpServers": {
    "internet-archive-mcp-server": {
      "type": "stdio",
      "command": "bunx",
      "args": ["@cyanheads/internet-archive-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with npx (no Bun required):

{
  "mcpServers": {
    "internet-archive-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@cyanheads/internet-archive-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with Docker:

{
  "mcpServers": {
    "internet-archive-mcp-server": {
      "type": "stdio",
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "MCP_TRANSPORT_TYPE=stdio",
        "ghcr.io/cyanheads/internet-archive-mcp-server:latest"
      ]
    }
  }
}

For Streamable HTTP, set the transport and start the server:

MCP_TRANSPORT_TYPE=http MCP_HTTP_PORT=3010 bun run start:http
# Server listens at http://localhost:3010/mcp

Prerequisites

  • Bun v1.3.2 or higher (or Node.js v24+).
  • No external accounts or API keys required.

Installation

  1. Clone the repository:
     git clone https://github.com/cyanheads/internet-archive-mcp-server.git
  2. Navigate into the directory:
     cd internet-archive-mcp-server
  3. Install dependencies:
     bun install
  4. Configure environment:
     cp .env.example .env
     # Optional: edit .env for custom User-Agent, timeouts, etc.

Configuration

All configuration is validated at startup via Zod schemas in src/config/server-config.ts.

| Variable | Description | Default |
| --- | --- | --- |
| MCP_TRANSPORT_TYPE | Transport: stdio or http | stdio |
| MCP_HTTP_PORT | HTTP server port | 3010 |
| MCP_AUTH_MODE | Auth mode: none, jwt, or oauth | none |
| MCP_LOG_LEVEL | Log level (debug, info, notice, warning, error) | info |
| LOGS_DIR | Directory for log files (Node.js only) | <project-root>/logs |
| STORAGE_PROVIDER_TYPE | Storage backend | in-memory |
| OTEL_ENABLED | Enable OpenTelemetry instrumentation | false |
| IA_USER_AGENT | Custom User-Agent for IA API requests | internet-archive-mcp-server/{version} (github.com/cyanheads/internet-archive-mcp-server) |
| IA_REQUEST_TIMEOUT_MS | HTTP request timeout in milliseconds | 30000 |
| IA_MAX_SNAPSHOT_CHARS | Default character cap for ia_get_text responses | 50000 |

See .env.example for the full list of optional overrides.
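
To show what startup validation of these variables can look like, here is a dependency-free sketch. The project itself validates with Zod schemas in src/config/server-config.ts; this example deliberately uses plain TypeScript instead so it runs on its own, and its names (ServerConfig, parseConfig, endpointUrl) and bounds are hypothetical. The defaults are the documented ones.

```typescript
type Env = { [name: string]: string | undefined };

interface ServerConfig {
  transport: "stdio" | "http";
  httpHost: string;
  httpPort: number;
  endpointPath: string;
  requestTimeoutMs: number;
}

// Read an integer variable, falling back to the default when unset.
function readInt(env: Env, name: string, fallback: number, min: number, max: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (value % 1 !== 0 || value < min || value > max) {
    throw new Error(name + " must be an integer between " + min + " and " + max + ', got "' + raw + '"');
  }
  return value;
}

// Fail fast on invalid values instead of starting with a bad configuration.
function parseConfig(env: Env): ServerConfig {
  const rawTransport = env["MCP_TRANSPORT_TYPE"] || "stdio";
  let transport: "stdio" | "http";
  if (rawTransport === "stdio" || rawTransport === "http") {
    transport = rawTransport;
  } else {
    throw new Error('MCP_TRANSPORT_TYPE must be "stdio" or "http", got "' + rawTransport + '"');
  }
  const endpointPath = env["MCP_HTTP_ENDPOINT_PATH"] || "/mcp";
  if (endpointPath.charAt(0) !== "/") {
    throw new Error("MCP_HTTP_ENDPOINT_PATH must start with /");
  }
  return {
    transport: transport,
    httpHost: env["MCP_HTTP_HOST"] || "127.0.0.1",
    httpPort: readInt(env, "MCP_HTTP_PORT", 3010, 1, 65535),
    endpointPath: endpointPath,
    requestTimeoutMs: readInt(env, "IA_REQUEST_TIMEOUT_MS", 30000, 1, 2147483647),
  };
}

// Where an HTTP-transport server would listen under this configuration.
function endpointUrl(config: ServerConfig): string {
  return "http://" + config.httpHost + ":" + config.httpPort + config.endpointPath;
}
```

In a real process the argument would be process.env; passing the environment in explicitly keeps the sketch testable.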

Running the server

Local development

  • Build and run:

    # One-time build
    bun run rebuild
    
    # Run the built server
    bun run start:stdio
    # or
    bun run start:http
    
  • Run checks and tests:

    bun run devcheck   # Lint, format, typecheck, security
    bun run test       # Vitest test suite
    bun run lint:mcp   # Validate MCP definitions against spec
    

Docker

docker build -t internet-archive-mcp-server .
docker run --rm -p 3010:3010 internet-archive-mcp-server

The Dockerfile defaults to HTTP transport, stateless session mode, and logs to /var/log/internet-archive-mcp-server. OpenTelemetry peer dependencies are installed by default — build with --build-arg OTEL_ENABLED=false to omit them.

Project structure

| Directory | Purpose |
| --- | --- |
| src/index.ts | createApp() entry point: registers tools and the resource, and initializes services. |
| src/config | Server-specific environment variable parsing and validation with Zod. |
| src/mcp-server/tools | Tool definitions (*.tool.ts). Five tools across Wayback and IA library. |
| src/mcp-server/resources | Resource definitions. ia://item/{identifier} item metadata resource. |
| src/services/wayback | WaybackService: Availability API + CDX API client. |
| src/services/archive-search | ArchiveSearchService: Solr Advanced Search client. |
| src/services/archive-metadata | ArchiveMetadataService: Metadata API + file download client. |
| tests/ | Unit and integration tests mirroring src/. |

Development guide

See CLAUDE.md for development guidelines and architectural rules. The short version:

  • Handlers throw, framework catches — no try/catch in tool logic
  • Use ctx.log for request-scoped logging, ctx.state for tenant-scoped storage
  • Register new tools and resources via the barrels in src/mcp-server/*/definitions/index.ts
  • Wrap external API calls: validate raw → normalize to domain type → return output schema; never fabricate missing fields
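
The last rule (validate raw, normalize to a domain type, never fabricate missing fields) can be sketched against the CDX JSON output, where the first row is a header and every later row is an array of strings. Capture and normalizeCdx are hypothetical names, not the project's own. The handling of an empty row followed by a one-element row as the resume key reflects my understanding of showResumeKey output and should be treated as an assumption.

```typescript
interface Capture {
  timestamp: string;
  original: string;
  mimetype?: string;
  statusCode?: number; // undefined when CDX reports "-" (e.g. revisit records)
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((x) => typeof x === "string");
}

function normalizeCdx(raw: unknown): { captures: Capture[]; resumeKey?: string } {
  // Step 1: validate the raw shape before touching any field.
  if (!Array.isArray(raw)) throw new TypeError("CDX response must be an array of rows");
  const rows = raw as unknown[];
  if (rows.length === 0) return { captures: [] };
  const header = rows[0];
  if (!isStringArray(header)) throw new TypeError("first CDX row must be a header");
  const tsIdx = header.indexOf("timestamp");
  const origIdx = header.indexOf("original");
  if (tsIdx < 0 || origIdx < 0) throw new TypeError("header lacks timestamp or original");
  const mimeIdx = header.indexOf("mimetype");
  const statusIdx = header.indexOf("statuscode");

  // Step 2: normalize each row into the domain type.
  const captures: Capture[] = [];
  let resumeKey: string | undefined;
  for (let i = 1; i < rows.length; i++) {
    const row = rows[i];
    if (!isStringArray(row)) throw new TypeError("row " + i + " is not an array of strings");
    if (row.length === 0) {
      // Assumed layout: an empty row separates captures from the resume key.
      const next = rows[i + 1];
      if (isStringArray(next) && next.length === 1) resumeKey = next[0];
      break;
    }
    if (row.length < header.length) throw new TypeError("row " + i + " is shorter than the header");
    const statusText = statusIdx >= 0 ? row[statusIdx] : undefined;
    captures.push({
      timestamp: row[tsIdx],
      original: row[origIdx],
      mimetype: mimeIdx >= 0 ? row[mimeIdx] : undefined,
      // Step 3: keep unknown values undefined instead of inventing a status.
      statusCode: statusText !== undefined && /^\d{3}$/.test(statusText) ? Number(statusText) : undefined,
    });
  }
  return { captures: captures, resumeKey: resumeKey };
}
```

Note that the function throws on malformed input and contains no try/catch, in line with the "handlers throw, framework catches" rule above.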

Contributing

Issues and pull requests are welcome. Run checks and tests before submitting:

bun run devcheck
bun run test

License

Apache-2.0 — see LICENSE for details.
