Gives Claude direct access to arXiv's academic paper database through four tools: free-text search with field prefixes and boolean operators, batch metadata lookup by paper ID, full HTML content extraction with LaTeX math formatting, and category taxonomy browsing. Built on the author's mcp-ts-core framework with rate limiting that respects arXiv's 3-second crawl delay and adaptive backoff. Ships with an optional local SQLite mirror using arXiv's OAI-PMH feed to eliminate rate limit exposure on searches. Available as a public hosted instance at arxiv.caseyjhand.com or run locally via stdio. Particularly useful when you need to search recent papers, pull abstracts in bulk, or read full paper content without leaving your LLM workflow.
Search arXiv, fetch paper metadata, and read full-text content via MCP. STDIO or Streamable HTTP.
Public Hosted Server: https://arxiv.caseyjhand.com/mcp
Four tools for searching and reading arXiv papers:
| Tool Name | Description |
|---|---|
arxiv_search | Search arXiv papers by query with category and sort filters. |
arxiv_get_metadata | Get full metadata for one or more arXiv papers by ID. |
arxiv_read_paper | Fetch the full text content of an arXiv paper from its HTML rendering. |
arxiv_list_categories | List arXiv category taxonomy, optionally filtered by group. |
arxiv_searchSearch for papers using free-text queries with field prefixes and boolean operators.
ti: (title), au: (author), abs: (abstract), cat: (category), all: (all fields)AND, OR, ANDNOTarxiv_get_metadataFetch full metadata for one or more papers by known arXiv ID.
2401.12345v2) and unversioned (2401.12345) IDshep-th/9901001)arxiv_read_paperRead the full HTML content of an arXiv paper.
$…$ inline, $$…$$ block) so the character budget targets paper contentmax_characters defaults to 100,000; raw HTML can be 500KB-3MB+ for math-heavy papersarxiv_list_categoriesList arXiv category codes and names for discovery.
| URI Pattern | Description |
|---|---|
arxiv://paper/{paperId} | Paper metadata by arXiv ID. |
arxiv://categories | Full arXiv category taxonomy. |
Built on @cyanheads/mcp-ts-core:
none, jwt, oauth)arXiv-specific:
Retry-Afterarxiv_search and arxiv_get_metadata. See Optional: Local Mirror.A public instance is available at https://arxiv.caseyjhand.com/mcp — no installation required. Point any MCP client at it via Streamable HTTP:
{
"mcpServers": {
"arxiv-mcp-server": {
"type": "streamable-http",
"url": "https://arxiv.caseyjhand.com/mcp"
}
}
}
Add to your MCP client config (e.g., claude_desktop_config.json):
{
"mcpServers": {
"arxiv-mcp-server": {
"type": "stdio",
"command": "bunx",
"args": ["@cyanheads/arxiv-mcp-server@latest"]
}
}
}
git clone https://github.com/cyanheads/arxiv-mcp-server.git
cd arxiv-mcp-server
bun install
All configuration is optional — the server works out of the box with sensible defaults.
| Variable | Description | Default |
|---|---|---|
ARXIV_API_BASE_URL | arXiv API base URL. | https://export.arxiv.org/api |
ARXIV_REQUEST_DELAY_MS | Minimum delay between arXiv API requests (ms). | 3000 |
ARXIV_CONTENT_TIMEOUT_MS | Timeout for HTML content fetches (ms). | 30000 |
ARXIV_API_TIMEOUT_MS | Timeout for API search/metadata requests (ms). | 15000 |
ARXIV_MIRROR_ENABLED | Enable local OAI-PMH metadata mirror for search and metadata. | false |
ARXIV_MIRROR_PATH | SQLite path for the mirror. | ./data/arxiv-mirror.db |
ARXIV_MIRROR_REFRESH_CRON | UTC cron expression for in-process daily refresh (HTTP mode only). | unset |
ARXIV_MIRROR_FALLBACK_LIVE | Fall through to live API on local ID-lookup miss. | true |
ARXIV_MIRROR_RECENT_DAYS_LIVE | Route sortBy=submitted descending queries within this window to the live API. | 2 |
ARXIV_MIRROR_OAI_BASE_URL | arXiv OAI-PMH endpoint base URL. | https://oaipmh.arxiv.org/oai |
ARXIV_MIRROR_OAI_REQUEST_DELAY_MS | Minimum delay between OAI-PMH requests (ms). | 3000 |
ARXIV_MIRROR_REFRESH_TIMEOUT_MS | Abort budget for one scheduled refresh subprocess (ms). | 7200000 |
MCP_TRANSPORT_TYPE | Transport: stdio or http. | stdio |
MCP_HTTP_PORT | Port for HTTP server. | 3010 |
MCP_AUTH_MODE | Auth mode: none, jwt, or oauth. | none |
MCP_LOG_LEVEL | Log level (RFC 5424). | info |
Build and run:
bun run build
bun run start:http # or start:stdio
Run checks and tests:
bun run devcheck # Lint, format, typecheck, audit
bun run test # Vitest
For self-hosted deployments behind a single egress IP, arXiv's ~3-second per-IP crawl delay serializes concurrent users. An optional local mirror eliminates rate-limit exposure for arxiv_search and arxiv_get_metadata by serving from a SQLite + FTS5 store harvested via OAI-PMH. arxiv_read_paper continues to use the live API — full-content harvest is forbidden by arXiv's data policy.
Disabled by default. To enable:
# 1. Cold-start harvest (~4.4h sequential, resumable from checkpoint). One-time per installation.
bun run mirror:init
# 2. Enable the mirror.
export ARXIV_MIRROR_ENABLED=true
# 3. Start the server — reads switch to the mirror once the harvest completes.
bun run start:http
Daily incremental refresh (small delta; duration depends on arXiv's OAI-PMH page pacing) via:
bun run mirror:refresh # wire to cron / systemd timer / launchd, OR
# set ARXIV_MIRROR_REFRESH_CRON to schedule it in HTTP mode (spawned as a child process)
bun run mirror:verify # PRAGMA integrity_check + quick_check
Behavior notes. Ranking divergence: FTS5 BM25 differs from arXiv's internal ranking, so sortBy=relevance against the mirror returns a different top-K than the live API. Queries sorted by submitted descending within ARXIV_MIRROR_RECENT_DAYS_LIVE days route to the live API to cover the nightly-update gap. Refresh resilience: after the initial cold harvest completes, an in-progress or failed daily refresh keeps serving the existing dataset from the mirror — arxiv_search and arxiv_get_metadata don't drop to the live API during the refresh window (#21). The scheduled HTTP-mode refresh runs in a child process, so the harvest's synchronous SQLite writes never block the request event loop — search and metadata stay responsive throughout (#22). The mirror stores the latest version only; per-version reads continue to use the live API. See #12 for the full design.
docker build -t arxiv-mcp-server .
docker run -p 3010:3010 arxiv-mcp-server
| Directory | Purpose |
|---|---|
src/mcp-server/tools/definitions/ | Tool definitions (*.tool.ts). |
src/mcp-server/resources/definitions/ | Resource definitions (*.resource.ts). |
src/services/arxiv/ | ArxivService — live arXiv API client (search, metadata, HTML). |
src/services/arxiv/mirror/ | Optional OAI-PMH mirror — harvester, SQLite + FTS5 store, query translator, runner. |
src/config/ | Environment variable parsing and validation with Zod. |
scripts/arxiv-mirror-*.ts | Mirror lifecycle scripts (init, refresh, verify). |
tests/ | Unit and integration tests. |
docs/ | Design document and directory structure. |
See CLAUDE.md for development guidelines and architectural rules. The short version:
try/catch in tool logicctx.log for domain-specific loggingArxivService — don't add per-tool delaysIssues and pull requests are welcome. Run checks before submitting:
bun run devcheck
bun test
Apache-2.0 — see LICENSE for details.
io.github.pipeworx-io/brave-search
marcopesani/mcp-server-serper
brave/brave-search-mcp-server
com.mcparmory/google-search-console
acamolese/google-search-console-mcp
io.github.sarahpark/google-search-console