Three academic search engines behind one MCP interface. You get arXiv with full-text retrieval in markdown, HTML, or raw LaTeX source; Semantic Scholar's citation graph, author lookup, and recommendations; and OpenAlex's 316M-work cross-discipline catalog with institution and topic filters. The unified search_all tool fans out to all three, deduplicates by DOI or title, and re-ranks with reciprocal rank fusion. Bonus: image-to-LaTeX OCR for turning formula or table screenshots back into source code, backed by PaddleOCR-VL, DeepSeek-OCR, and texify models. Reach for this when you need programmatic paper discovery plus the actual manuscript text, not just metadata. Hosted at latex-tools.online or run locally with FastMCP.
Remotely-callable MCP server for academic paper search, full-text retrieval & image→LaTeX, served at https://latex-tools.online/mcp.
Three corpora behind one normalized interface:
- arxiv (default): search, metadata, and full text (HTML / markdown / LaTeX source)
- semanticscholar (alias s2): the full S2 API surface: citation graph, authors, recommendations, full-text snippets, bulk datasets
- openalex (alias oa): 316M all-field works: citation graph, authors with h-index, institutions, topics, influence metrics

Plus a unified search_all that fuses all three corpora, and image→LaTeX OCR tools.
| Tool | Purpose |
|---|---|
| `search_all(query, max_results=10, sources='arxiv,semanticscholar,openalex')` | Unified search. Fans out to all three corpora concurrently, de-duplicates the same work (by DOI/title) and re-ranks with Reciprocal Rank Fusion. Each hit carries sources (who found it) + an ids map for follow-up calls. Prefer this for broad lookups. |
| `search_papers(query, source='arxiv', max_results=10, sort_by='relevance')` | Single-corpus search. arXiv query accepts plain text or field syntax (`ti:` `au:` `cat:cs.CL` `abs:` + AND/OR). |
| `get_paper(paper_id, source='arxiv')` | One paper's full record. S2 id accepts S2 id / `DOI:` / `ARXIV:` / `CorpusId:`. |
| `search_by_author(author, source='arxiv')` | Papers by author, newest first. |
| `list_recent(category, source='arxiv')` | Latest in a category (arXiv code or S2 field of study). |
| `list_categories(source='arxiv')` | Common category codes. |
| `read_paper(paper_id, format='markdown')` | FULL text (arXiv). markdown = body with formulas as $LaTeX$; html = raw LaTeXML page; latex = original manuscript .tex source. |
| `list_paper_sources()` | Available corpora. |
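The dedup-then-fuse behaviour described for search_all can be sketched as follows. This is a minimal illustration and not the code in aggregate.py: the hit fields (`id`, `title`, `doi`), the helper names `dedup_key` and `fuse`, and the constant `k=60` (the value commonly used for Reciprocal Rank Fusion) are all assumptions.

```python
from collections import defaultdict


def dedup_key(hit):
    """Identity of a work: the DOI when present, else a normalized title (assumed rule)."""
    doi = (hit.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = "".join(ch for ch in hit.get("title", "").lower() if ch.isalnum())
    return "title:" + title


def fuse(results_by_source, k=60, max_results=10):
    """Merge per-source ranked lists; each appearance adds 1 / (k + rank) to the score."""
    scores = defaultdict(float)
    merged = {}
    for source, hits in results_by_source.items():
        for rank, hit in enumerate(hits, start=1):
            key = dedup_key(hit)
            scores[key] += 1.0 / (k + rank)
            entry = merged.setdefault(
                key, {"title": hit["title"], "sources": [], "ids": {}}
            )
            if source not in entry["sources"]:
                entry["sources"].append(source)
            entry["ids"][source] = hit["id"]
    ranked = sorted(merged, key=lambda key: scores[key], reverse=True)
    return [dict(merged[key], score=scores[key]) for key in ranked[:max_results]]


# Hypothetical hits: "Paper B" is found by two sources under the same DOI.
results = {
    "arxiv": [
        {"id": "2101.00001", "title": "Paper A", "doi": None},
        {"id": "2101.00002", "title": "Paper B", "doi": "10.1/b"},
    ],
    "openalex": [
        {"id": "W2", "title": "Paper B", "doi": "10.1/B"},
        {"id": "W3", "title": "Paper C", "doi": "10.1/c"},
    ],
}
fused = fuse(results)
```

A work returned by several corpora accumulates score from each list, so it tends to rise above works that only one corpus found. One limitation of this sketch: a hit that carries a DOI in one source and none in another is not matched, since the two keys differ.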
read_paper fetch chain: for markdown/html, arxiv.org/html/{id} with an ar5iv fallback; for latex, the main .tex from the arxiv.org/e-print/{id} tarball. Formulas are recovered from the alttext attribute that LaTeXML sets on math elements.
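To illustrate the alttext idea, here is a small sketch that walks LaTeXML-style HTML with the standard-library parser and replaces each math element by its alttext wrapped in dollar signs. The class and function names are hypothetical, and the use of `display="block"` to pick `$$` over `$` is an assumption about the markup, not a description of what arxiv.py does.

```python
from html.parser import HTMLParser


class AlttextExtractor(HTMLParser):
    """Collects text, substituting each <math> element with its alttext."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.math_depth = 0  # > 0 while inside a <math> element

    def handle_starttag(self, tag, attrs):
        if tag != "math":
            return
        if self.math_depth == 0:
            attr = dict(attrs)
            latex = attr.get("alttext")
            if latex:
                delim = "$$" if attr.get("display") == "block" else "$"
                self.parts.append(f"{delim}{latex}{delim}")
        self.math_depth += 1

    def handle_endtag(self, tag):
        if tag == "math" and self.math_depth > 0:
            self.math_depth -= 1

    def handle_data(self, data):
        # Skip the MathML children; the alttext already carries the formula.
        if self.math_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    parser = AlttextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = (
    '<p>Energy is <math alttext="E=mc^{2}" display="inline">'
    "<mi>E</mi><mo>=</mo><mi>m</mi></math>.</p>"
)
text = html_to_text(sample)
```

A real converter would also map headings, lists, and links to markdown; this sketch only shows the formula recovery step.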
Turn a formula or table image back into LaTeX (e.g. a figure cropped from a paper) without needing your own vision model. Backed by the co-located recognize service (PaddleOCR-VL / DeepSeek-OCR / texify).
| Tool | Purpose |
|---|---|
| `recognize_formula(image_url=... or image_base64=..., model='deepseek-ocr')` | Formula image → LaTeX. image_url is downloaded server-side (with SSRF guards). Returns {latex, model, elapsed_ms}. |
| `recognize_table(image_url=... or image_base64=..., model='deepseek-ocr')` | Table image → LaTeX tabular. |
| `list_ocr_models()` | Available OCR models (deepseek-ocr, paddleocr-vl, texify). |
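When the image is local rather than reachable by URL, the `image_base64` parameter is the natural route. The helper below is a hypothetical sketch of building the arguments; it assumes the server expects plain base64 text (no `data:` URI prefix), which the table above does not specify.

```python
import base64


def formula_request(image_bytes, model="deepseek-ocr"):
    """Build recognize_formula arguments from raw image bytes (assumed plain base64)."""
    return {
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
        "model": model,
    }


# Stand-in bytes; in practice these would come from open(path, "rb").read().
args = formula_request(b"\x89PNG\r\n\x1a\n")
```

The resulting dict is what would be passed as the tool's arguments; per the table, the reply carries `latex`, `model`, and `elapsed_ms`.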
- get_openalex_work · get_openalex_citations · get_openalex_references · search_openalex_works (filters: year range, open-access, min-citations, institution)
- search_openalex_authors · search_openalex_institutions
- get_openalex_trends · list_openalex_topics
- get_paper_citations · get_paper_references · get_paper_authors
- match_paper_title · autocomplete_papers
- search_papers_bulk (≤1000, sortable, token paging) · get_papers_batch
- search_authors · get_author · get_author_papers · get_authors_batch
- search_snippets (search inside paper body)
- recommend_papers_for_paper · recommend_papers_from_examples
- list_dataset_releases · get_dataset_release · get_dataset_download_links · get_dataset_diffs

paper_mcp/
  server.py             FastMCP server (tool registrations + instructions)
  models.py             normalized Paper model
  aggregate.py          cross-source fusion (dedup + Reciprocal Rank Fusion)
  sources/
    base.py             source registry (get_source / list_sources)
    arxiv.py            arXiv Atom API + read_paper (HTML/markdown/latex)
    semanticscholar.py  Semantic Scholar full API surface
    openalex.py         OpenAlex REST API (works/authors/institutions/topics)
  recognize.py          image→LaTeX client over the co-located recognize service
pyproject.toml
cd paper-mcp
python -m venv .venv && . .venv/bin/activate
pip install -e .
PAPER_MCP_PORT=9400 python -m paper_mcp.server
# MCP endpoint at http://127.0.0.1:9400/mcp (JSON-RPC; a plain GET returns 406)
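Because the endpoint speaks JSON-RPC, a request is a POST with a JSON body rather than a plain GET. The sketch below only builds the body of a `tools/call` request and the headers a Streamable HTTP client would typically send; the helper names are hypothetical, and a real client (for example an MCP client library) must first perform the `initialize` handshake, which is not shown here.

```python
import json
from itertools import count

_ids = count(1)  # simple monotonically increasing request ids


def jsonrpc_request(method, params):
    """Wrap a method call in a JSON-RPC 2.0 envelope."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def tool_call(name, arguments):
    """Body of an MCP tools/call request for the named tool."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


# Headers assumed for the Streamable HTTP transport.
HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}

body = json.dumps(
    tool_call("search_all", {"query": "reciprocal rank fusion", "max_results": 5})
)
```

The 406 seen on a bare GET is consistent with the server rejecting requests that lack these headers and a JSON-RPC body.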
| Var | Default | Notes |
|---|---|---|
| `PAPER_MCP_HOST` | 127.0.0.1 | |
| `PAPER_MCP_PORT` | 9400 | |
| `PAPER_MCP_PATH` | /mcp | |
| `SEMANTIC_SCHOLAR_API_KEY` | (none) | optional; raises S2 rate limit. Set via /etc/paper-mcp.env in prod. |
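For reference, the defaults in the table could be resolved from the environment along these lines. The function name and the shape of the returned dict are hypothetical; server.py may read these variables differently.

```python
import os

# Defaults taken from the table above.
DEFAULTS = {
    "PAPER_MCP_HOST": "127.0.0.1",
    "PAPER_MCP_PORT": "9400",
    "PAPER_MCP_PATH": "/mcp",
}


def load_settings(env=None):
    """Resolve server settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    return {
        "host": env.get("PAPER_MCP_HOST", DEFAULTS["PAPER_MCP_HOST"]),
        "port": int(env.get("PAPER_MCP_PORT", DEFAULTS["PAPER_MCP_PORT"])),
        "path": env.get("PAPER_MCP_PATH", DEFAULTS["PAPER_MCP_PATH"]),
        # Optional: an empty or missing key means unauthenticated S2 access.
        "s2_api_key": env.get("SEMANTIC_SCHOLAR_API_KEY") or None,
    }


settings = load_settings({"PAPER_MCP_PORT": "9500"})
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests.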
- paper-mcp.service on the latex-tools server, WorkingDirectory /opt/paper-mcp, port 9400.
- https://latex-tools.online/mcp → 127.0.0.1:9400/mcp.
- /etc/paper-mcp.env (SEMANTIC_SCHOLAR_API_KEY).
- ../deploy/ in this repo.

This repo is the source of truth. The server runs an independent copy under /opt/paper-mcp (not auto-synced):
# edit here → push → deploy
scp -r paper_mcp/* latex-tools:/opt/paper-mcp/paper_mcp/
ssh latex-tools 'systemctl restart paper-mcp'
ssh latex-tools 'curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:9400/mcp' # 406 = healthy (needs JSON-RPC handshake)
- _USER_AGENT, backoff).
- read_paper covers ~80%+ of papers via official HTML; older scan-only papers may have no full text.
- docs repo on 2026-06-07; that copy is gone.