Connects Claude to the Extracto API for structured web scraping. You pass a URL and a JSON schema describing the fields you want (title, price, tags, nested objects, arrays), and it returns validated, typed data with null for anything missing instead of hallucinations. Exposes four tools: synchronous extract for quick jobs under 90 seconds, async extract for heavy pages, plus get_job and list_jobs for polling. Requires an Extracto API key. Useful when you need reliable data extraction from web pages without writing parsers or fighting with LLM prompt engineering to get consistent JSON shapes.
EXTRACTO_API_KEY (required, secret): Your Extracto API key from https://app.getextracto.dev/keys
EXTRACTO_BASE_URL: Override the API host (defaults to https://app.getextracto.dev)
EXTRACTO_TIMEOUT_MS: Per-request timeout in milliseconds (default 90000)
Model Context Protocol server for Extracto. It gives Claude, Cursor, Claude Code, and any MCP client the ability to turn a URL plus a schema into validated, typed JSON — no prompt engineering, no HTML parsing, and no hallucinated fields (missing data comes back as null).
You need an Extracto API key. Get one at app.getextracto.dev/keys.
The server runs over stdio and is published to npm, so most clients just need this config block.
Edit claude_desktop_config.json (Settings → Developer → Edit Config):
{
  "mcpServers": {
    "extracto": {
      "command": "npx",
      "args": ["-y", "extracto-mcp"],
      "env": { "EXTRACTO_API_KEY": "exa_live_your_key_here" }
    }
  }
}
For Cursor, add the same block to ~/.cursor/mcp.json (or the project's .cursor/mcp.json).
For Claude Code, run:
claude mcp add extracto -e EXTRACTO_API_KEY=exa_live_your_key_here -- npx -y extracto-mcp
Restart the client and ask it to extract something, e.g. "Use extracto to pull the title, language and star count from github.com/facebook/react."
| Tool | What it does |
|---|---|
| extract | Synchronous extraction from a single URL (up to ~90s). Returns { data, meta }. |
| extract_async | Submit an async job for heavy or anti-bot pages. Returns a job id immediately. |
| get_job | Poll an async job for status and result. |
| list_jobs | List your recent async jobs. |
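
The extract_async / get_job pair implies a submit-then-poll flow. The following is a minimal sketch of the polling half, not the server's actual client code. The `Job` shape, the status names, and the `getJob` callback are all hypothetical stand-ins, since the real get_job response format is not described here.

```typescript
// ASSUMPTION: status values and field names are illustrative only; check the
// real get_job response before relying on them.
type JobStatus = "pending" | "running" | "completed" | "failed";

interface Job<T> {
  id: string;
  status: JobStatus;
  data?: T;
  error?: string;
}

// `getJob` stands in for whatever actually invokes the get_job tool.
type GetJob<T> = (id: string) => Promise<Job<T>>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the job completes, fails, or the attempt budget runs out.
async function waitForJob<T>(
  getJob: GetJob<T>,
  id: string,
  { intervalMs = 2000, maxAttempts = 30 } = {},
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getJob(id);
    if (job.status === "completed") return job.data as T;
    if (job.status === "failed") {
      throw new Error(job.error ?? `job ${id} failed`);
    }
    await sleep(intervalMs);
  }
  throw new Error(`job ${id} did not finish after ${maxAttempts} polls`);
}
```

Bounding the number of attempts keeps a stuck job from blocking the caller forever; the interval and attempt count are arbitrary defaults.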
schema argument
A schema is an object mapping field names to types. A type is one of:
- a primitive name: "string", "number", "boolean", "array", "object"
- an array of a type: ["string"], or [{ "title": "string" }]
- a nested object: { "author": { "name": "string" } }

For example:
{
  "title": "string",
  "price": "number",
  "tags": ["string"],
  "reviews": [{ "user": "string", "stars": "number" }]
}
Values are filled in only when they are actually found on the page; anything missing comes back as null rather than guessed.
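
The schema shape described above can be checked locally before a request is sent. The sketch below is one way to do that in TypeScript; `FieldType`, `Schema`, and `checkSchema` are hypothetical names, and the rule that an array type holds exactly one item type is an assumption drawn from the examples, not a documented constraint. It is not the validation Extracto itself performs.

```typescript
// Primitive type names listed in this README.
const PRIMITIVES = ["string", "number", "boolean", "array", "object"] as const;
type Primitive = (typeof PRIMITIVES)[number];

// A field type is a primitive name, a one-element array giving the item
// type, or a nested object of further fields.
type FieldType = Primitive | [FieldType] | { [field: string]: FieldType };
type Schema = { [field: string]: FieldType };

// Recursively collect problems for a single field type.
function checkType(value: unknown, path: string): string[] {
  if (typeof value === "string") {
    return (PRIMITIVES as readonly string[]).includes(value)
      ? []
      : [`${path}: unknown type "${value}"`];
  }
  if (Array.isArray(value)) {
    // ASSUMPTION: array types carry exactly one item type.
    return value.length === 1
      ? checkType(value[0], `${path}[]`)
      : [`${path}: array types need exactly one item type`];
  }
  if (typeof value === "object" && value !== null) {
    return Object.entries(value).flatMap(([key, child]) =>
      checkType(child, `${path}.${key}`),
    );
  }
  return [`${path}: expected a type name, array, or object`];
}

// Returns a list of problems; an empty list means the shape looks valid.
function checkSchema(schema: unknown): string[] {
  if (typeof schema !== "object" || schema === null || Array.isArray(schema)) {
    return ["schema: must be an object mapping field names to types"];
  }
  return checkType(schema, "schema");
}
```

Returning a list of problems rather than throwing on the first one lets a caller report every bad field at once.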
All configuration is via environment variables passed by your MCP client:
| Variable | Required | Description |
|---|---|---|
| EXTRACTO_API_KEY | yes | Your key from app.getextracto.dev/keys. |
| EXTRACTO_BASE_URL | no | Override the API host (defaults to https://app.getextracto.dev). |
| EXTRACTO_TIMEOUT_MS | no | Per-request timeout in ms (default 90000). |
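
To make the defaults in the table concrete, here is a rough sketch of how a process might resolve these three variables. `ExtractoConfig` and `loadConfig` are hypothetical names, and the handling of invalid values is an assumption; the actual server may read its environment differently.

```typescript
interface ExtractoConfig {
  apiKey: string;
  baseUrl: string;
  timeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Resolve configuration from an environment-like object, applying the
// defaults documented in the table above.
function loadConfig(env: Env): ExtractoConfig {
  const apiKey = env.EXTRACTO_API_KEY;
  if (!apiKey) throw new Error("EXTRACTO_API_KEY is required");

  const timeoutMs = Number(env.EXTRACTO_TIMEOUT_MS ?? 90000);
  // ASSUMPTION: non-numeric or non-positive timeouts are treated as errors.
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error("EXTRACTO_TIMEOUT_MS must be a positive number");
  }

  return {
    apiKey,
    baseUrl: env.EXTRACTO_BASE_URL ?? "https://app.getextracto.dev",
    timeoutMs,
  };
}
```

In a Node process this would typically be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.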
npm install
npm run dev # run from source with tsx
npm run typecheck
npm run build # bundle to dist/ with tsup
extracto: the official TypeScript/JavaScript SDK.
License: MIT