Consult7 is an MCP server that enables AI agents to offload analysis of large file collections to high-capacity language models via OpenRouter, supporting context windows up to 2M tokens. It collects files from specified paths, assembles them into a single context, sends that context to a selected model (such as Google's Gemini 3.1 family or other frontier models) with a user-provided query, and returns the result to the agent. This keeps agents from running out of context when analyzing large codebases, document repositories, or mixed content that exceeds their own limits, and it also gives them access to specialized model capabilities and comparative analysis across different models.
Consult7 is a Model Context Protocol (MCP) server that enables AI agents to consult large context window models via OpenRouter for analyzing extensive file collections - entire codebases, document repositories, or mixed content that exceed the current agent's context limits.
Consult7 enables any MCP-compatible agent to offload file analysis to large context models (up to 2M tokens).
"For Claude Code users, Consult7 is a game changer."
Consult7 collects files from the specific paths you provide (with optional wildcards in filenames), assembles them into a single context, and sends them to a large context window model along with your query. The result is directly fed back to the agent you are working with.
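The collect-and-assemble step described above can be sketched roughly as follows. This is an illustrative sketch, not Consult7's actual implementation: the function names `collect_files` and `assemble_context` and the `=== path ===` header format are assumptions made for the example.

```python
import glob
import os


def collect_files(patterns):
    """Expand each pattern with glob and return a sorted, de-duplicated file list."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(pattern):
            if os.path.isfile(path):
                found.add(path)
    return sorted(found)


def assemble_context(paths, query):
    """Concatenate file contents under per-file headers, then append the query."""
    parts = []
    for path in paths:
        # errors="replace" keeps the sketch from failing on non-UTF-8 bytes
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            parts.append(f"=== {path} ===\n{handle.read()}")
    parts.append(f"=== QUERY ===\n{query}")
    return "\n\n".join(parts)
```

The resulting string is what would be sent to the large context model; the model's reply is then handed back to the calling agent.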
Example tool calls:

- Files: `["/Users/john/project/src/*.py", "/Users/john/project/lib/*.py"]`
  - Model: `"google/gemini-3-flash-preview"`
  - Mode: `"fast"`
- Files: `["/Users/john/webapp/src/*.py", "/Users/john/webapp/auth/*.py", "/Users/john/webapp/api/*.js"]`
  - Model: `"anthropic/claude-opus-4.8"`
  - Mode: `"think"`
- Files: `["/Users/john/project/src/*.py", "/Users/john/project/tests/*.py"]`
  - Model: `"google/gemini-2.5-pro"`
  - Mode: `"think"`
  - Output file: `"/Users/john/reports/code_review.md"`
  - Result: returns `"Result has been saved to /Users/john/reports/code_review.md"` instead of flooding the agent's context

Consult7 supports Google's Gemini 3.1 family:
- `google/gemini-3.1-pro-preview` - Flagship reasoning model, 1M context
- `google/gemini-3-flash-preview` - Ultra-fast model, 1M context
- `google/gemini-3.1-flash-lite-preview` - Ultra-fast lite model, 1M context

Quick mnemonics for power users:
- `gemt` = Gemini 3.1 Pro + think (flagship reasoning)
- `gemf` = Gemini 3 Flash + fast (ultra fast)
- `gptt` = GPT-5.5 + think (latest GPT)
- `grot` = Grok 4.20 + think (automatic reasoning)
- `oput` = Claude Opus 4.8 + think (adaptive thinking)
- `ULTRA` = Run GEMT, GPTT, GROT, and OPUT in parallel (4 frontier models)
- `FUSE` = Fusion: a frontier panel deliberates and a judge synthesizes, in one call

These mnemonics make it easy to reference model+mode combinations in your queries.
Consult7 supports OpenRouter's Fusion (openrouter/fusion) — a single call where a panel of frontier models (Opus, GPT, Gemini Pro) answers your query in parallel and a judge model synthesizes their responses into one answer. Reach for it on hard questions where multiple perspectives help and the cost of being wrong outweighs a few extra completions.
- `fast` / `mid` / `think` map the panel's web-search/fetch budget to `max_tool_calls` of 2 / 8 / 16.
- `FUSE` = `openrouter/fusion`.

Trivial prompts are answered directly (no panel); the panel fires only when the question warrants deliberation. Fusion is billed per panel run, so it costs more than a single-model call.
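The mode-to-budget mapping for Fusion can be written down as a small lookup. The numbers come from the text above; the names `FUSION_TOOL_BUDGET` and `fusion_max_tool_calls` are made up for this sketch and are not part of Consult7's API.

```python
# Budgets as stated above: fast / mid / think -> max_tool_calls 2 / 8 / 16.
FUSION_TOOL_BUDGET = {"fast": 2, "mid": 8, "think": 16}


def fusion_max_tool_calls(mode):
    """Return the web-search/fetch budget for a Fusion call in the given mode."""
    try:
        return FUSION_TOOL_BUDGET[mode]
    except KeyError:
        expected = ", ".join(FUSION_TOOL_BUDGET)
        raise ValueError(f"unknown mode {mode!r}; expected one of: {expected}") from None
```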
For Claude Code, simply run:

```bash
claude mcp add -s user consult7 uvx -- consult7 your-openrouter-api-key
```
Add to your Claude Desktop configuration file:
```json
{
  "mcpServers": {
    "consult7": {
      "type": "stdio",
      "command": "uvx",
      "args": ["consult7", "your-openrouter-api-key"]
    }
  }
}
```
Replace your-openrouter-api-key with your actual OpenRouter API key.
No installation required - uvx automatically downloads and runs consult7 in an isolated environment.
```bash
uvx consult7 <api-key> [--test]
```

- `<api-key>`: Required. Your OpenRouter API key
- `--test`: Optional. Test the API connection

The model and mode are specified when calling the tool, not at startup.
Consult7 supports all 500+ models available on OpenRouter. Below are the flagship models with optimized dynamic file size limits:
| Model | Context | Use Case |
|---|---|---|
| `openai/gpt-5.5` | 1M | Latest GPT, balanced performance |
| `google/gemini-3.1-pro-preview` | 1M | Flagship reasoning model |
| `google/gemini-3-flash-preview` | 1M | Gemini 3 Flash, ultra fast |
| `google/gemini-3.1-flash-lite-preview` | 1M | Ultra-fast lite model |
| `anthropic/claude-opus-4.8` | 1M | Best quality, adaptive thinking |
| `anthropic/claude-sonnet-4.6` | 1M | Excellent reasoning, fast |
| `anthropic/claude-haiku-4.5` | 200k | Budget, very fast |
| `x-ai/grok-4.20` | 2M | Automatic reasoning, huge context |
| `x-ai/grok-4.1-fast` | 2M | Largest context window |
| `openrouter/fusion` | 128k | Multi-model panel + judge (see Featured: Fusion) |
Quick mnemonics:
- `gptt` = `openai/gpt-5.5` + `think` (latest GPT, deep reasoning)
- `gemt` = `google/gemini-3.1-pro-preview` + `think` (Gemini 3.1 Pro, flagship reasoning)
- `grot` = `x-ai/grok-4.20` + `think` (Grok 4.20, automatic reasoning)
- `oput` = `anthropic/claude-opus-4.8` + `think` (Claude Opus, adaptive thinking)
- `opuf` = `anthropic/claude-opus-4.8` + `fast` (Claude Opus, no reasoning)
- `gemf` = `google/gemini-3-flash-preview` + `fast` (Gemini 3 Flash, ultra fast)
- `ULTRA` = call GEMT, GPTT, GROT, and OPUT in parallel (4 frontier models for maximum insight)
- `FUSE` = `openrouter/fusion` (one call: a frontier panel deliberates, a judge synthesizes; mode sets web-research depth)

You can use any OpenRouter model ID (e.g., `deepseek/deepseek-r1-0528`). See the full model list. File size limits are automatically calculated based on each model's context window.
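One way to picture the mnemonics is as a lookup from short name to a (model, mode) pair, with `ULTRA` expanding to four pairs. The table below copies the pairs listed above; the `resolve` helper and the constant names are hypothetical, and how Consult7 or the calling agent actually interprets mnemonics may differ. `FUSE` is left out because its mode is chosen per call.

```python
MNEMONICS = {
    "gptt": ("openai/gpt-5.5", "think"),
    "gemt": ("google/gemini-3.1-pro-preview", "think"),
    "grot": ("x-ai/grok-4.20", "think"),
    "oput": ("anthropic/claude-opus-4.8", "think"),
    "opuf": ("anthropic/claude-opus-4.8", "fast"),
    "gemf": ("google/gemini-3-flash-preview", "fast"),
}

# ULTRA fans out to these four mnemonics, in the order given in the list above.
ULTRA_MEMBERS = ("gemt", "gptt", "grot", "oput")


def resolve(mnemonic):
    """Return a list of (model, mode) pairs for a mnemonic (case-insensitive)."""
    key = mnemonic.lower()
    if key == "ultra":
        return [MNEMONICS[member] for member in ULTRA_MEMBERS]
    if key not in MNEMONICS:
        raise KeyError(f"unknown mnemonic: {mnemonic}")
    return [MNEMONICS[key]]
```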
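Modes:

- `fast`: No reasoning - quick answers, simple tasks
- `mid`: Moderate reasoning - code reviews, bug analysis
- `think`: Maximum reasoning - security audits, complex refactoring

File specification rules:

- Absolute paths only: `/Users/john/project/src/*.py`
- Wildcards in filenames only: `/Users/john/project/*.py` (not in directory paths)
- Extension required with wildcards: `*.py` not `*`
- Files and patterns can be mixed: `["/path/src/*.py", "/path/README.md", "/path/tests/*_test.py"]`

Common patterns:

- All Python files in a directory: `/path/to/dir/*.py`
- Test files: `/path/to/tests/*_test.py` or `/path/to/tests/test_*.py`
- Multiple extensions: `["/path/*.js", "/path/*.ts"]`

Automatically ignored: `__pycache__`, `.env`, `secrets.py`, `.DS_Store`, `.git`, `node_modules`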
Size limits: Dynamic based on model context window (e.g., Grok 4.20: ~8MB, GPT-5.5: ~4MB)
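The path rules above (absolute paths, wildcards only in the filename, an extension whenever a wildcard is used) can be checked before calling the tool. The following is a minimal sketch under those stated rules; `validate_pattern` and `is_ignored` are hypothetical helpers, and Consult7's own validation and ignore logic may be stricter or differ in detail.

```python
import os
from pathlib import PurePath

WILDCARD_CHARS = "*?["
IGNORED_NAMES = {"__pycache__", ".env", "secrets.py", ".DS_Store", ".git", "node_modules"}


def has_wildcard(text):
    """True if the text contains a glob metacharacter."""
    return any(ch in text for ch in WILDCARD_CHARS)


def validate_pattern(pattern):
    """Return None if the pattern follows the rules above, else an error message."""
    if not os.path.isabs(pattern):
        return "path must be absolute"
    directory, filename = os.path.split(pattern)
    if has_wildcard(directory):
        return "wildcards are allowed in the filename only"
    # A wildcard filename such as "*" has no extension; "*.py" does.
    if has_wildcard(filename) and len(os.path.splitext(filename)[1]) < 2:
        return "a wildcard filename needs an extension"
    return None


def is_ignored(path):
    """Assumption: a path is skipped if any of its components is in the ignore list."""
    return any(part in IGNORED_NAMES for part in PurePath(path).parts)
```

Running `validate_pattern` over every entry of `files` before a consultation gives the agent a clear error message instead of an empty match.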
The consultation tool accepts the following parameters:
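- `files`: List of absolute file paths or patterns (wildcards in filenames only)
- `query`: Your question or instruction about the files
- `model`: The model to use, given as an OpenRouter model ID
- `mode`: `fast`, `mid`, or `think`
- `output_file`: Path of a file to save the response to
  - If the file already exists, the response is saved with an `_updated` suffix (e.g., `report.md` → `report_updated.md`)
  - When set, the tool returns only `"Result has been saved to /path/to/file"`
- `zdr`: Zero Data Retention routing (default: `false`)
  - When `true`, routes only to endpoints with ZDR policy (prompts not retained by provider)

Claude Code will automatically use the tool with proper parameters: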
```json
{
  "files": ["/Users/john/project/src/*.py"],
  "query": "Explain the main architecture",
  "model": "google/gemini-3-flash-preview",
  "mode": "fast"
}
```
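The `_updated` suffix and the "Result has been saved to" message mentioned in the parameter list could behave roughly like the sketch below. The helper names are hypothetical, and two details are assumptions: that the message reports the path actually written, and that an existing `_updated` file is simply overwritten.

```python
import os


def resolve_output_path(output_file):
    """Return output_file, or a sibling with an `_updated` suffix if it already exists."""
    if not os.path.exists(output_file):
        return output_file
    root, ext = os.path.splitext(output_file)
    return f"{root}_updated{ext}"  # report.md -> report_updated.md


def save_result(output_file, text):
    """Write the response to disk and return only a short confirmation message."""
    path = resolve_output_path(output_file)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return f"Result has been saved to {path}"
```

Returning the short message rather than the full response is what keeps a long report out of the agent's context.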
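```python
import asyncio

from consult7.consultation import consultation_impl


async def main():
    # consultation_impl is awaited, so it has to run inside an event loop
    result = await consultation_impl(
        files=["/path/to/file.py"],
        query="Explain this code",
        model="google/gemini-3-flash-preview",
        mode="fast",  # fast, mid, or think
        provider="openrouter",
        api_key="sk-or-v1-...",
    )
    print(result)


asyncio.run(main())
```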
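The `ULTRA` mnemonic (four frontier models in parallel) maps naturally onto `asyncio.gather`. The sketch below is one possible way to do this from Python, not a documented Consult7 feature: `run_ultra` is a hypothetical helper, and it takes the consultation coroutine as an argument so that it can be exercised without network access.

```python
import asyncio

# The four model+mode pairs behind GEMT, GPTT, GROT, and OPUT.
ULTRA_PANEL = [
    ("google/gemini-3.1-pro-preview", "think"),
    ("openai/gpt-5.5", "think"),
    ("x-ai/grok-4.20", "think"),
    ("anthropic/claude-opus-4.8", "think"),
]


async def run_ultra(consult, files, query, **kwargs):
    """Call `consult` once per panel model concurrently.

    Returns a dict mapping model ID to its result, or to the exception it raised,
    so one failing model does not discard the other answers.
    """
    calls = [
        consult(files=files, query=query, model=model, mode=mode, **kwargs)
        for model, mode in ULTRA_PANEL
    ]
    results = await asyncio.gather(*calls, return_exceptions=True)
    return {model: result for (model, _), result in zip(ULTRA_PANEL, results)}
```

With the Python API shown above, this would presumably be called as `await run_ultra(consultation_impl, files, query, provider="openrouter", api_key="sk-or-v1-...")`.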
```bash
# Test OpenRouter connection
uvx consult7 sk-or-v1-your-api-key --test
```
To remove consult7 from Claude Code:
```bash
claude mcp remove consult7 -s user
```
Release notes, newest first:

- Fusion (`openrouter/fusion`): a multi-model panel plus a judge in one call; mode maps to web-research depth (`fast`/`mid`/`think` → `max_tool_calls` 2/8/16). New `FUSE` mnemonic.
- `oput`/`opuf` now point to Claude Opus 4.8, and 4.7 is kept as a legacy ID.
- `cost: $0.0923`.
- `mid` vs `think` for adaptive models (Opus, Grok)
- `output_file` return now includes the metadata footer so callers can verify what ran
- `reasoning.enabled=true`
- `reasoning.enabled=true`
- `gptt` → GPT-5.5, `oput`/`opuf` → Claude Opus 4.7, `grot` → Grok 4.20
- `gemt` → Gemini 3.1 Pro, `oput`/`opuf` → Claude Opus 4.6
- `google/gemini-3-flash-preview` (Gemini 3 Flash, ultra fast)
- `gemf` mnemonic updated to use Gemini 3 Flash
- `zdr` parameter for Zero Data Retention routing
- `google/gemini-3-pro-preview` (1M context, flagship reasoning model)
- `gemt` (Gemini 3 Pro), `grot` (Grok 4), `ULTRA` (parallel execution)
- `|thinking` suffix removed - use the `mode` parameter instead (now required)
- `mode` parameter API: `fast`, `mid`, `think`
- CLI changed from `consult7 <provider> <key>` to `consult7 <key>`
- `output_file` parameter to save responses to files

License: MIT