Nexus aggregates multiple MCP servers, APIs, and LLM providers (OpenAI, Anthropic, Google, AWS Bedrock) through a single unified endpoint, enabling tool discovery via context-aware fuzzy search and routing requests across the entire AI stack. The server supports multiple MCP protocols (STDIO, SSE, HTTP), provides security features including CORS, CSRF protection, OAuth2, and TLS, and includes rate limiting with in-memory or Redis backends. Nexus solves the problem of managing disparate AI infrastructure by consolidating governance, control, and orchestration into one TOML-configured router available as a binary, Docker container, or buildable from source.
Plug in all your MCP servers, APIs, and LLM providers. Route everything through a unified endpoint.
Aggregate, govern, and control your AI stack.
curl -fsSL https://nexusrouter.com/install | bash
Pull the latest image:
docker pull ghcr.io/grafbase/nexus:latest
Or use the stable version:
docker pull ghcr.io/grafbase/nexus:stable
Or use a specific version:
docker pull ghcr.io/grafbase/nexus:X.Y.Z
git clone https://github.com/grafbase/nexus
cd nexus
cargo build --release
nexus
docker run -p 8000:8000 -v /path/to/config:/etc/nexus.toml ghcr.io/grafbase/nexus:latest
services:
nexus:
image: ghcr.io/grafbase/nexus:latest
ports:
- "8000:8000"
volumes:
- ./nexus.toml:/etc/nexus.toml
environment:
- GITHUB_TOKEN=${GITHUB_TOKEN}
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
Create a nexus.toml file to configure Nexus:
# LLM Provider configuration
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
forward_token = true
# Model configuration (at least one model required per provider)
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-3-5-turbo]
[llm.providers.anthropic]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"
[llm.providers.anthropic.models.claude-3-5-sonnet-20241022]
# MCP Server configuration
[mcp.servers.github]
url = "https://api.githubcopilot.com/mcp/"
auth.token = "{{ env.GITHUB_TOKEN }}"
[mcp.servers.filesystem]
cmd = ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/home/YOUR_USERNAME/Desktop"]
[mcp.servers.python_server]
cmd = ["python", "-m", "mcp_server"]
env = { PYTHONPATH = "/opt/mcp" }
cwd = "/workspace"
server.listen_address: The address and port Nexus will listen on (default: 127.0.0.1:8000)server.health.enabled: Enable health endpoint (default: true)server.health.path: Health check endpoint path (default: /health)llm.enabled: Enable LLM functionality (default: true)llm.protocols.openai.enabled: Enable OpenAI protocol endpoint (default: true)llm.protocols.openai.path: OpenAI endpoint path (default: /llm/openai)llm.protocols.anthropic.enabled: Enable Anthropic protocol endpoint (default: false)llm.protocols.anthropic.path: Anthropic endpoint path (default: /llm/anthropic)For detailed LLM provider configuration, see the LLM Provider Configuration section below.
mcp.enabled: Enable MCP functionality (default: true)mcp.path: MCP endpoint path (default: /mcp)mcp.enable_structured_content: Control MCP search tool response format (default: true)
true: Uses modern structuredContent field for better performance and type safetyfalse: Uses legacy content field with Content::json objects for compatibility with older MCP clientsSTDIO Servers: Launch local processes that communicate via standard input/output
[mcp.servers.my_tool]
cmd = ["path/to/executable", "--arg1", "--arg2"]
# Optional: Set environment variables
env = { DEBUG = "1", API_KEY = "{{ env.MY_API_KEY }}" }
# Optional: Set working directory
cwd = "/path/to/working/directory"
# Optional: Configure stderr handling (default: "null")
stderr = "inherit" # Show in console
# or
stderr = { file = "/var/log/mcp/server.log" } # Log to file
Note: STDIO servers must output valid JSON-RPC messages on stdout. The cmd array must have at least one element (the executable).
SSE Servers: Connect to Server-Sent Events endpoints
[mcp.servers.my_sse_server]
protocol = "sse"
url = "http://example.com/sse"
message_url = "http://example.com/messages" # Optional
HTTP Servers: Connect to streamable HTTP endpoints
[mcp.servers.my_http_server]
protocol = "streamable-http"
url = "https://api.example.com/mcp"
For remote MCP servers, if you omit the protocol Nexus will first try streamable HTTP and then SSE.
Add service token authentication to any server:
[mcp.servers.my_server.auth]
token = "your-token-here"
# Or use environment variables
token = "{{ env.MY_API_TOKEN }}"
If you enable OAuth2 authentication to your server, and your downstream servers all use the same authentication server, you can configure Nexus to forward the request access token to the downstream server.
[mcp.servers.my_server.auth]
type = "forward"
Nexus supports inserting static headers when making requests to MCP servers. Headers can be configured globally (for all MCP servers) or per-server.
Note: MCP currently only supports header insertion with static values. Headers from incoming requests are not forwarded.
Configure headers that apply to all MCP servers:
# Global headers for all MCP servers
[[mcp.headers]]
rule = "insert"
name = "X-Application"
value = "nexus-router"
[[mcp.headers]]
rule = "insert"
name = "X-API-Version"
value = "v1"
Configure headers for individual HTTP-based MCP servers:
[mcp.servers.my_server]
url = "https://api.example.com/mcp"
# Insert headers for this specific server
[[mcp.servers.my_server.headers]]
rule = "insert"
name = "X-API-Key"
value = "{{ env.MY_API_KEY }}" # Environment variable substitution
[[mcp.servers.my_server.headers]]
rule = "insert"
name = "X-Service-Name"
value = "my-service"
{{ env.VAR_NAME }} syntax for environment variable substitutioninsert rule is supported for MCPRestrict access to MCP servers and tools based on user groups:
# Server-level access control
[mcp.servers.premium_tools]
cmd = ["premium-server"]
allow = ["premium", "enterprise"] # Only these groups can access
deny = ["suspended"] # Block specific groups
# Tool-level override (more specific than server-level)
[mcp.servers.premium_tools.tools.expensive_feature]
allow = ["enterprise"] # Only enterprise can use this tool
[mcp.servers.premium_tools.tools.deprecated_tool]
allow = [] # Empty allow list blocks all access (no client ID needed)
Access control rules:
allow is set, only listed groups can access (requires client identification)deny is set, listed groups are blocked (requires client identification)allow = [] blocks all access without requiring client identificationConfigure OAuth2 authentication to protect your Nexus endpoints:
[server.oauth]
url = "https://your-oauth-provider.com/.well-known/jwks.json"
poll_interval = "5m"
expected_issuer = "https://your-oauth-provider.com"
expected_audience = "your-service-audience"
[server.oauth.protected_resource]
resource = "https://your-nexus-instance.com"
authorization_servers = ["https://your-oauth-provider.com"]
OAuth2 configuration options:
url: JWKs endpoint URL for token validationpoll_interval: How often to refresh JWKs (optional, default: no polling)expected_issuer: Expected iss claim in JWT tokens (optional)expected_audience: Expected aud claim in JWT tokens (optional)protected_resource.resource: URL of this protected resourceprotected_resource.authorization_servers: List of authorization server URLsWhen OAuth2 is enabled, all endpoints except /health and /.well-known/oauth-protected-resource require valid JWT tokens in the Authorization: Bearer <token> header.
Nexus supports rate limiting to prevent abuse and ensure fair resource usage:
# Global rate limiting configuration
[server.rate_limits]
enabled = true
# Storage backend configuration
[server.rate_limits.storage]
type = "memory" # or "redis" for distributed rate limiting
# For Redis backend:
# url = "redis://localhost:6379"
# key_prefix = "nexus:rate_limit:"
# Global rate limit (applies to all requests)
[server.rate_limits.global]
limit = 1000
interval = "60s"
# Per-IP rate limit
[server.rate_limits.per_ip]
limit = 100
interval = "60s"
# Per-MCP server rate limits
[mcp.servers.my_server.rate_limits]
limit = 50
interval = "60s"
# Tool-specific rate limits (override server defaults)
[mcp.servers.my_server.rate_limits.tools]
expensive_tool = { limit = 10, interval = "60s" }
cheap_tool = { limit = 100, interval = "60s" }
Rate Limiting Features:
Redis Backend Configuration:
[server.rate_limits.storage]
type = "redis"
url = "redis://localhost:6379"
key_prefix = "nexus:rate_limit:"
response_timeout = "1s"
connection_timeout = "5s"
# Connection pool settings
[server.rate_limits.storage.pool]
max_size = 16
min_idle = 0
timeout_create = "5s"
timeout_wait = "5s"
timeout_recycle = "300s"
# TLS configuration for Redis (optional)
[server.rate_limits.storage.tls]
enabled = true
ca_cert_path = "/path/to/ca.crt"
client_cert_path = "/path/to/client.crt" # For mutual TLS
client_key_path = "/path/to/client.key"
# insecure = true # WARNING: Only for development/testing, skips certificate validation
Note: When configuring tool-specific rate limits, Nexus will warn if you reference tools that don't exist.
Nexus provides token-based rate limiting for LLM providers to help control costs and prevent abuse. Unlike request-based rate limits, token rate limits count an estimate of actual tokens consumed.
IMPORTANT: LLM rate limiting requires client identification to be enabled:
[server.client_identification]
enabled = true
# Choose identification methods (at least one required)
client_id.jwt_claim = "sub" # Extract ID from JWT 'sub' claim
# or
client_id.http_header = "X-Client-ID" # Extract ID from HTTP header
# Optional: Limit groups per user (at most one allowed)
group_id.jwt_claim = "groups" # JWT claim containing user's group
# or
group_id.http_header = "X-Group-ID" # Extract ID from HTTP header
# You must provide a list of allowed groups
[server.client_identification.validation]
group_values = ["free", "pro", "max"]
Without client identification, rate limits cannot be enforced and requests will fail with a configuration error.
Token rate limits can be configured at four levels, from most to least specific:
The most specific applicable limit is always used.
# Provider-level default rate limit (applies to all models)
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 100000 # 100K input tokens
interval = "1m" # Per minute
# Model-specific rate limit (overrides provider default)
[llm.providers.openai.models.gpt-4.rate_limits.per_user]
input_token_limit = 50000 # More restrictive for expensive model
interval = "30s"
Configure different limits for user groups (requires group_id and group_values in client identification):
# Provider-level group limits
[llm.providers.openai.rate_limits.per_user.groups]
free = { input_token_limit = 10000, interval = "60s" }
pro = { input_token_limit = 100000, interval = "60s" }
enterprise = { input_token_limit = 1000000, interval = "60s" }
# Model-specific group limits (override provider groups)
[llm.providers.openai.models.gpt-4.rate_limits.per_user.groups]
free = { input_token_limit = 5000, interval = "60s" }
pro = { input_token_limit = 50000, interval = "60s" }
enterprise = { input_token_limit = 500000, interval = "60s" }
The limits are per user, but you can define different limits if the user is part of a specific group. If the user does not belong to any group, they will be assigned to the per-user limits.
# Client identification (REQUIRED for rate limiting)
[server.client_identification]
enabled = true
client_id.jwt_claim = "sub"
group_id.jwt_claim = "subscription_tier"
[server.client_identification.validation]
group_values = ["free", "pro", "enterprise"]
# OpenAI provider with rate limiting
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
# Provider-level defaults
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 100000
interval = "60s"
[llm.providers.openai.rate_limits.per_user.groups]
free = { input_token_limit = 10000, interval = "60s" }
pro = { input_token_limit = 100000, interval = "60s" }
# GPT-4 specific limits (more restrictive)
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-4.rate_limits.per_user]
input_token_limit = 50000
interval = "60s"
[llm.providers.openai.models.gpt-4.rate_limits.per_user.groups]
free = { input_token_limit = 5000, interval = "60s" }
pro = { input_token_limit = 50000, interval = "60s" }
# GPT-3.5 uses provider defaults
[llm.providers.openai.models.gpt-3-5-turbo]
max_tokens parameter are NOT considered in rate limit calculationsNote: The rate limiting is designed to be predictable and based only on what the client sends, not on variable output sizes.
When rate limited, the server returns a 429 status code. No Retry-After headers are sent to maintain consistency with downstream LLM provider behavior.
When rate limits are exceeded:
{
"error": {
"message": "Rate limit exceeded: Token rate limit exceeded. Please try again later.",
"type": "rate_limit_error",
"code": 429
}
}
Configure TLS for downstream connections:
[mcp.servers.my_server.tls]
verify_certs = true
accept_invalid_hostnames = false
root_ca_cert_path = "/path/to/ca.pem"
client_cert_path = "/path/to/client.pem"
client_key_path = "/path/to/client.key"
Nexus provides a unified interface for multiple LLM providers, allowing you to route chat completions through various services with a consistent API.
[llm]
enabled = true # Enable LLM functionality (default: true)
# OpenAI protocol endpoint configuration
[llm.protocols.openai]
enabled = true # Enable OpenAI protocol (default: true)
path = "/llm" # Custom path (default: "/llm/openai")
# Anthropic protocol endpoint configuration
[llm.protocols.anthropic]
enabled = true # Enable Anthropic protocol (default: false)
path = "/claude" # Custom path (default: "/llm/anthropic")
Nexus currently supports four major LLM providers with full tool calling capabilities:
Configure one or more LLM providers in your nexus.toml:
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
# Optional: Use a custom base URL (for Azure OpenAI, proxies, or compatible APIs)
base_url = "https://api.openai.com/v1" # Default
# Model Configuration (REQUIRED - at least one model must be configured)
[llm.providers.openai.models.gpt-4]
# Optional: Rename the model for your users
# rename = "smart-model" # Users will see "openai/smart-model"
[llm.providers.openai.models.gpt-3-5-turbo]
# Models without rename use their original ID
[llm.providers.anthropic]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"
# Optional: Use a custom base URL
base_url = "https://api.anthropic.com/v1" # Default
# Model Configuration (REQUIRED - at least one model must be configured)
[llm.providers.anthropic.models.claude-3-opus-20240229]
[llm.providers.anthropic.models.claude-3-5-sonnet-20241022]
[llm.providers.google]
type = "google"
api_key = "{{ env.GOOGLE_API_KEY }}"
# Optional: Use a custom base URL
base_url = "https://generativelanguage.googleapis.com/v1beta" # Default
# Model Configuration (REQUIRED - at least one model must be configured)
# Note: Model IDs with dots must be quoted in TOML
[llm.providers.google.models."gemini-1.5-flash"]
[llm.providers.google.models.gemini-pro]
[llm.providers.bedrock]
type = "bedrock"
# Optional: AWS profile to use (defaults to environment settings)
profile = "{{ env.AWS_PROFILE }}"
# Optional: AWS region (defaults to environment or us-east-1)
region = "us-west-2"
# Model Configuration (REQUIRED - at least one model must be configured)
# Bedrock uses model IDs with dots, so they must be quoted
[llm.providers.bedrock.models."anthropic.claude-3-5-sonnet-20241022-v2:0"]
[llm.providers.bedrock.models."anthropic.claude-3-opus-20240229-v1:0"]
[llm.providers.bedrock.models."amazon.nova-micro-v1:0"]
[llm.providers.bedrock.models."meta.llama3-8b-instruct-v1:0"]
[llm.providers.bedrock.models."ai21.jamba-1.5-mini-v1:0"]
# Rename models for simpler access
[llm.providers.bedrock.models.claude-haiku]
rename = "anthropic.claude-3-5-haiku-20241022-v1:0" # Users will access as "bedrock/claude-haiku"
[llm.providers.bedrock.models.jamba-mini]
rename = "ai21.jamba-1.5-mini-v1:0" # Users will access as "bedrock/jamba-mini"
AWS Bedrock provides access to multiple foundation models through a single API. Key features:
Authentication: Bedrock uses standard AWS credential chain:
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)AWS_PROFILE environment variable)Supported Model Families:
Model ID Format: Bedrock model IDs follow the pattern provider.model-name-version:revision, for example:
ai21.jamba-1.5-mini-v1:0anthropic.claude-3-5-sonnet-20241022-v2:0amazon.nova-micro-v1:0meta.llama3-8b-instruct-v1:0Nexus automatically discovers models from every provider. At startup the server performs an initial discovery; a background task then refreshes provider inventories every five minutes and publishes the results through a watch channel for lock-free lookups.
Simply configure a provider—no model list is required. Nexus fetches the provider's models and exposes them as bare model names:
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
Discovered models appear in /v1/models (e.g., gpt-4, gpt-4o-mini). Requests using these bare names route automatically.
model_filterUse model_filter to limit which discovered models are exposed. The regex is case-insensitive, must not be empty, and follows the regex crate syntax.
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
model_filter = "^gpt-4" # Only expose GPT-4-family models from discovery
[llm.providers.anthropic]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"
model_filter = "^claude-"
If discovery fails for any provider at startup, Nexus exits so you can fix the configuration. Refresh failures log errors but keep the last successful snapshot.
Explicit models remain useful for renames, rate limits, and legacy prefixes. They always appear with provider/model even when a filter would exclude them.
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
model_filter = "^gpt-4"
[llm.providers.openai.models.gpt-3-5-turbo]
rename = "fast-model"
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-4.rate_limits.per_user]
input_token_limit = 50000
interval = "60s"
Discovery guarantees:
/v1/models lists bare discovered models first, followed by explicit provider/model entriesHow model resolution works:
/, route to the specified provider (openai/gpt-4)gpt-4 → provider)Accessing models:
gpt-4, claude-3-opus)provider/model format (e.g., openai/fast-model)You can rename models to provide custom identifiers for your users:
[llm.providers.openai.models.gpt-4]
rename = "smart-model" # Users will access this as "openai/smart-model"
[llm.providers.openai.models.gpt-3-5-turbo]
rename = "fast-model" # Users will access this as "openai/fast-model"
This is useful for:
Model IDs that contain dots must be quoted in TOML:
# Correct - dots in model IDs require quotes
[llm.providers.google.models."gemini-1.5-flash"]
[llm.providers.google.models."gemini-1.5-pro"]
# Also correct - no dots, no quotes needed
[llm.providers.google.models.gemini-pro]
You can configure multiple instances of the same provider type with different names:
# Primary OpenAI account
[llm.providers.openai_primary]
type = "openai"
api_key = "{{ env.OPENAI_PRIMARY_KEY }}"
[llm.providers.openai_primary.models.gpt-4]
[llm.providers.openai_primary.models.gpt-3-5-turbo]
# Secondary OpenAI account or Azure OpenAI
[llm.providers.openai_secondary]
type = "openai"
api_key = "{{ env.OPENAI_SECONDARY_KEY }}"
base_url = "https://my-azure-instance.openai.azure.com/v1"
[llm.providers.openai_secondary.models.gpt-4]
rename = "azure-gpt-4" # Distinguish from primary account
# Anthropic
[llm.providers.claude]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"
[llm.providers.claude.models.claude-3-opus-20240229]
# Google Gemini
[llm.providers.gemini]
type = "google"
api_key = "{{ env.GOOGLE_API_KEY }}"
[llm.providers.gemini.models."gemini-1.5-flash"]
Nexus supports token forwarding, allowing users to provide their own API keys at request time instead of using the configured keys. This feature is opt-in and disabled by default.
Enable token forwarding for any provider by setting forward_token = true:
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}" # Fallback key (optional with token forwarding)
forward_token = true # Enable token forwarding for this provider
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-3-5-turbo]
[llm.providers.anthropic]
type = "anthropic"
# No api_key required when token forwarding is enabled
forward_token = true
[llm.providers.anthropic.models.claude-3-5-sonnet-20241022]
[llm.providers.google]
type = "google"
api_key = "{{ env.GOOGLE_API_KEY }}"
forward_token = false # Explicitly disabled (default)
[llm.providers.google.models."gemini-1.5-flash"]
When token forwarding is enabled for a provider, users can pass their own API key using the X-Provider-API-Key header:
# Using your own OpenAI key
curl -X POST http://localhost:8000/llm/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Provider-API-Key: sk-your-openai-key" \
-d '{
"model": "openai/gpt-4",
"messages": [{"role": "user", "content": "Hello"}]
}'
# Using your own Anthropic key
curl -X POST http://localhost:8000/llm/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Provider-API-Key: sk-ant-your-anthropic-key" \
-d '{
"model": "anthropic/claude-3-opus-20240229",
"messages": [{"role": "user", "content": "Hello"}]
}'
When token forwarding is enabled (forward_token = true):
When token forwarding is disabled (forward_token = false, default):
X-Provider-API-Key headerNexus supports header transformation for LLM providers, allowing you to forward, insert, remove, or rename headers when making requests to LLM APIs.
Configure header rules at the provider level:
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
# Forward headers from incoming requests
[[llm.providers.openai.headers]]
rule = "forward"
name = "X-Request-ID"
# Forward with a default value if not present
[[llm.providers.openai.headers]]
rule = "forward"
name = "X-Trace-ID"
default = "generated-trace-id"
# Forward and rename
[[llm.providers.openai.headers]]
rule = "forward"
name = "X-Custom-Header"
rename = "X-OpenAI-Custom"
# Insert static headers
[[llm.providers.openai.headers]]
rule = "insert"
name = "X-OpenAI-Beta"
value = "assistants=v2"
# Remove headers
[[llm.providers.openai.headers]]
rule = "remove"
name = "X-Internal-Secret"
# Pattern-based forwarding (regex)
[[llm.providers.openai.headers]]
rule = "forward"
pattern = "X-Debug-.*"
# Pattern-based removal
[[llm.providers.openai.headers]]
rule = "remove"
pattern = "X-Internal-.*"
# Rename and duplicate (keeps original and adds renamed copy)
[[llm.providers.openai.headers]]
rule = "rename_duplicate"
name = "X-User-ID"
rename = "X-OpenAI-User"
Configure headers for specific models (overrides provider-level rules):
[llm.providers.openai.models.gpt-4]
# Model-specific headers override provider headers
[[llm.providers.openai.models.gpt-4.headers]]
rule = "insert"
name = "X-Model-Config"
value = "premium"
[[llm.providers.openai.models.gpt-4.headers]]
rule = "forward"
pattern = "X-Premium-.*"
forward: Pass headers from incoming requests to the LLM provider
name: Single header name to forwardpattern: Regex pattern to match multiple headersdefault: Optional default value if header is missingrename: Optional new name for the forwarded headerinsert: Add static headers to requests
name: Header namevalue: Static value (supports {{ env.VAR }} substitution)remove: Remove headers before sending to provider
name: Single header name to removepattern: Regex pattern to match headers to removerename_duplicate: Forward header with both original and new name
name: Original header namerename: New header name for the duplicatedefault: Optional default if header is missingX-Provider-API-Key header is handled separately for token forwardingOnce configured, you can interact with LLM providers through Nexus's unified API:
curl http://localhost:8000/llm/models
Response:
{
"object": "list",
"data": [
{
"id": "openai_primary/gpt-4-turbo",
"object": "model",
"created": 1677651200,
"owned_by": "openai"
},
{
"id": "claude/claude-3-5-sonnet-20241022",
"object": "model",
"created": 1709164800,
"owned_by": "anthropic"
},
{
"id": "gemini/gemini-1.5-pro",
"object": "model",
"created": 1710000000,
"owned_by": "google"
}
]
}
Nexus supports advanced tool calling across all LLM providers, allowing models to invoke external functions with structured parameters. All providers use the standardized OpenAI tool calling format for consistency.
Basic Tool Calling Example:
curl -X POST http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-3-5-sonnet-20241022",
"messages": [
{"role": "user", "content": "What'\''s the weather in San Francisco?"}
],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City and state"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
},
"required": ["location"]
}
}
}],
"tool_choice": "auto"
}'
Tool Choice Options:
"auto" (default): Model decides whether to call tools"none": Model won't call any tools"required": Model must call at least one tool{"type": "function", "function": {"name": "specific_function"}}: Force a specific toolParallel Tool Calls:
When supported by the provider, models can call multiple tools simultaneously:
{
"model": "openai/gpt-4",
"messages": [{"role": "user", "content": "Get weather for NYC and LA"}],
"tools": [/* tool definitions */],
"parallel_tool_calls": true
}
Tool Conversation Flow:
Tool calling creates a multi-turn conversation:
{
"messages": [
{"role": "user", "content": "What's the weather in Paris?"},
{
"role": "assistant",
"tool_calls": [{
"id": "call_123",
"type": "function",
"function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"}
}]
},
{
"role": "tool",
"tool_call_id": "call_123",
"content": "Weather in Paris: 22°C, sunny"
},
{
"role": "assistant",
"content": "The weather in Paris is currently 22°C and sunny!"
}
]
}
Provider-Specific Tool Support:
All providers support the standardized format, but have different capabilities:
Streaming Tool Calls:
Tool calls can be streamed just like regular responses:
curl -X POST http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-4",
"messages": [{"role": "user", "content": "Search for Python tutorials"}],
"tools": [/* tool definitions */],
"stream": true
}'
The stream will include tool call chunks as they're generated, followed by the tool execution results.
Send a chat completion request using the OpenAI-compatible format:
curl -X POST http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai_primary/gpt-4-turbo",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
],
"temperature": 0.7,
"max_tokens": 150
}'
The model name format is <provider_name>/<model_id>. Nexus automatically routes the request to the appropriate provider and transforms the request/response as needed.
Nexus supports streaming responses for all LLM providers using Server-Sent Events (SSE):
curl -X POST http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-3-5-sonnet-20241022",
"messages": [
{"role": "user", "content": "Write a short poem"}
],
"stream": true,
"max_tokens": 100
}'
When stream: true is set, the response will be streamed as Server-Sent Events with the following format:
data: {"id":"msg_123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3-5-sonnet-20241022","choices":[{"index":0,"delta":{"role":"assistant","content":"Here"}}]}
data: {"id":"msg_123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3-5-sonnet-20241022","choices":[{"index":0,"delta":{"content":" is"}}]}
data: {"id":"msg_123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3-5-sonnet-20241022","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":25,"total_tokens":35}}
data: [DONE]
Streaming is supported for all providers (OpenAI, Anthropic, Google) and provides:
system fieldmax_tokens parameter (defaults to 4096 if not specified)systemInstruction fieldNexus provides an OpenAI-compatible API, making it easy to use with existing LLM SDKs and libraries. Simply point the SDK to your Nexus instance instead of the provider's API.
from openai import OpenAI
# Point to your Nexus instance
client = OpenAI(
base_url="http://localhost:8000/llm",
api_key="your-service-token" # Use a JWT token if OAuth2 is enabled, or any string if not
)
# Use any configured provider/model
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20241022",
messages=[
{"role": "user", "content": "Hello!"}
]
)
# Streaming works seamlessly
stream = client.chat.completions.create(
model="openai/gpt-4-turbo",
messages=[
{"role": "user", "content": "Write a poem"}
],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
import OpenAI from 'openai';
// Configure to use Nexus
const openai = new OpenAI({
baseURL: 'http://localhost:8000/llm',
apiKey: 'your-service-token', // Use a JWT token if OAuth2 is enabled, or any string if not
});
// Use any provider through Nexus
const response = await openai.chat.completions.create({
model: 'google/gemini-1.5-pro',
messages: [
{ role: 'user', content: 'Explain quantum computing' }
],
});
// Streaming with any provider
const stream = await openai.chat.completions.create({
model: 'anthropic/claude-3-opus-20240229',
messages: [
{ role: 'user', content: 'Write a story' }
],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
from langchain_openai import ChatOpenAI
# Use Nexus as the LLM provider
llm = ChatOpenAI(
base_url="http://localhost:8000/llm",
api_key="your-service-token", # Use a JWT token if OAuth2 is enabled
model="openai/gpt-4-turbo"
)
# Works with any configured provider
claude = ChatOpenAI(
base_url="http://localhost:8000/llm",
api_key="your-service-token", # Use a JWT token if OAuth2 is enabled
model="anthropic/claude-3-5-sonnet-20241022"
)
# Regular completion (with OAuth2 authentication if enabled)
curl -s http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-jwt-token" \
-d '{
"model": "openai/gpt-4",
"messages": [{"role": "user", "content": "Hello"}]
}' | jq -r '.choices[0].message.content'
# Streaming with SSE parsing
curl -s http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-jwt-token" \
-d '{
"model": "anthropic/claude-3-5-sonnet-20241022",
"messages": [{"role": "user", "content": "Write a haiku"}],
"stream": true
}' | grep "^data: " | sed 's/^data: //' | jq -r 'select(.choices != null) | .choices[0].delta.content // empty'
When OAuth2 is enabled in your Nexus configuration, you must provide a valid JWT token:
# With OAuth2 enabled
client = OpenAI(
base_url="http://localhost:8000/llm",
api_key="eyJhbGciOiJSUzI1NiIs..." # Your JWT token
)
Without OAuth2, the api_key field is still required by most SDKs but can be any non-empty string:
# Without OAuth2
client = OpenAI(
base_url="http://localhost:8000/llm",
api_key="dummy" # Any non-empty string works
)
Nexus provides consistent error responses across all providers:
Example error response:
{
"error": {
"message": "Invalid model format: expected 'provider/model', got 'invalid-format'",
"type": "invalid_request_error",
"code": 400
}
}
Nexus provides comprehensive observability through OpenTelemetry metrics, distributed tracing, and logs export. All telemetry features have zero overhead when the [telemetry] section is absent from configuration.
Telemetry activation is controlled by exporters configuration:
Each telemetry signal (metrics, tracing, logs) can override the global exporters configuration with their own specific exporters.
[telemetry]
service_name = "nexus-production" # Optional, defaults to "nexus"
# Resource attributes for all telemetry (optional)
[telemetry.resource_attributes]
environment = "production"
region = "us-east-1"
# OTLP exporter for metrics
[telemetry.exporters.otlp]
enabled = true # Default: false
endpoint = "http://localhost:4317" # Default: http://localhost:4317
protocol = "grpc" # Default: grpc (options: grpc, http)
timeout = "60s" # Default: 60s
# Protocol-specific configuration
# For gRPC protocol (headers must be lowercase, cannot start with "grpc-"):
[telemetry.exporters.otlp.grpc.headers]
authorization = "Bearer {{ env.OTLP_TOKEN }}"
x-nexus-shard = "primary"
# TLS configuration for secure gRPC connections (all fields optional):
[telemetry.exporters.otlp.grpc.tls]
domain_name = "custom_name" # Domain name for TLS verification (SNI)
key = "/path/to/key.pem" # Path to client private key PEM file
cert = "/path/to/cert.pem" # Path to client certificate PEM file
ca = "/path/to/ca.pem" # Path to CA certificate PEM file
# For HTTP protocol (standard HTTP header names):
[telemetry.exporters.otlp.http.headers]
Authorization = "Bearer {{ env.OTLP_TOKEN }}"
X-Nexus-Shard = "primary"
# Batch export configuration (all optional with defaults)
[telemetry.exporters.otlp.batch_export]
scheduled_delay = "5s" # Default: 5s
max_queue_size = 2048 # Default: 2048
max_export_batch_size = 512 # Default: 512
max_concurrent_exports = 1 # Default: 1
See Available Metrics for detailed metric definitions.
[telemetry.tracing]
sampling = 0.15 # Sample 15% of requests (default: 0.15)
parent_based_sampler = false # Respect parent trace sampling decision (default: false)
# Collection limits (per span)
[telemetry.tracing.collect]
max_events_per_span = 128
max_attributes_per_span = 128
max_links_per_span = 128
max_attributes_per_event = 128
max_attributes_per_link = 128
# Trace context propagation formats
[telemetry.tracing.propagation]
trace_context = true # W3C Trace Context
aws_xray = false # AWS X-Ray format
# Optional: Override global OTLP exporter for traces
[telemetry.tracing.exporters.otlp]
enabled = true
endpoint = "http://localhost:4317"
protocol = "grpc" # or "http"
timeout = "60s"
# Protocol-specific headers
[telemetry.tracing.exporters.otlp.grpc.headers]
authorization = "Bearer {{ env.OTLP_TOKEN }}"
# TLS configuration (same format as global OTLP):
# [telemetry.tracing.exporters.otlp.grpc.tls]
# domain_name = "custom_name"
# key = "/path/to/key.pem"
# cert = "/path/to/cert.pem"
# ca = "/path/to/ca.pem"
# Or for HTTP:
# [telemetry.tracing.exporters.otlp.http.headers]
# Authorization = "Bearer {{ env.OTLP_TOKEN }}"
See Distributed Tracing for span hierarchy and attributes.
Nexus can export application logs to OpenTelemetry collectors, automatically correlating them with distributed traces. Logs are enabled when OTLP exporters are configured (either globally or specifically for logs).
# Option 1: Use global OTLP exporter for all telemetry signals
# (logs will use the same endpoint as metrics and traces)
# Option 2: Configure a separate OTLP endpoint specifically for logs
[telemetry.logs.exporters.otlp]
enabled = true
endpoint = "http://logs-collector:4317" # Different endpoint for logs
protocol = "grpc" # or "http"
timeout = "30s"
# Protocol-specific headers
[telemetry.logs.exporters.otlp.grpc.headers]
authorization = "Bearer {{ env.OTLP_TOKEN }}"
# TLS configuration (same format as global OTLP):
# [telemetry.logs.exporters.otlp.grpc.tls]
# domain_name = "custom_name"
# key = "/path/to/key.pem"
# cert = "/path/to/cert.pem"
# ca = "/path/to/ca.pem"
# Or for HTTP:
# [telemetry.logs.exporters.otlp.http.headers]
# Authorization = "Bearer {{ env.OTLP_TOKEN }}"
# Logs are automatically correlated with active trace and span IDs
When logs export is enabled, logs include:
Note: OpenTelemetry's own internal logs are filtered out to prevent recursion.
All histograms also function as counters (count field tracks number of observations).
HTTP Server Metrics:
http.server.request.duration (histogram)
http.route, http.request.method, http.response.status_codeMCP Operation Metrics:
mcp.tool.call.duration (histogram)
tool_name, tool_type (builtin/downstream), status (success/error), client.id, client.groupkeyword_count, result_countserver_name (for downstream tools)error.type:
parse_error - Invalid JSON (-32700)invalid_request - Not a valid request (-32600)method_not_found - Method/tool does not exist (-32601)invalid_params - Invalid method parameters (-32602)internal_error - Internal server error (-32603)rate_limit_exceeded - Rate limit hit (-32000)server_error - Other server errors (-32001 to -32099)unknown - Any other error codemcp.tools.list.duration (histogram)
method (list_tools), status, client.id, client.groupmcp.prompt.request.duration (histogram)
method (list_prompts/get_prompt), status, client.id, client.grouperror.type (same values as above)mcp.resource.request.duration (histogram)
method (list_resources/read_resource), status, client.id, client.grouperror.type (same values as above)LLM Operation Metrics:
gen_ai.client.operation.duration (histogram)
gen_ai.system (always "nexus.llm")gen_ai.operation.name (always "chat.completions")gen_ai.request.model (e.g., "openai/gpt-4")gen_ai.response.finish_reason (stop/length/tool_calls/content_filter)client.id (from x-client-id header)client.group (from x-client-group header)error.type (for failed requests):
invalid_request - Malformed requestauthentication_failed - Invalid API keyinsufficient_quota - Quota exceededmodel_not_found - Unknown modelrate_limit_exceeded - Provider or token rate limit hitstreaming_not_supported - Streaming unavailable for modelinvalid_model_format - Incorrect model name formatprovider_not_found - Unknown providerinternal_error - Server errorprovider_api_error - Upstream provider errorconnection_error - Network failuregen_ai.client.time_to_first_token (histogram)
gen_ai.system, gen_ai.operation.name, gen_ai.request.model, client.id, client.groupgen_ai.client.input.token.usage (counter)
gen_ai.system, gen_ai.request.model, client.id, client.groupgen_ai.client.output.token.usage (counter)
gen_ai.system, gen_ai.request.model, client.id, client.groupgen_ai.client.total.token.usage (counter)
gen_ai.system, gen_ai.request.model, client.id, client.groupRedis Storage Metrics (when rate limiting with Redis is enabled):
redis.command.duration (histogram)
operation - The Redis operation type:
check_and_consume - HTTP rate limit checkingcheck_and_consume_tokens - Token-based rate limit checkingstatus (success/error)tokens (for token operations) - Number of tokens checkedredis.pool.connections.in_use (gauge)
redis.pool.connections.available (gauge)
Configuration: See Tracing Configuration in the Telemetry section.
Distributed tracing provides detailed insights into request flows across all components, helping you understand latency, identify bottlenecks, and debug issues in production.
Nexus supports multiple trace context propagation formats for incoming requests:
traceparent and tracestate headers (enabled by default)X-Amzn-Trace-Id header for AWS services integrationWhen enabled, Nexus will:
Note: Nexus is a terminal node for traces - it does not propagate trace context to downstream MCP servers or LLM providers.
Nexus creates a hierarchical span structure for each request:
HTTP Request (root span)
├── Redis Rate Limit Check (redis:check_and_consume:global/ip) - HTTP-level rate limits
├── MCP Operation
│ ├── Redis Rate Limit Check (redis:check_and_consume:server/tool) - MCP-specific rate limits
│ ├── Tool Search (index:search)
│ └── Tool Execution
│ └── Downstream Operations (downstream:execute, downstream:call_tool)
└── LLM Operation (llm:chat_completion or llm:chat_completion_stream)
└── Token Rate Limit Check (redis:check_and_consume_tokens) - Token-based rate limits
Each span includes semantic attributes following OpenTelemetry conventions:
HTTP Spans:
http.request.method - HTTP method (GET, POST, etc.)http.route - Matched route patternhttp.response.status_code - Response status codeurl.scheme - URL scheme (http/https)server.address - Host header valueurl.full - Full request URLclient.id - Client identifier (if configured)client.group - Client group (if configured)error - Set to "true" if request failedMCP Spans:
mcp.method - MCP method name (tools/list, tools/call, etc.)mcp.tool.name - Tool being calledmcp.tool.type - Tool type (builtin/downstream)mcp.transport - Transport type (stdio/http)mcp.auth_forwarded - Whether auth was forwardedclient.id - Client identifier (if configured)client.group - Client group (if configured)mcp.search.keywords - Search keywords (for search tool)mcp.search.keyword_count - Number of search keywordsmcp.execute.target_tool - Tool being executedmcp.execute.target_server - Server hosting the toolmcp.error.code - Error code if operation failedLLM Spans:
gen_ai.request.model - Model identifiergen_ai.request.max_tokens - Max tokens requestedgen_ai.request.temperature - Temperature parametergen_ai.request.has_tools - Whether tools were providedgen_ai.request.tool_count - Number of tools providedgen_ai.response.model - Model used for responsegen_ai.response.finish_reason - Completion reasongen_ai.usage.input_tokens - Input token countgen_ai.usage.output_tokens - Output token countgen_ai.usage.total_tokens - Total token countllm.stream - Whether streaming was usedllm.auth_forwarded - Whether auth was forwardedclient.id - Client identifier (if configured)client.group - Client group (if configured)error.type - Error type if operation failedRedis Spans:
redis.operation - Operation type (check_and_consume or check_and_consume_tokens)rate_limit.scope - Scope (global/ip/server/tool/token)rate_limit.limit - Request/token limitrate_limit.interval_ms - Time window in millisecondsrate_limit.tokens - Number of tokens (for token operations)rate_limit.allowed - Whether request was allowedrate_limit.retry_after_ms - Retry delay if rate limitedredis.pool.size - Connection pool sizeredis.pool.available - Available connectionsredis.pool.in_use - Connections in useclient.address_hash - Hashed IP for privacy (per-IP limits)llm.provider - Provider name (token limits)llm.model - Model name (token limits)mcp.server - Server name (per-server limits)mcp.tool - Tool name (per-tool limits)Nexus supports two sampling strategies:
Fixed-Rate Sampling (default): Randomly samples a percentage of traces based on the configured rate.
[telemetry.tracing]
sampling = 0.15 # Sample 15% of requests (0.0 to 1.0)
Parent-Based Sampling: When enabled, Nexus respects the sampling decision from parent traces in distributed systems.
[telemetry.tracing]
sampling = 0.15 # Default sampling rate for root traces
parent_based_sampler = true # Respect parent's sampling decision
With parent_based_sampler = true:
sampling rate is usedThis ensures consistent sampling decisions across distributed traces, preventing incomplete trace data where some spans are sampled while others are not.
When a parent trace context is provided in request headers (via W3C Trace Context or AWS X-Ray), Nexus will continue that trace.
Nexus traces can be exported to any OpenTelemetry-compatible APM tool:
Example Jaeger configuration:
[telemetry.tracing.exporters.otlp]
enabled = true
endpoint = "http://jaeger:4317"
protocol = "grpc"
Add to your Cursor settings:
{
"nexus": {
"transport": {
"type": "http",
"url": "http://localhost:8000/mcp"
}
}
}
Make sure Nexus is running on localhost:8000 (or adjust the URL accordingly).
Add to your Claude Code configuration:
Open Claude Code and run the command:
claude mcp add --transport http nexus http://localhost:8000/mcp
Or add it to your project's .mcp.json file:
{
"mcpServers": {
"nexus": {
"type": "http",
"url": "http://localhost:8000/mcp"
}
}
}
Verify the connection:
claude mcp list
Make sure Nexus is running before starting Claude Code.
Configure Codex CLI to talk to Nexus by editing ~/.codex/config.toml:
Note: Nexus serves its OpenAI-compatible endpoint at
/llm/openai/; Codex expects the/v1suffix. Make sure thebase_urlends with/v1.
[model_providers.nexus]
name = "Nexus AI router"
base_url = "http://127.0.0.1:8000/llm/openai/v1"
wire_api = "chat"
query_params = {}
base_url must point to the Nexus OpenAI-compatible endpoint and include the /v1 suffix (adjust host/port if Nexus runs elsewhere).query_params can stay empty, but the table must exist to satisfy Codex's schema.Start Codex with a Nexus-managed model:
codex -c model="anthropic/claude-3-5-haiku-latest" -c model_provider=nexus
Swap the model name for any provider/model pair that you have configured in Nexus.
Nexus provides two main tools to AI assistants:
search: A context-aware tool search that uses fuzzy matching to find relevant tools across all connected MCP serversexecute: Executes a specific tool with the provided parametersWhen an AI assistant connects to Nexus, it can:
All tools from downstream servers are namespaced with their server name (e.g., github__search_code, filesystem__read_file).
Nexus acts as a unified gateway for multiple LLM providers:
Models are namespaced with their provider name (e.g., openai/gpt-4, anthropic/claude-3-opus-20240229).
STDIO servers are spawned as child processes and communicate via JSON-RPC over standard input/output:
Once configured, AI assistants can interact with Nexus like this:
Search for tools:
User: "I need to search for code on GitHub"
Assistant: Let me search for GitHub-related tools...
[Calls search with keywords: ["github", "code", "search"]]
Execute tools:
Assistant: I found the `github__search_code` tool. Let me search for your query...
[Calls execute with name: "github__search_code" and appropriate arguments]
[mcp.servers.python_tools]
cmd = ["python", "-m", "my_mcp_server"]
env = { PYTHONPATH = "/opt/mcp", PYTHONUNBUFFERED = "1" }
stderr = "inherit" # See Python output during development
[mcp.servers.node_tools]
cmd = ["node", "mcp-server.js"]
cwd = "/path/to/project"
env = { NODE_ENV = "production" }
[mcp.servers.filesystem]
cmd = ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/home/user"]
stderr = "inherit" temporarily to see error messagescwd path exists if specifiedWhen using OAuth2 authentication:
expected_issuer and expected_audience to validate JWT claims/.well-known/oauth-protected-resource endpoint provides OAuth2 metadata and is publicly accessible/health endpoint bypasses OAuth2 authentication for monitoring systemsNexus is licensed under the Mozilla Public License 2.0 (MPL-2.0). See the LICENSE file for details.
See CONTRIBUTING.md for guidelines on how to contribute to Nexus.
io.github.ericm1018/skillfm-llm-cost-optimizer-openai-anthropic-usage
io.github.mikerawsonnz/llm-orchestration-agent
io.github.mikerawsonnz/authenticated-llm-agent
labforgedev/copilot-memory-mcp
csoai-org/agent-prompt-injection-firewall-mcp
io.github.mikerawsonnz/authenticated-multi-llm-agent