Bridges Claude with locally running LM Studio instances through the standard OpenAI-compatible API. Exposes nine tools for health checks, model listing, chat completions, text generation, embeddings, and stateful multi-turn conversations. Handles both simple prompt-response workflows and persistent sessions where system prompts stay locked across conversation turns. Supports flexible deployment via Python, Docker, or direct GitHub installation. Reach for this when you want to route specific queries to your private models while staying in Claude's interface, especially useful for sensitive data that shouldn't hit external APIs or when you need specialized local models for embeddings and RAG workflows.
A Model Control Protocol (MCP) server that allows Claude to communicate with locally running LLM models via LM Studio.
LMStudio-MCP creates a bridge between Claude (with MCP capabilities) and your locally running LM Studio instance. This allows Claude to:
This enables you to leverage your own locally running models through Claude's interface, combining Claude's capabilities with your private models.
curl -fsSL https://raw.githubusercontent.com/infinitimeless/LMStudio-MCP/main/install.sh | bash
git clone https://github.com/infinitimeless/LMStudio-MCP.git
cd LMStudio-MCP
pip install requests "mcp[cli]" openai
# Using pre-built image
docker run -it --network host ghcr.io/infinitimeless/lmstudio-mcp:latest
# Or build locally
git clone https://github.com/infinitimeless/LMStudio-MCP.git
cd LMStudio-MCP
docker build -t lmstudio-mcp .
docker run -it --network host lmstudio-mcp
git clone https://github.com/infinitimeless/LMStudio-MCP.git
cd LMStudio-MCP
docker-compose up -d
For detailed deployment instructions, see DOCKER.md.
The bridge supports flexible configuration for different deployment scenarios:
http://localhost:1234/v1LMSTUDIO_HOST environment variable (e.g., 192.168.1.100)LMSTUDIO_PORT environment variable (e.g., 5678)Example:
export LMSTUDIO_HOST=192.168.1.100
export LMSTUDIO_PORT=5678
python lmstudio_bridge.py
📖 For detailed configuration options, see CONFIGURATION.md
Using GitHub directly (simplest):
{
"lmstudio-mcp": {
"command": "uvx",
"args": [
"https://github.com/infinitimeless/LMStudio-MCP"
]
}
}
Using local installation:
{
"lmstudio-mcp": {
"command": "/bin/bash",
"args": [
"-c",
"cd /path/to/LMStudio-MCP && source venv/bin/activate && python lmstudio_bridge.py"
]
}
}
Using Docker:
{
"lmstudio-mcp-docker": {
"command": "docker",
"args": [
"run",
"-i",
"--rm",
"--network=host",
"ghcr.io/infinitimeless/lmstudio-mcp:latest"
]
}
}
For complete MCP configuration instructions, see MCP_CONFIGURATION.md.
description hintYou can add a description field to your .mcp.json entry to help Claude understand when to use this server and what to expect. This is particularly useful for reminding Claude of version requirements:
{
"lmstudio-mcp": {
"command": "...",
"args": [...],
"description": "Local LLM bridge via LM Studio. Use for private/offline inference, embeddings, and multi-turn conversations. start_conversation and continue_conversation require LM Studio v0.3.29+."
}
}
Setting a system prompt directly in LM Studio gives your local model a consistent baseline personality and behaviour across all interactions — without needing to pass it on every API call.
The system prompt set here applies to all completions sent via the API, including those from this MCP bridge.
General assistant — clean and direct:
You are a helpful, concise assistant. Answer directly without preamble like
"Sure!" or "Of course!". Never cut off mid-sentence — always finish your thought.
Casual conversation partner:
You are a regular person having a relaxed conversation with a friend.
Keep responses short and natural, like real chat. No bullet points or formal
language. You can invent fun details about your life and stay consistent with them.
Never cut off mid-sentence — always finish your thought.
Local coding assistant:
You are an expert software engineer. Be concise and precise. When writing code,
always include brief inline comments. Prefer simple, readable solutions over
clever ones. Never cut off mid-sentence or mid-code block.
Privacy-first document analyst:
You are a careful document analyst. Summarise accurately and concisely.
Never invent information not present in the source material.
Always flag uncertainty explicitly.
💡 Tip: Always end your system prompt with "Never cut off mid-sentence — always finish your thought." This prevents truncated responses regardless of how
max_tokensis configured.
The bridge provides the following 9 tools:
| Tool | Description |
|---|---|
health_check() | Verify if LM Studio API is accessible |
list_models() | Get a list of all available models in LM Studio |
get_current_model() | Identify which model is currently loaded |
chat_completion(prompt, system_prompt, temperature, max_tokens) | Generate a chat response from your local model |
text_completion(prompt, temperature, max_tokens, stop_sequences) | Generate raw text/code completion — faster, no chat formatting overhead |
generate_embeddings(text, model) | Generate vector embeddings for semantic search and RAG workflows |
create_response(input_text, previous_response_id, reasoning_effort, stream, model) | Stateful conversation via response IDs — requires LM Studio v0.3.29+ |
start_conversation(system_prompt, first_message, temperature, max_tokens, model) | Start a multi-turn session with a persistent system prompt — returns a response_id |
continue_conversation(response_id, message, temperature, max_tokens, model) | Continue a session started with start_conversation — context preserved automatically |
The recommended way to run a persistent conversation with a local model:
1. start_conversation(
system_prompt="You are a friend at a bar, keep it casual and fun.",
first_message="Hey! How's it going?"
)
→ { response_id: "resp_abc...", message: "Hey! Not bad, just unwinding..." }
2. continue_conversation(
response_id="resp_abc...",
message="Work's been insane this week."
)
→ { response_id: "resp_def...", message: "Ugh, tell me about it..." }
3. continue_conversation(
response_id="resp_def...",
message="If you could go anywhere tomorrow, where would you go?"
)
→ { response_id: "resp_ghi...", message: "Honestly? Northern Portugal..." }
The system prompt is locked in for the entire session — no need to re-send it on every turn. Requires LM Studio v0.3.29+.
This project supports multiple deployment methods:
| Method | Use Case | Pros | Cons |
|---|---|---|---|
| Local Python | Development, simple setup | Fast, direct control | Requires Python setup |
| Docker | Isolated environments | Clean, portable | Requires Docker |
| Docker Compose | Production deployments | Easy management | More complex setup |
| Kubernetes | Enterprise/scale | Highly scalable | Complex configuration |
| GitHub Direct | Zero setup | No local install needed | Requires internet |
create_response, start_conversation, and continue_conversation require LM Studio v0.3.29+generate_embeddings requires an embedding-specific model (e.g. text-embedding-nomic-embed-text-v1.5)If Claude reports 404 errors when trying to connect to LM Studio:
If certain models don't work correctly:
For detailed troubleshooting help, see TROUBLESHOOTING.md.
This project includes comprehensive Docker support:
See DOCKER.md for complete containerization documentation.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
MIT
This project was originally developed as "Claude-LMStudio-Bridge_V2" and has been renamed and open-sourced as "LMStudio-MCP".
Looking for more advanced features? Check out the community-built enhanced version:
🌟 If this project helps you, please consider giving it a star!
io.github.ericm1018/skillfm-llm-cost-optimizer-openai-anthropic-usage
io.github.mikerawsonnz/llm-orchestration-agent
io.github.mikerawsonnz/authenticated-llm-agent
labforgedev/copilot-memory-mcp
csoai-org/agent-prompt-injection-firewall-mcp
io.github.mikerawsonnz/authenticated-multi-llm-agent