Routes routine coding tasks to a local Ollama instance instead of burning Claude API tokens. Exposes tools such as offload_work for boilerplate generation, compress_context for text summarization, and auto_setup, which downloads and preloads models sized to your available RAM. Built-in prompt-injection detection and credential sanitization cover the security surface, and the cost_dashboard tool tracks your actual token savings across sessions. Designed for the common pattern in which roughly 40% of Claude Code requests are simple enough for a local 7B model to handle. Falls back to the cloud if Ollama is unavailable. Ships with 736 tests and explicit tier recommendations for phi4, qwen2.5-coder 7B, or qwen2.5-coder 32B, depending on available memory.
claude mcp add --transport stdio blackfoil-claude-token-saver-mcp uvx claude-token-saver-mcp
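The fallback to the cloud when Ollama is unavailable can be pictured with a minimal sketch. It assumes Ollama is listening on its default local address (http://localhost:11434) and probes the GET /api/tags endpoint, which lists locally installed models. The names ollama_available and choose_backend, and the is_simple flag, are hypothetical illustrations, not this package's API or its actual routing logic.

```python
import urllib.error
import urllib.request

# Assumption: Ollama's default local address; the package's real config may differ.
OLLAMA_URL = "http://localhost:11434"


def ollama_available(base_url: str = OLLAMA_URL, timeout: float = 0.5) -> bool:
    """Return True if an Ollama server answers GET /api/tags with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error, etc.: treat as unavailable.
        return False


def choose_backend(is_simple: bool, local_up: bool) -> str:
    """Hypothetical routing rule: simple tasks go local only when Ollama is up."""
    if is_simple and local_up:
        return "local"
    return "cloud"


if __name__ == "__main__":
    up = ollama_available()
    print("Ollama reachable:", up)
    print("boilerplate task ->", choose_backend(is_simple=True, local_up=up))
    print("complex task ->", choose_backend(is_simple=False, local_up=up))
```

Running it reports whether a local Ollama server answered and where each kind of task would be routed. The short timeout keeps the probe from stalling when no server is running, so the cloud path is chosen quickly.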