Compresses prompts by 40–60% using a local two-stage pipeline: llama3.2:1b rewrites the text to a semantic minimum, then nomic-embed-text validates the rewrite by cosine similarity against the original (default threshold 0.85). If validation fails, the original text passes through unchanged. Inputs under 80 tokens skip compression automatically. Exposes a single compress_prompt tool that takes text and returns the compressed output plus token stats. Requires Ollama running locally with both models pulled. Built to reduce token costs in long or repetitive workflows without dropping conditionals or negations. Works well as a pre-processing layer before expensive API calls or under strict context budgets.
claude mcp add --transport stdio base76-research-lab-token-compressor uvx token-compressor
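
The validate-or-pass-through behavior described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names (`compress_with_gate`, `count_tokens`, `cosine_similarity`), the shape of the returned dict, the whitespace-based token count, and the choice of `>=` for the threshold comparison are all assumptions made for the example. The `rewrite` and `embed` arguments are hypothetical callables standing in for the llama3.2:1b and nomic-embed-text calls, so the sketch runs without Ollama.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_tokens(text: str) -> int:
    """Assumption: whitespace word count as a crude stand-in for a tokenizer."""
    return len(text.split())


def compress_with_gate(
    text: str,
    rewrite: Callable[[str], str],    # hypothetical: wraps the rewriting model
    embed: Callable[[str], Vector],   # hypothetical: wraps the embedding model
    threshold: float = 0.85,          # default threshold from the description
    min_tokens: int = 80,             # inputs below this are not compressed
) -> dict:
    """Return the rewrite only if it stays semantically close to the input."""
    original_tokens = count_tokens(text)

    def passthrough(reason: str, similarity=None) -> dict:
        # Original text is returned unchanged, with stats explaining why.
        return {
            "text": text,
            "compressed": False,
            "reason": reason,
            "similarity": similarity,
            "original_tokens": original_tokens,
            "output_tokens": original_tokens,
        }

    # Stage 0: short inputs skip compression entirely.
    if original_tokens < min_tokens:
        return passthrough("below_min_tokens")

    # Stage 1: rewrite the text to a shorter form.
    candidate = rewrite(text)

    # Stage 2: validate the rewrite against the original by cosine similarity.
    similarity = cosine_similarity(embed(text), embed(candidate))
    if similarity < threshold:
        return passthrough("validation_failed", similarity)

    return {
        "text": candidate,
        "compressed": True,
        "reason": "ok",
        "similarity": similarity,
        "original_tokens": original_tokens,
        "output_tokens": count_tokens(candidate),
    }
```

Passing the two models in as callables keeps the gate logic testable on its own. In a real setup they would wrap requests to the local Ollama server, and the token counts would come from a proper tokenizer rather than a word count.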