Gives Claude a local, SQLite-backed memory that persists across conversations. Instead of repasting architecture decisions and schemas into every chat, you initialize a cartridge, capture notes via a global hotkey or the CLI, and Claude searches it automatically through the MCP tools search_memory and store_memory. The standout piece is the temporal causal reasoning engine, which traces cause-and-effect chains through your notes rather than just returning keyword matches. It builds a graph of decisions, refactors, and incidents, then walks causal paths with stability scoring; on the project's temporal reasoning tests it scored 10/10, against a 50% baseline for keyword-only retrieval. Everything stays local with zero API costs. Best for codebases where context accumulates faster than you can document it.
CARTRIDGE_WORKSPACE*: Path to the local folder llm-kosh should use as its memory cartridge root.
Give your local LLMs a permanent, air-gapped memory cartridge.
llm-kosh is a lightning-fast, SQLite-backed memory system that gives AI assistants (like Claude and Cursor) permanent recall across sessions. Stop copy-pasting the same 15 architectural decisions and database schemas into every new chat.
By running completely locally, it guarantees zero cloud syncing, zero API costs, and absolute privacy for your proprietary codebase.
llm-kosh natively supports the Model Context Protocol (MCP). This means tools like Claude Desktop and Cursor can automatically search and read your memory cartridge without you doing anything.
Simply add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "llm-kosh": {
      "command": "uvx",
      "args": ["llm-kosh", "mcp", "--root", "C:/path/to/your/cartridge", "--allow-write"]
    }
  }
}
Now, when you ask Claude "What database are we using for auth?", it automatically searches your llm-kosh cartridge and knows the answer.
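If you already have other MCP servers configured, editing the JSON by hand can clobber them. The following is a minimal sketch of merging the entry above into an existing config with the Python standard library. The helper names (`add_llm_kosh_server`, `update_config_file`) are hypothetical and not part of llm-kosh, and the location of claude_desktop_config.json is left to the caller because it differs per OS.

```python
import json
from pathlib import Path


def add_llm_kosh_server(config: dict, cartridge_root: str, allow_write: bool = True) -> dict:
    """Return a copy of `config` with an llm-kosh entry under mcpServers."""
    # Same command and arguments as the README snippet above.
    args = ["llm-kosh", "mcp", "--root", cartridge_root]
    if allow_write:
        args.append("--allow-write")
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any servers already configured
    servers["llm-kosh"] = {"command": "uvx", "args": args}
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_path: Path, cartridge_root: str) -> None:
    """Read, merge, and rewrite the config file at a caller-supplied path."""
    if config_path.exists():
        existing = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        existing = {}
    updated = add_llm_kosh_server(existing, cartridge_root)
    config_path.write_text(json.dumps(updated, indent=2), encoding="utf-8")
```

Calling `update_config_file(Path("claude_desktop_config.json"), "C:/path/to/your/cartridge")` would leave other entries under mcpServers untouched and add or overwrite only the llm-kosh one.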
Most memory systems do keyword retrieval. You get back five disconnected facts and have to assemble the story yourself.
llm-kosh includes a Temporal Causal Reasoning Engine that understands why things happened and in what order. When you ask about the authentication system, it doesn't just find five matching notes — it traces the causal chain: the decision that led to the refactor, the refactor that introduced the bug, the hotfix that closed the incident. In order. With confidence scores.
You: "What happened to the auth system last quarter?"
Results:
• [NOTE] JWT token refresh bug — Dec 14
• [DECISION] Migrate to OAuth2 — Oct 3
• [NOTE] Hotfix deployed — Dec 15
• [DECISION] Auth service staging deploy — Nov 8
• [NOTE] OAuth2 provider integration — Oct 22
Five facts. No order. No story. You assemble it manually.
You: "What happened to the auth system last quarter?"
Causal chain (stability: 0.85, 5 hops):
1. Oct 3 → DECISION: Migrate to OAuth2 [approved by steering]
2. Oct 22 → OAuth2 provider integration completed (ENABLES →)
3. Nov 8 → Auth service deployed to staging, smoke tests passed (ENABLES →)
4. Dec 14 → Token refresh bug discovered in production (CAUSES →)
5. Dec 15 → Hotfix merged. Authentication fully stable. (SUPERSEDES Dec 14)
One coherent narrative. Cause and effect. Time-ordered. No assembly required.
The engine builds a causal graph over your memories as you add them. Each fact connects to related facts via typed edges (ENABLES, CAUSES, CONTRADICTS, SUPERSEDES, INFERS). When you query, it walks the causal paths through that graph and scores the resulting chain for stability.
Benchmarked at 100% accuracy on temporal reasoning tests (10/10), up from a 50% baseline with keyword-only retrieval.
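To make the idea of typed edges and path walking concrete, here is a toy sketch. It is not llm-kosh's implementation: the `Fact` and `CausalGraph` classes are hypothetical, the walk simply collects facts connected within a hop limit and sorts them by date, and stability scoring is omitted because this document does not describe how it is computed. The year on the dates is assumed for illustration.

```python
from dataclasses import dataclass
from datetime import date

EDGE_TYPES = {"ENABLES", "CAUSES", "CONTRADICTS", "SUPERSEDES", "INFERS"}


@dataclass(frozen=True)
class Fact:
    fact_id: str
    when: date
    content: str


class CausalGraph:
    def __init__(self) -> None:
        self.facts: dict[str, Fact] = {}
        self.edges: list[tuple[str, str, str]] = []  # (source id, edge type, target id)

    def add_fact(self, fact: Fact) -> None:
        self.facts[fact.fact_id] = fact

    def link(self, source: str, edge_type: str, target: str) -> None:
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        if source not in self.facts or target not in self.facts:
            raise KeyError("both facts must be added before linking")
        self.edges.append((source, edge_type, target))

    def neighbors(self, fact_id: str) -> list[str]:
        """Facts connected to `fact_id` by an edge in either direction."""
        found = []
        for source, _edge_type, target in self.edges:
            if source == fact_id:
                found.append(target)
            elif target == fact_id:
                found.append(source)
        return found

    def walk(self, start: str, depth: int = 5) -> list[Fact]:
        """Collect facts within `depth` hops of `start`, oldest first."""
        seen = {start}
        frontier = [start]
        for _ in range(depth):
            next_frontier = []
            for fact_id in frontier:
                for other in self.neighbors(fact_id):
                    if other not in seen:
                        seen.add(other)
                        next_frontier.append(other)
            frontier = next_frontier
        return sorted((self.facts[i] for i in seen), key=lambda f: f.when)


# The auth example from above; the year 2025 is an assumption.
graph = CausalGraph()
graph.add_fact(Fact("decision", date(2025, 10, 3), "Migrate to OAuth2"))
graph.add_fact(Fact("integration", date(2025, 10, 22), "OAuth2 provider integration completed"))
graph.add_fact(Fact("staging", date(2025, 11, 8), "Auth service deployed to staging"))
graph.add_fact(Fact("bug", date(2025, 12, 14), "Token refresh bug discovered in production"))
graph.add_fact(Fact("hotfix", date(2025, 12, 15), "Hotfix merged"))
graph.link("decision", "ENABLES", "integration")
graph.link("integration", "ENABLES", "staging")
graph.link("staging", "CAUSES", "bug")
graph.link("hotfix", "SUPERSEDES", "bug")

chain = graph.walk("decision", depth=5)
for fact in chain:
    print(f"{fact.when}: {fact.content}")
```

Traversing edges in both directions is a simplification so that the hotfix, which points back at the bug it supersedes, still lands in the chain. A real engine would likely treat edge direction and type more carefully.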
Once the MCP server is running, Claude can call the reasoning engine directly:
You: "Walk me through what happened to our authentication system"
Claude uses: reasoning_query("authentication system history")
→ Returns causal narrative with timestamps, edge types, and stability score
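Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below builds such a message for reasoning_query. The envelope follows the MCP specification, but the argument name "query" is an assumption; the tool's actual input schema is not shown in this document.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "query" is a hypothetical argument name; check the schema the server advertises.
request = build_tool_call(1, "reasoning_query", {"query": "authentication system history"})
print(request)
```

In practice Claude Desktop or Cursor constructs this request for you, so the sketch is only useful for debugging or for writing your own client.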
from pathlib import Path

from llm_kosh.engine.reasoning import ReasoningEngine

engine = ReasoningEngine(Path("./my-cartridge"))
result = engine.query(
    "What led to the auth system refactor?",
    temporal_context="2025-12-01T00:00:00Z",
    depth=5,
)

# result.bundle.fibers: causal chain, time-ordered
# result.stability.score: how coherent the chain is (0–1)
# result.escape_triggered: True if contradictions were surfaced
for fact_id, fiber in result.bundle.fibers.items():
    print(f"{fiber.fact.valid_from}: {fiber.fact.content}")
Press Ctrl+Shift+Space anywhere on your OS to instantly dump a thought, decision, or code snippet into your cartridge.
Get from zero to a running memory cartridge in under 2 minutes.
1. Install the core system
# Core memory store (no extra deps)
pip install llm-kosh
# With MCP server + FastAPI (recommended for Claude Desktop integration)
pip install "llm-kosh[server]"
# Everything including vector search
pip install "llm-kosh[all]"
2. Initialize a new cartridge
# Creates a new .llm-kosh root in your current directory
llm-kosh init --owner "Your Name"
3. Launch the Desktop App & Daemon
# Starts the UI, the background watcher, and the MCP bridge
llm-kosh desktop