This is a complete local AI coding agent that runs 35B-parameter models on a 16GB Mac at 30 tokens per second, staying entirely offline via llama.cpp or MLX. It automatically routes each prompt to web search, shell commands, or plain chat (a sketch of that dispatch follows below), and it includes a persistent KV cache that can sync across machines via Cloudflare R2. The MLX backend is the real hook here: you can save the context from a 50K-line codebase, kill the process, reload it in 0.0003 seconds, and pick up exactly where you left off. If you're tired of paying for cloud inference, or want to experiment with out-of-core streaming that loads only the active MoE experts from SSD, this is a solid starting point with actual working code for quantized caches and direct I/O on macOS.
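To make the routing idea concrete, here is a hypothetical sketch of rule-based dispatch; the function, the patterns, and the three-way split are illustrative assumptions, not the repo's actual logic (which may well use the model itself as the classifier):

```python
# Hypothetical router sketch -- illustrates the shell/search/chat split only.
import re

def route(prompt: str) -> str:
    """Return which tool should handle the prompt: shell, search, or chat."""
    text = prompt.strip()
    # Imperative shell-ish openers go to command execution.
    if re.match(r"(run|execute|ls|git|grep|cat)\b", text, re.I):
        return "shell"
    # Freshness cues go to web search.
    if re.search(r"\b(latest|today|news|look up|search)\b", text, re.I):
        return "search"
    # Everything else is plain chat with the local model.
    return "chat"

print(route("git status"))           # shell
print(route("look up the MLX release notes"))  # search
print(route("explain this stack trace"))       # chat
```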
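The save/kill/reload trick maps naturally onto mlx-lm's prompt-cache utilities. A minimal sketch, assuming the agent builds on that API; the model repo, file names, and prompts are placeholders:

```python
# Sketch of the persistent-KV-cache flow with mlx-lm (API assumed from
# mlx_lm.models.cache; model and paths are placeholders).
from mlx_lm import load, generate
from mlx_lm.models.cache import (
    make_prompt_cache, save_prompt_cache, load_prompt_cache,
)

model, tokenizer = load("mlx-community/Qwen2.5-Coder-32B-Instruct-4bit")

# Session 1: prefill the whole codebase context once, then persist the cache.
cache = make_prompt_cache(model)
generate(model, tokenizer, prompt=open("repo_context.txt").read(),
         prompt_cache=cache, max_tokens=1)
save_prompt_cache("repo_cache.safetensors", cache)

# Session 2 (after the process was killed): reloading is file I/O plus
# tensor wiring, so it skips the expensive prefill entirely.
cache = load_prompt_cache("repo_cache.safetensors")
print(generate(model, tokenizer,
               prompt="Where is the auth middleware defined?",
               prompt_cache=cache, max_tokens=128))
```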
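As for direct I/O: macOS has no O_DIRECT, and the closest primitive is the F_NOCACHE fcntl, which tells the kernel to bypass the unified buffer cache so large one-shot expert reads don't evict your working set. A hypothetical helper for pulling one expert's weights off SSD (path, offset, and size are placeholders):

```python
# Hypothetical direct-I/O read on macOS: F_NOCACHE bypasses the page
# cache so streaming expert weights doesn't thrash resident memory.
import fcntl
import os

def read_expert(path: str, offset: int, nbytes: int) -> bytes:
    """Read one expert's weight blob without polluting the buffer cache."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.fcntl(fd, fcntl.F_NOCACHE, 1)  # macOS-only flag
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, nbytes)
    finally:
        os.close(fd)
```

Unlike Linux's O_DIRECT, F_NOCACHE does not strictly require aligned buffers, though page-aligned offsets and sizes generally read faster.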
npx skills add https://github.com/aradotso/trending-skills --skill mac-code-local-ai-agent