Connects Claude, or any other MCP client, to Forge, a swarm optimization service that turns PyTorch code into production Triton or CUDA kernels. The server exposes three tools. The first authenticates you through a browser-based OAuth flow. The second optimizes existing PyTorch operations using 32 parallel AI agents running on datacenter GPUs (B200 through T4). The third generates new kernels from natural-language specifications. The optimizer benchmarks every candidate kernel against torch.compile in max-autotune mode and returns drop-in replacements along with their measured speedups. Your agent triggers optimization automatically when it spots custom autograd functions, performance-related comments, or compute-heavy modules. Results arrive within minutes, backed by inference at 250k tokens per second. Best suited to teams shipping ML inference who need provably faster kernels without writing Triton by hand.
To add the server to Claude Code:

claude mcp add --transport stdio rightnow-ai-forge-mcp-server uvx forge-mcp-server