This is Andrej Karpathy's educational GPT implementation (nanoGPT) packaged as a Claude skill, designed for learning how transformers actually work under the hood. It's deliberately minimal, so you can see what happens at each step. The Shakespeare training example runs in about 5 minutes on a CPU, so you can experiment without a GPU cluster. If you want to understand the fundamentals of the GPT architecture, or you want a codebase you can modify and break without wading through production abstractions, this is the right starting point. It's teaching code, not production code, and that's exactly the point.
npx skills add https://github.com/davila7/claude-code-templates --skill nanogpt