nanoGPT is Andrej Karpathy's minimal GPT implementation: roughly 300 lines of readable PyTorch. You can train a character-level model on Shakespeare in about five minutes on a CPU, or reproduce GPT-2 (124M) on OpenWebText across multiple GPUs if you have the hardware. The entire architecture fits in model.py with no wrapper layers to dig through, which makes it genuinely hackable for learning how transformers work under the hood. It is not meant for production use. If you want to understand attention mechanisms or experiment with architecture changes without wading through framework abstractions, this is the cleanest starting point. The code assumes you know PyTorch basics and explains nothing else, which is exactly the point.
npx skills add https://github.com/orchestra-research/ai-research-skills --skill nanogpt
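
To give a flavor of the attention mechanism mentioned above, here is a minimal sketch of single-head causal self-attention. It is not code from model.py: it uses plain Python lists instead of PyTorch tensors so it runs with no dependencies, it omits the learned query/key/value projections and multiple heads, and the names `softmax` and `causal_self_attention` are illustrative assumptions.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(q, k, v):
    """Single-head causal attention over lists of vectors, one per position.

    Illustrative only: real implementations use batched tensor ops and
    learned projections to produce q, k, and v from the input.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Causal mask: position t attends only to positions 0..t.
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out

# Toy input: three positions, two dimensions; q = k = v for simplicity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x, x, x)
```

Because of the causal mask, the first output position can only see itself, and changing a later value vector leaves earlier outputs untouched. That property is what lets a GPT-style model be trained to predict the next token at every position in parallel.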