Learn how GPT and Transformer models work by reading clean, minimal code.
Train a character-level text generator on any text file from scratch.
Load pretrained GPT-2 weights and generate text from a prompt.
minGPT is a stripped-down, educational reimplementation of GPT, the type of AI model behind ChatGPT, written by Andrej Karpathy, a prominent AI researcher. GPT (Generative Pre-trained Transformer) is a family of language models that take a sequence of text as input and predict what comes next. minGPT does not aim to be the most capable or efficient implementation; it aims to be the most readable one, so that people can see what is actually happening inside these models. The core library is three files: the model definition (the Transformer neural network itself, about 300 lines of Python), a byte pair encoding (BPE) tokenizer (which converts text into numbers the model can process), and a generic training loop. The Transformer is the architecture that modern large language models are built on. It processes a sequence by letting each token "attend" to other tokens to pick up context; in GPT, each token attends only to itself and the tokens before it, which is what lets the model be trained to predict the next token. The repo includes several small demonstrations: training a GPT from scratch to add numbers, training one as a character-level text generator on any text file, and loading OpenAI's pretrained GPT-2 weights to generate text from a prompt. minGPT suits machine learning students and researchers who want to understand GPT from the ground up without wading through the complexity of production implementations. It is written in Python using PyTorch, a popular deep learning library. The author has since created a successor, nanoGPT, for those who want something similarly educational but more capable.
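To make the two ideas above concrete (turning text into numbers, then letting each token attend to earlier tokens), here is a minimal sketch in plain Python. It is not minGPT's code: the names `softmax`, `causal_self_attention`, `stoi`, and `one_hot` are illustrative, it uses a character-level vocabulary rather than BPE, and it assumes one-hot vectors stand in for the learned embeddings and the learned query/key/value projections that a real Transformer would use.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Position t only looks at positions 0..t, so no token can see the future.
    Returns (outputs, weights); masked-out weights are reported as 0.0.
    """
    n = len(queries)
    d = len(queries[0])
    outputs, all_weights = [], []
    for t in range(n):
        # Similarity of the query at position t with every visible key.
        scores = [
            sum(q * k for q, k in zip(queries[t], keys[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible values.
        out = [
            sum(w * values[s][j] for s, w in enumerate(weights))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights + [0.0] * (n - t - 1))
    return outputs, all_weights


# Character-level "tokenizer": every distinct character gets an integer id.
text = "hello"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = [stoi[ch] for ch in text]


def one_hot(i, size):
    """Assumption: a one-hot vector stands in for a learned embedding."""
    return [1.0 if j == i else 0.0 for j in range(size)]


x = [one_hot(i, len(vocab)) for i in ids]

# Assumption: the same vectors serve as queries, keys, and values
# (a real model applies three separate learned linear projections).
outputs, weights = causal_self_attention(x, x, x)

for ch, row in zip(text, weights):
    print(ch, [round(w, 2) for w in row])
```

Each printed row shows how much one character attends to each position. The entries to the right of the diagonal are zero because of the causal mask, and the two `l` characters put extra weight on each other because their vectors match.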
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.