explaingit

karpathy/mingpt

24,393 | Python | Audience · researcher | Complexity · 2/5 | Stale | License | Setup · easy

TLDR

A minimal, readable Python (PyTorch) implementation of GPT built to teach how language models work. The core is about 300 lines of code covering the Transformer architecture, the tokenizer, and the training loop.

Mindmap

mindmap
  root((repo))
    What it does
      GPT reimplementation
      Text prediction model
      Educational focus
    Core components
      Transformer network
      Tokenizer
      Training loop
    Use cases
      Learn GPT internals
      Understand Transformers
      Train character models
    Tech stack
      Python
      PyTorch
    Examples included
      Number addition
      Text generation
      GPT-2 loading

Things people build with this

USE CASE 1

Learn how GPT and Transformer models work by reading clean, minimal code.

USE CASE 2

Train a character-level text generator on any text file from scratch.
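A character-level model treats each distinct character as one token. The sketch below shows the data-preparation side in plain Python, assuming a simple scheme in the spirit of minGPT's character demo: build a vocabulary from the characters in the file, map text to integer IDs and back, and cut the ID stream into (input, target) pairs where the target is the input shifted by one position. The helper names `build_vocab`, `encode`, `decode`, and `make_example` are hypothetical and are not minGPT's API.

```python
# Hypothetical helpers (not minGPT's API): character-level tokenization
# and next-character training pairs.


def build_vocab(text: str) -> tuple[dict[str, int], dict[int, str]]:
    """Assign each distinct character an integer ID (sorted for determinism)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text: str, stoi: dict[str, int]) -> list[int]:
    """Text -> list of token IDs."""
    return [stoi[ch] for ch in text]


def decode(ids: list[int], itos: dict[int, str]) -> str:
    """List of token IDs -> text."""
    return "".join(itos[i] for i in ids)


def make_example(ids: list[int], start: int, block_size: int) -> tuple[list[int], list[int]]:
    """Take block_size + 1 consecutive IDs; the target is the input shifted by one,
    so position t is trained to predict the character at position t + 1."""
    chunk = ids[start : start + block_size + 1]
    if len(chunk) < block_size + 1:
        raise ValueError("not enough text for a full block at this offset")
    return chunk[:-1], chunk[1:]


if __name__ == "__main__":
    text = "hello world"
    stoi, itos = build_vocab(text)
    ids = encode(text, stoi)
    x, y = make_example(ids, start=0, block_size=4)
    print(decode(x, itos), "->", decode(y, itos))  # prints: hell -> ello
```

In an actual training run these pairs would be batched into tensors and fed to the model; a larger `block_size` gives each prediction more context at the cost of more compute.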

USE CASE 3

Load pretrained GPT-2 weights and generate text from a prompt.
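Generating text from a prompt is an autoregressive loop: feed the tokens so far, take the model's scores for the next token, pick one, append it, and repeat. The sketch below shows that loop in plain Python with greedy decoding or temperature/top-k sampling. `toy_logits` is a hypothetical stand-in for a trained network such as GPT-2, and every name here is illustrative rather than minGPT's own code.

```python
import math
import random
from typing import Callable, Optional

# A "model" here is any function mapping a token-ID context to one score (logit)
# per vocabulary entry for the next token. A trained GPT would play this role.
Model = Callable[[list[int]], list[float]]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(model: Model, prompt: list[int], max_new_tokens: int, block_size: int,
             temperature: float = 1.0, top_k: Optional[int] = None,
             do_sample: bool = False, rng: Optional[random.Random] = None) -> list[int]:
    """Append max_new_tokens IDs to prompt, one at a time (temperature must be > 0)."""
    ids = list(prompt)
    rng = rng or random.Random(0)
    for _ in range(max_new_tokens):
        context = ids[-block_size:]  # the model only sees the last block_size tokens
        logits = [v / temperature for v in model(context)]
        if top_k is not None:
            # keep the k highest scores (ties at the cutoff are kept); mask the rest
            cutoff = sorted(logits, reverse=True)[min(top_k, len(logits)) - 1]
            logits = [v if v >= cutoff else -math.inf for v in logits]
        probs = softmax(logits)
        if do_sample:
            next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        else:
            next_id = max(range(len(probs)), key=probs.__getitem__)  # greedy
        ids.append(next_id)
    return ids


def toy_logits(context: list[int]) -> list[float]:
    """Hypothetical stand-in model: strongly prefers (last token + 1) mod 5."""
    vocab_size = 5
    nxt = (context[-1] + 1) % vocab_size
    return [5.0 if i == nxt else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    print(generate(toy_logits, prompt=[3], max_new_tokens=4, block_size=8))  # [3, 4, 0, 1, 2]
```

With a real pretrained model, the prompt IDs would come from GPT-2's tokenizer and the generated IDs would be decoded back to text; the loop itself stays the same. Lower temperatures and smaller `top_k` values make the output more predictable.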

Tech stack

Python | PyTorch

Getting it running

Difficulty · easy | Time to first run · 5 min
Use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

minGPT, written by Andrej Karpathy, a prominent AI researcher, is a stripped-down, educational reimplementation of GPT, the type of AI model behind ChatGPT. GPT (Generative Pretrained Transformer) is a family of language models that take a sequence of text as input and predict what comes next. minGPT does not aim to be the most capable or efficient implementation; it aims to be the most readable one, so people can actually understand what is happening inside these models.

The core implementation is about 300 lines of Python split across three files: the model definition (the Transformer neural network itself), a byte pair encoding (BPE) tokenizer (which converts text into integer IDs the model can process), and a generic training loop. The Transformer is the architecture modern large language models are built on. It processes a sequence by letting each token "attend" to other tokens to gather context; in GPT this attention is causal, so each token can only look at itself and the tokens before it, never at the tokens it is trying to predict.

The repo includes several small demonstrations: training a GPT from scratch to add numbers, training one as a character-level text generator on any text file, and loading OpenAI's pretrained GPT-2 weights to generate text from a prompt.

A machine learning student or researcher would use minGPT to understand GPT from the ground up without wading through the complexity of production implementations. It is written in Python using PyTorch, a popular deep learning library. Note that the author has since created a successor, nanoGPT, for those who want something similarly educational but more capable.
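The attention idea described above can be sketched for a single head in plain Python. This is a simplified illustration under stated assumptions: one head, no learned projections (queries, keys, and values are passed in directly), and no batching, whereas minGPT's model is written in PyTorch with multiple heads and learned linear layers. The causal mask is the key detail: position t only mixes information from positions 0 through t. The function name `causal_self_attention` is hypothetical.

```python
import math


def causal_self_attention(q: list[list[float]], k: list[list[float]],
                          v: list[list[float]]) -> list[list[float]]:
    """Single-head scaled dot-product attention with a causal mask.

    q, k: one vector of dimension d per position; v: one value vector per position.
    Returns one output vector per position (same dimension as the value vectors).
    """
    T, d, dv = len(q), len(q[0]), len(v[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for t in range(T):
        # scores against positions 0..t only (the causal mask hides the future)
        scores = [scale * sum(qi * ki for qi, ki in zip(q[t], k[s])) for s in range(t + 1)]
        m = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(x - m) for x in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # weighted average of the visible value vectors
        out.append([sum(w * v[s][j] for s, w in enumerate(weights)) for j in range(dv)])
    return out


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in causal_self_attention(x, x, x):
        print(row)
```

Because of the mask, changing a later token never changes the outputs at earlier positions, which is what lets a GPT be trained to predict every next token in a sequence at once. In a standard GPT-style Transformer, several such heads run in parallel, their outputs are concatenated and projected, and the block is stacked across layers.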

Copy-paste prompts

Prompt 1
Show me how to use minGPT to train a character-level language model on a custom text file.
Prompt 2
Walk me through the Transformer architecture in minGPT's model.py file and explain how attention works.
Prompt 3
How do I load OpenAI's GPT-2 weights into minGPT and generate text from a prompt?
Prompt 4
Explain the tokenizer in minGPT and how it converts text into numbers for the model.
Prompt 5
What is the minimal training loop needed to train a GPT model from scratch using minGPT?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.