karpathy/nanogpt

Analysis updated 2026-06-20

★ 57,620PythonAudience · researcherComplexity · 4/5Setup · hard

Mindmap

mindmap
  root((nanogpt))
    What it does
      GPT model training
      Fine-tuning support
      Readable codebase
    Architecture
      300-line model file
      300-line training loop
      Multi-GPU support
    Use cases
      Learn GPT internals
      Research experiments
      Custom dataset training
    Tech stack
      Python
      PyTorch
    Audience
      ML researchers
      AI learners

mindmap root((nanogpt)) What it does GPT model training Fine-tuning support Readable codebase Architecture 300-line model file 300-line training loop Multi-GPU support Use cases Learn GPT internals Research experiments Custom dataset training Tech stack Python PyTorch Audience ML researchers AI learners

Click or tap to explore — scroll the page freely

What do people build with it?

USE CASE 1

Read 600 lines of clean code to understand exactly how GPT transformer training works under the hood.

USE CASE 2

Fine-tune a GPT-2 model on a custom text dataset as a starting point for language model research.

USE CASE 3

Train a small character-level model on a text corpus in a few minutes on a single GPU.

USE CASE 4

Use the simple codebase as a hackable base to try out modified attention or training techniques.

What is it built with?

PythonPyTorch

How does it compare?

	karpathy/nanogpt	microsoft/autogen	ultralytics/yolov5
Stars	57,620	57,750	57,334
Language	Python	Python	Python
Setup difficulty	hard	moderate	moderate
Complexity	4/5	4/5	3/5
Audience	researcher	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard Time to first run · 1h+

A GPU is strongly recommended, full GPT-2 reproduction takes ~4 days on 8 high-end GPUs, CPU training is extremely slow.

In plain English

nanoGPT is a minimal Python codebase for training and fine-tuning GPT-style language models, designed to be readable and hackable rather than production-hardened. GPT models are neural networks that learn to predict the next word in a sequence and can be fine-tuned to generate text that continues a given prompt. The project was written to reimplement the original GPT-2 architecture in as few lines as possible while still achieving the same training results, making the internals easy to understand and modify. The README notes that this repository is now deprecated and that its successor, nanochat, is the recommended alternative for new users. The entire project consists of two main files: a roughly 300-line training loop and a roughly 300-line model definition. Despite this simplicity, it can reproduce GPT-2 with 124 million parameters on standard benchmark datasets when run on appropriate hardware, around 4 days on 8 high-end GPUs. For experimentation on smaller hardware, it includes examples for training a character-level model on Shakespeare's works in about 3 minutes on a single GPU, or more slowly on a CPU or Apple Silicon Mac. The code supports distributed training across multiple GPUs using PyTorch's built-in parallelism tools, and can also load pre-trained GPT-2 weights from OpenAI as a starting point for fine-tuning. The tech stack is Python with PyTorch as the deep learning framework. You would use nanoGPT when you want to understand how GPT training works from first principles by reading clean, commented code, when you want a starting point for language model research, or when you need to fine-tune a GPT-style model on a custom dataset without wading through a large framework.

Copy-paste prompts

Prompt 1

Using nanoGPT, show me how to fine-tune GPT-2 on my own text file. What format does the training data need to be in?

Prompt 2

Walk me through the key parts of the nanoGPT model.py, explain what the multi-head attention block is doing line by line.

Prompt 3

How do I train a character-level nanoGPT model on a custom text corpus and then sample new text from the trained model?

Prompt 4

Show me how to modify nanoGPT to experiment with a different positional encoding and measure if it changes results.

Frequently asked questions

What is nanogpt?

nanoGPT is a minimal Python implementation of GPT language model training in about 600 lines of clean, readable code, designed for learning how these models work from scratch. Note: now deprecated in favor of nanochat.

What language is nanogpt written in?

Mainly Python. The stack also includes Python, PyTorch.

How hard is nanogpt to set up?

Setup difficulty is rated hard, with roughly 1h+ to a first successful run.

Who is nanogpt for?

Mainly researcher.

Open on GitHub → Explain another repo

This repo across BitVibe Labs

Scan in gitsafehub Deploy in gitdeployhub karpathy on gitmyhub

Verify against the repo before relying on details.