Train a character-level language model on Shakespeare in about 3 minutes on a single GPU to understand GPT training.
Fine-tune a pre-trained GPT-2 model on your own text dataset to generate domain-specific completions.
Reproduce GPT-2's 124M-parameter model on the OpenWebText dataset to verify that the training techniques work correctly.
Modify the model architecture or training loop to experiment with new ideas in language model research.
Requires a PyTorch installation, plus a CUDA setup for GPU training; a CPU-only fallback is possible but slow.
nanoGPT is a minimal Python codebase for training and fine-tuning GPT-style language models, designed to be readable and hackable rather than production-hardened. GPT models are neural networks trained to predict the next token in a sequence, which lets them generate text that continues a given prompt; they can also be fine-tuned on new data. The project reimplements the GPT-2 architecture in as few lines as possible while still matching its training results, which makes the internals easy to understand and modify. The README notes that this repository is now deprecated and that its successor, nanochat, is the recommended alternative for new users.
The core of the project is two files: a roughly 300-line training loop (train.py) and a roughly 300-line model definition (model.py). Despite this simplicity, it can reproduce the 124-million-parameter GPT-2 on OpenWebText given appropriate hardware: about 4 days on a node of 8 A100 40GB GPUs. For experimentation on smaller hardware, it includes an example that trains a character-level model on Shakespeare's works in about 3 minutes on a single GPU, or more slowly on a CPU or Apple Silicon Mac. The code supports distributed training across multiple GPUs using PyTorch's DistributedDataParallel, and it can load the pre-trained GPT-2 weights released by OpenAI as a starting point for fine-tuning.
The tech stack is Python, with PyTorch as the deep learning framework. Use nanoGPT when you want to understand how GPT training works from first principles by reading clean, commented code, when you want a starting point for language model research, or when you need to fine-tune a GPT-style model on a custom dataset without wading through a large framework.
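To make the character-level, next-token setup described above concrete, here is a minimal sketch in plain Python. It is not code from the nanoGPT repository: the helper names (build_vocab, encode, decode, make_example) and the sample text are hypothetical, chosen only to show how text can be turned into integer ids and how each training target is the input shifted by one position.

```python
# Illustrative sketch only. These helpers are hypothetical and are not
# taken from nanoGPT; they show the general idea of character-level
# encoding and next-token training pairs.

def build_vocab(text):
    """Map each distinct character to an integer id, in sorted order."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos

def encode(text, stoi):
    """Turn a string into a list of integer ids."""
    return [stoi[ch] for ch in text]

def decode(ids, itos):
    """Turn a list of integer ids back into a string."""
    return "".join(itos[i] for i in ids)

def make_example(ids, start, block_size):
    """Return (inputs, targets); targets are the inputs shifted by one."""
    x = ids[start : start + block_size]
    y = ids[start + 1 : start + block_size + 1]
    return x, y

text = "To be, or not to be"
stoi, itos = build_vocab(text)
ids = encode(text, stoi)

x, y = make_example(ids, start=0, block_size=8)
print(repr(decode(x, itos)))  # 'To be, o'
print(repr(decode(y, itos)))  # 'o be, or'
```

At every position the model sees the characters up to that point in x and is trained to predict the character at the same position in y. A real training loop would sample many such windows at random offsets and batch them as tensors.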
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.