explaingit

karpathy/nanochat

53,616 | Python | Audience · researcher | Complexity · 4/5 | Active | License | Setup · hard

TLDR

Minimal toolkit for training GPT-2-level language models from scratch on GPU clusters in hours for under $100, with tokenization, training, finetuning, and a chat interface.

Mindmap

mindmap
  root((nanochat))
    What it does
      Train language models
      Tokenize text
      Finetune models
      Chat interface
    Pipeline stages
      Pretraining
      Finetuning
      Evaluation
      Inference
    Design philosophy
      Single depth parameter
      Auto-calculated settings
      Compute-optimal
    Tech stack
      Python
      PyTorch
      torchrun
      uv
    Use cases
      Research LLM training
      Reproduce GPT-2
      Experiment with models
    Audience
      ML researchers
      ML engineers

Things people build with this

USE CASE 1

Train a GPT-2-equivalent language model from scratch on a GPU cluster for under $100 in roughly two hours.

USE CASE 2

Finetune a pretrained language model for custom chatbot behavior and evaluate its performance.

USE CASE 3

Study the complete pipeline of language model development from tokenization through inference with a web chat interface.

USE CASE 4

Experiment with neural network architecture by adjusting the depth parameter to automatically optimize compute efficiency.

Tech stack

Python | PyTorch | torchrun | uv

Getting it running

Difficulty · hard | Time to first run · 1h+

Requires GPU cluster setup, PyTorch/CUDA configuration, and distributed training infrastructure (torchrun).
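As a rough illustration of what "distributed training infrastructure (torchrun)" means in practice: when a script is launched with something like `torchrun --nproc_per_node=8 train.py` (the script name here is hypothetical), torchrun starts one process per GPU and gives each process its identity through the environment variables `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The sketch below only reads those variables and splits work evenly between processes. The helper names `read_dist_info` and `shard_range` are made up for this example and are not taken from nanochat.

```python
import os
from typing import Mapping, NamedTuple


class DistInfo(NamedTuple):
    rank: int        # global index of this process across all nodes
    local_rank: int  # index of this process on its own node (usually the GPU index)
    world_size: int  # total number of processes in the job


def read_dist_info(env: Mapping[str, str] = os.environ) -> DistInfo:
    """Read the variables torchrun sets; fall back to a single-process run."""
    return DistInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


def shard_range(num_items: int, info: DistInfo) -> range:
    """Hypothetical helper: give each rank an equal, contiguous slice of items.

    Any remainder (num_items % world_size) is dropped so all ranks do equal work.
    """
    per_rank = num_items // info.world_size
    start = info.rank * per_rank
    return range(start, start + per_rank)


if __name__ == "__main__":
    info = read_dist_info()
    mine = shard_range(80, info)
    print(f"rank {info.rank}/{info.world_size} handles items {mine.start}..{mine.stop - 1}")
```

Run without torchrun, the script falls back to a single process that handles everything, which is a convenient way to debug before renting a multi-GPU machine.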

Use freely for any purpose including commercial, as long as you keep the copyright notice.

In plain English

nanochat is a minimal, experimental toolkit for training large language models (the type of AI that powers chatbots like ChatGPT) from scratch on a single cluster of high-powered GPUs. The headline claim is that you can reproduce a model with the same capability as GPT-2 (a landmark AI model from 2019 that cost approximately $43,000 to train) for under $100 today, in roughly two hours, thanks to seven years of hardware and software improvements.

The project covers every stage of building a language model: tokenization (converting raw text into numbers the model can process), pretraining (the initial training phase where the model reads a huge amount of text to learn language patterns), finetuning (adjusting the model for specific behavior), evaluation (measuring how good the model is), and inference (actually generating text). It also includes a web-based chat interface so you can talk to your trained model just as you would with ChatGPT.

The design philosophy is deliberately simple. All the complexity knobs are reduced to a single parameter called depth, which is the number of layers in the neural network. Setting that one number automatically determines the other settings (network width, learning rate, training duration, and more), so that the resulting model is compute-optimal without requiring expert tuning.

This is a project for machine learning researchers and engineers who want to study and experiment with how language models are built at a low level. It is not a consumer product: you need access to rented GPU servers (typically eight H100 or A100 GPUs) and familiarity with Python and the command line.

The tech stack is Python with PyTorch, the dominant deep learning framework. Dependencies are managed with uv, and training is distributed across multiple GPUs using PyTorch's torchrun utility.
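The single-depth idea described above can be sketched in a few lines of Python. This is an illustration of the general pattern, not nanochat's actual code: every constant and formula below is an assumption chosen for the example (width grows linearly with depth, the block parameter count uses the common 12 · depth · width² approximation, the token budget follows the widely cited rule of thumb of about 20 tokens per parameter, and the learning rate shrinks with the square root of width). The names `derive_config` and `DerivedConfig` are hypothetical.

```python
import math
from dataclasses import dataclass

# Illustrative constants (assumptions for this sketch, not nanochat's values).
WIDTH_PER_LAYER = 64    # channels of width added per layer of depth
HEAD_DIM = 128          # channels per attention head
TOKENS_PER_PARAM = 20   # rule-of-thumb compute-optimal token budget
BASE_LR = 0.02          # learning rate at the reference width
BASE_WIDTH = 768        # reference width for learning-rate scaling


@dataclass(frozen=True)
class DerivedConfig:
    depth: int
    width: int
    num_heads: int
    approx_params: int
    train_tokens: int
    learning_rate: float


def derive_config(depth: int) -> DerivedConfig:
    """Derive every other setting from the number of layers."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    # Round the width up to a multiple of HEAD_DIM so heads divide it evenly.
    width = math.ceil(depth * WIDTH_PER_LAYER / HEAD_DIM) * HEAD_DIM
    num_heads = width // HEAD_DIM
    # Standard approximation for transformer block parameters (no embeddings).
    approx_params = 12 * depth * width * width
    # Training duration expressed as a token budget proportional to model size.
    train_tokens = TOKENS_PER_PARAM * approx_params
    # Wider models get a smaller learning rate (inverse square-root scaling).
    learning_rate = BASE_LR * (BASE_WIDTH / width) ** 0.5
    return DerivedConfig(depth, width, num_heads, approx_params,
                         train_tokens, learning_rate)


if __name__ == "__main__":
    for d in (12, 20, 26):
        print(derive_config(d))
```

The point of the pattern is that a user only ever changes `depth`; model size, training length, and optimizer settings move together, so a larger model is never accidentally undertrained or run with a learning rate tuned for a smaller one.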

Copy-paste prompts

Prompt 1
How do I set up nanochat to train a language model on my GPU cluster? Walk me through the tokenization and pretraining steps.
Prompt 2
I want to finetune a nanochat model for a specific task. What's the workflow and how do I evaluate the results?
Prompt 3
Explain how the depth parameter in nanochat automatically calculates network width, learning rate, and training duration.
Prompt 4
How do I use the web chat interface to interact with a model I trained with nanochat?
Prompt 5
What hardware and dependencies do I need to run nanochat, and how does torchrun distribute training across multiple GPUs?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.