karpathy/nanochat

Analysis updated 2026-05-18

★ 53,015 | Python | Audience · researcher | Complexity · 4/5 | License | Setup · hard

Mindmap

mindmap
  root((nanochat))
    What it does
      Train language models
      Tokenize text
      Finetune models
      Chat interface
    Pipeline stages
      Pretraining
      Finetuning
      Evaluation
      Inference
    Design philosophy
      Single depth parameter
      Auto-calculated settings
      Compute-optimal
    Tech stack
      Python
      PyTorch
      torchrun
      uv
    Use cases
      Research LLM training
      Reproduce GPT-2
      Experiment with models
    Audience
      ML researchers
      ML engineers

What do people build with it?

USE CASE 1

Train a GPT-2-equivalent language model from scratch on a GPU cluster for under $100 in two hours.

USE CASE 2

Finetune a pretrained language model for custom chatbot behavior and evaluate its performance.

USE CASE 3

Study the complete pipeline of language model development from tokenization through inference with a web chat interface.

USE CASE 4

Experiment with neural network architecture by adjusting the depth parameter to automatically optimize compute efficiency.
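The budget in use case 1 can be sanity-checked with simple arithmetic. The sketch below uses only figures quoted on this page (under $100, roughly two hours, about $43,000 for the original GPT-2 run) and assumes a single eight-GPU node; the function names are made up for illustration, and real rental prices vary by provider.

```python
# Back-of-the-envelope check using only figures quoted on this page.
GPT2_COST_2019_USD = 43_000   # approximate cost of the original GPT-2 run
BUDGET_USD = 100              # "under $100", treated here as an upper bound
RUN_HOURS = 2                 # "roughly two hours"
NUM_GPUS = 8                  # assumption: one eight-GPU node


def max_rate_per_gpu_hour(budget_usd: float, num_gpus: int, hours: float) -> float:
    """Highest rental price per GPU-hour that still fits the budget."""
    return budget_usd / (num_gpus * hours)


def cost_reduction_factor(old_cost_usd: float, new_cost_usd: float) -> float:
    """How many times cheaper the new run is than the old one."""
    return old_cost_usd / new_cost_usd


if __name__ == "__main__":
    rate = max_rate_per_gpu_hour(BUDGET_USD, NUM_GPUS, RUN_HOURS)
    factor = cost_reduction_factor(GPT2_COST_2019_USD, BUDGET_USD)
    print(f"Budget allows up to ${rate:.2f} per GPU-hour")
    print(f"At least {factor:.0f}x cheaper than the 2019 run")
```

If the rate your provider charges is above the first number, the run will not fit the stated budget at that duration.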

What is it built with?

PythonPyTorchtorchrunuv

How does it compare?

	karpathy/nanochat	psf/requests	zie619/n8n-workflows
Stars	53,015	53,968	54,165
Language	Python	Python	Python
Setup difficulty	hard	easy	moderate
Complexity	4/5	2/5	2/5
Audience	researcher	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard Time to first run · 1h+

Requires GPU cluster setup, PyTorch/CUDA configuration, and distributed training infrastructure (torchrun).
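To make the torchrun requirement concrete: torchrun starts one worker process per GPU and tells each worker who it is through the environment variables `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The helper below is a generic sketch of how a training script might read them, not code taken from nanochat; the function name `read_torchrun_env` is hypothetical, and the defaults let the script also run as a plain single process.

```python
import os


def read_torchrun_env(env=None):
    """Return rank info from the variables torchrun sets for each worker.

    Falls back to a single-process layout when the variables are absent,
    for example when the script is started with plain `python`.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global worker index
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # GPU index on this node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of workers
    }


if __name__ == "__main__":
    info = read_torchrun_env()
    print(
        f"worker {info['rank']} of {info['world_size']} "
        f"(local GPU {info['local_rank']})"
    )
```

Saved under a placeholder name such as `show_rank.py`, it could be launched on one eight-GPU machine with `torchrun --standalone --nproc_per_node=8 show_rank.py`, which would print one line per worker.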

Use freely for any purpose including commercial, as long as you keep the copyright notice.

In plain English

nanochat is a minimal, experimental toolkit for training large language models (the type of AI that powers chatbots like ChatGPT) from scratch on a single cluster of high-powered GPUs. The headline claim is that you can reproduce a model with the same capability as GPT-2 (a landmark AI model from 2019 that cost approximately $43,000 to train) for under $100 today, in roughly two hours, thanks to seven years of hardware and software improvements.

The project covers every stage of building a language model: tokenization (converting raw text into numbers the model can process), pretraining (the initial training phase where the model reads a huge amount of text to learn language patterns), finetuning (adjusting the model for specific behavior), evaluation (measuring how good the model is), and inference (actually generating text). It also includes a web-based chat interface so you can talk to your trained model just as you would with ChatGPT.

The design philosophy is deliberately simple. All the complexity knobs are reduced to a single parameter called depth, which is the number of layers in the neural network. Setting that one number automatically determines all other settings (network width, learning rate, training duration, and more), so the resulting model is compute-optimal without requiring expert tuning.

This is a project for machine learning researchers and engineers who want to study and experiment with how language models are built at a low level. It is not a consumer product: you need access to rented GPU servers (typically eight H100 or A100 GPUs) and familiarity with Python and the command line.

The tech stack is Python with PyTorch, the dominant deep learning framework. Dependencies are managed with uv, and training is distributed across multiple GPUs using PyTorch's torchrun utility.
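To illustrate the single-knob idea, here is a minimal sketch of how one depth value could fan out into other settings. It is not nanochat's actual code: the function name, the constants, and the scaling rules (width proportional to depth, about 12 × width² weights per layer, roughly 20 training tokens per parameter) are assumptions chosen for illustration, and the learning-rate rule is left out entirely.

```python
from dataclasses import dataclass

# All constants below are hypothetical placeholders, not nanochat's values.
WIDTH_PER_LAYER = 64    # assumed: model width grows linearly with depth
HEAD_DIM = 128          # assumed: fixed size of each attention head
TOKENS_PER_PARAM = 20   # assumed: compute-optimal rule of thumb


@dataclass(frozen=True)
class DerivedConfig:
    depth: int
    width: int
    num_heads: int
    approx_params: int
    train_tokens: int


def derive_config(depth: int) -> DerivedConfig:
    """Derive every other setting from the number of layers."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    width = depth * WIDTH_PER_LAYER
    num_heads = max(1, width // HEAD_DIM)
    # Rough weight count per transformer layer (attention + MLP),
    # ignoring embeddings: about 12 * width^2.
    approx_params = 12 * depth * width * width
    # Training duration follows from model size.
    train_tokens = TOKENS_PER_PARAM * approx_params
    return DerivedConfig(depth, width, num_heads, approx_params, train_tokens)


if __name__ == "__main__":
    for depth in (12, 20):
        print(derive_config(depth))
```

The point of this design is that a user compares models by changing one integer, while the derived values keep each model's size and training budget in a consistent ratio.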

Copy-paste prompts

Prompt 1

How do I set up nanochat to train a language model on my GPU cluster? Walk me through the tokenization and pretraining steps.

Prompt 2

I want to finetune a nanochat model for a specific task. What's the workflow and how do I evaluate the results?

Prompt 3

Explain how the depth parameter in nanochat automatically calculates network width, learning rate, and training duration.

Prompt 4

How do I use the web chat interface to interact with a model I trained with nanochat?

Prompt 5

What hardware and dependencies do I need to run nanochat, and how does torchrun distribute training across multiple GPUs?

Frequently asked questions

What is nanochat?

A minimal toolkit for training GPT-2-level language models from scratch on GPU clusters, in hours and for under $100. It covers tokenization, pretraining, finetuning, evaluation, inference, and a web chat interface.

What language is nanochat written in?

Mainly Python. The stack also includes PyTorch, torchrun, and uv.

What license does nanochat use?

Use freely for any purpose including commercial, as long as you keep the copyright notice.

How hard is nanochat to set up?

Setup difficulty is rated hard; expect an hour or more before a first successful run.

Who is nanochat for?

Mainly machine learning researchers and engineers.

Verify against the repo before relying on details.