explaingit

karpathy/nn-zero-to-hero

21,888 | Jupyter Notebook | Audience · developer | Complexity · 3/5 | Stale | License | Setup · easy

TLDR

Free video course teaching neural networks and language models from scratch, with Jupyter Notebooks where you build working systems like backpropagation engines, word generators, and GPT models.

Mindmap

mindmap
  root((repo))
    What it does
      Teaches neural networks
      Builds AI from scratch
      Video lectures with code
    Key concepts
      Backpropagation algorithm
      Language model training
      Transformer architecture
      Tokenization process
    Learning path
      Micrograd engine
      Makemore word generator
      GPT implementation
    Tech stack
      Python
      Jupyter Notebooks
      NumPy basics
    Audience
      Curious learners
      Aspiring ML engineers
      People wanting deep understanding

Things people build with this

USE CASE 1

Learn how backpropagation and gradient descent actually work by building a neural network engine from scratch.
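
To make the idea concrete, here is a minimal sketch of a scalar autograd engine in the spirit of micrograd. It is not the course's code: the `Value` class layout, the `_parents` and `_backward` attribute names, and the example numbers are all assumptions chosen for illustration. Each operation records how to push gradients back to its inputs, and `backward()` applies the chain rule from the output to the leaves.

```python
import math


class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            # d tanh(x)/dx = 1 - tanh(x)^2
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule output-first.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


# One neuron with arbitrary example values: y = tanh(w * x + b)
x, w, b = Value(2.0), Value(-3.0), Value(1.0)
y = (w * x + b).tanh()
y.backward()
```

After `y.backward()`, `w.grad`, `x.grad`, and `b.grad` hold the derivative of `y` with respect to each input. A gradient descent step would then nudge each parameter against its gradient, for example `w.data -= 0.1 * w.grad`, where the step size 0.1 is an arbitrary choice.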

USE CASE 2

Build a character-level language model that generates realistic-looking words or names from training data.
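
As a rough illustration of the simplest form such a model can take, the sketch below counts which character follows which in a tiny word list and then samples new words from those counts. The six-name dataset, the `"."` start/end marker, and the `sample_name` helper are assumptions made for this example, not details taken from the repository.

```python
import random
from collections import defaultdict

# Tiny illustrative dataset (hypothetical; a real run would use a far larger list).
names = ["emma", "olivia", "ava", "isabella", "sophia", "mia"]

# Count how often each character follows another; "." marks start and end.
counts = defaultdict(lambda: defaultdict(int))
for name in names:
    chars = ["."] + list(name) + ["."]
    for prev, nxt in zip(chars, chars[1:]):
        counts[prev][nxt] += 1


def sample_name(rng, max_len=20):
    """Sample characters one at a time until the end marker is drawn."""
    out, prev = [], "."
    while len(out) < max_len:
        followers = counts[prev]
        # Pick the next character in proportion to how often it followed `prev`.
        nxt = rng.choices(list(followers), weights=list(followers.values()))[0]
        if nxt == ".":
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


rng = random.Random(0)  # fixed seed so repeated runs give the same samples
samples = [sample_name(rng) for _ in range(5)]
print(samples)
```

Every sampled word is built only from character pairs seen in the training list, which is why the output looks name-like even though the model knows nothing beyond pair frequencies. Neural versions replace the count table with learned parameters.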

USE CASE 3

Understand transformer architecture by implementing a GPT model step-by-step with working code.
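
The core operation inside a transformer is attention, and a small sketch may help show what it computes. The code below is a pure-Python illustration of single-head causal scaled dot-product attention, not the lecture's implementation. The function names and the example vectors are assumptions, and in a real GPT the queries, keys, and values would come from learned linear projections of the token embeddings rather than being passed in directly.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position t sees only positions <= t."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for t, q in enumerate(queries):
        # Scores against the current and earlier positions only (the causal mask).
        scores = [dot(q, keys[s]) / math.sqrt(d_k) for s in range(t + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible value vectors.
        out = [sum(w * values[s][j] for s, w in enumerate(weights))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three token positions with 2-dimensional vectors (values chosen arbitrarily).
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, attn = causal_attention(q, k, v)
```

The first position can only attend to itself, so its output equals its own value vector. Later positions blend the values of everything before them, weighted by how well each key matches the query.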

USE CASE 4

Study tokenization and text preprocessing techniques used in real language models.
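
One widely used tokenization scheme for GPT-style models is byte pair encoding (BPE), which repeatedly merges the most frequent adjacent pair of tokens into a new token. The sketch below shows that merge loop on a toy string. It is an illustration under assumptions: the helper names, the example text, and the choice of three merges are hypothetical, and this summary does not itself state which algorithm the lecture implements.

```python
from collections import Counter


def most_common_pair(ids):
    """Return the adjacent pair of token ids that occurs most often."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


text = "aaabdaaabac"
ids = list(text.encode("utf-8"))  # start from raw bytes: ids in 0..255

merges = {}
for step in range(3):  # the number of merges sets the vocabulary size
    pair = most_common_pair(ids)
    new_id = 256 + step  # new tokens get ids above the byte range
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id
```

Here the 11 input bytes shrink to 5 tokens after three merges. The `merges` table is what a tokenizer would keep in order to encode new text the same way and to decode token ids back into bytes.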

Tech stack

Python · Jupyter Notebook · NumPy

Getting it running

Difficulty · easy | Time to first run · 5 min
MIT License: use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

Neural Networks: Zero to Hero is a free video course, accompanied by Jupyter Notebook code files, that teaches how neural networks and modern AI language models work from first principles. The course is designed as a series of YouTube lectures where the instructor writes code live, building increasingly complex neural network systems from scratch.

The course starts at the very bottom: Lecture 1 covers backpropagation, which is the core mathematical algorithm used to train neural networks. Rather than just explaining the concept, the instructor builds a tiny working neural network engine called "micrograd" from scratch using only basic Python.

From there, the course progressively builds up to more complex architectures. Lectures 2 through 6 build a character-level language model called "makemore", a system that generates new words or names by learning statistical patterns from training data, going through increasingly sophisticated versions: a simple statistical model, a multilayer neural network, techniques for stabilizing training (Batch Normalization), a deep dive into manually computing gradients, and finally a convolutional architecture.

Lecture 7 then builds a GPT (Generatively Pretrained Transformer), the same type of architecture used in AI chat systems, from scratch and in full. Lecture 8 covers tokenization, which is the process of converting text into numerical chunks that language models can process.

The course assumes basic Python knowledge and a vague memory of high school calculus. Each lecture links to a YouTube video and has corresponding Jupyter Notebook files in this repository so you can follow along and run the code yourself. It's aimed at people who want to genuinely understand how modern AI systems work under the hood, not just use them.
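
Of the techniques named above, Batch Normalization is compact enough to sketch directly. The example below is a minimal, training-mode-only illustration in plain Python, assuming a batch stored as a list of rows. The `batch_norm` function name, the `eps` default, and the sample numbers are assumptions, and it omits the running statistics a full implementation would keep for inference.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each feature across the batch, then scale and shift."""
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n  # biased variance
        for i, x in enumerate(column):
            x_hat = (x - mean) / math.sqrt(var + eps)
            out[i][j] = gamma[j] * x_hat + beta[j]  # learnable scale and shift
    return out


# Pre-activations for 4 examples with 2 features on very different scales.
batch = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]]
normed = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to ones and `beta` to zeros, both output columns end up with roughly zero mean and unit variance despite the hundredfold difference in input scale, which is the stabilizing effect the technique is meant to provide.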

Copy-paste prompts

Prompt 1
Walk me through the micrograd implementation in lecture 1. How does the backward pass compute gradients for each operation?
Prompt 2
Show me how to extend the makemore multilayer network from lecture 3 to use batch normalization like in lecture 4.
Prompt 3
Explain the attention mechanism in the GPT implementation from lecture 7. How do queries, keys, and values work together?
Prompt 4
How would I modify the tokenizer from lecture 8 to handle a different language or special characters?
Prompt 5
What changes would I need to make to the makemore model to generate longer sequences without mode collapse?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.