explaingit

harvardnlp/annotated-transformer

7,248 | Jupyter Notebook | Audience · researcher | Complexity · 2/5 | Setup · easy

TLDR

A step-by-step, fully annotated implementation of the Transformer neural network, with working Python code and plain-English commentary side by side in a Jupyter Notebook, ideal for learning how modern AI language models are built.

Mindmap

mindmap
  root((annotated-transformer))
    What it does
      Explains Transformer
      Line-by-line code
      Interactive notebook
    Tech Stack
      Python
      Jupyter Notebook
      jupytext
    Use Cases
      Learn Transformer
      Teach attention
      Run experiments
    Audience
      Students
      Researchers
      NLP learners

Things people build with this

USE CASE 1

Learn how the Transformer neural network works by reading and running annotated code with explanations for every step.

USE CASE 2

Use as a teaching reference when introducing attention mechanisms and Transformer architecture to students.

USE CASE 3

Modify the implementation to experiment with variations of the Transformer model in a local notebook.

Tech stack

Python, Jupyter Notebook, jupytext

Getting it running

Difficulty · easy | Time to first run · 5 min

A Google Colab link lets you open and run the notebook instantly with no local setup required.

No license information was mentioned in the explanation.

In plain English

The Annotated Transformer is a project from Harvard's natural language processing group that presents a working implementation of the Transformer neural network architecture alongside line-by-line explanations. Rather than publishing only the code or only the theory, it interweaves working Python code with commentary explaining what each section does and why. The full content is available as a blog post on the Harvard NLP website, and this repository holds the source files that generate it.

The project is delivered as a Jupyter Notebook, a document format that mixes executable code cells with text and images. This format works well for educational material where you want to show code and explanation side by side. Notebooks cause a common problem in version control: they store execution outputs inside the file, which makes versions hard to compare. To avoid this, the repository keeps the content as a plain Python file and uses a tool called jupytext to generate the actual notebook file from it on demand. This keeps the repository history clean.

To run it locally, you install the Python dependencies from a provided requirements file, then run a build command that produces the notebook. A separate command generates an HTML version of the full document. There is also a Google Colab link in the README that lets you open and run everything in the cloud without any local setup.

The code follows Python's PEP 8 style guide, and automated checks run on any contribution to enforce consistent style.
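To make the plain-Python-file idea above concrete, here is a minimal sketch of how a script with `# %%` cell markers could be turned into notebook JSON. It uses only the standard library and is not jupytext's actual implementation; the function names `make_cell` and `percent_script_to_notebook` and the sample script are hypothetical and exist only for illustration. The point it tries to show is that the generated notebook starts with empty outputs, so the tracked file contains nothing that changes from run to run.

```python
import json


def make_cell(kind, lines):
    """Build one notebook cell dict from the raw script lines (hypothetical helper)."""
    lines = list(lines)
    # Drop trailing blank lines so cells do not end in empty space.
    while lines and not lines[-1].strip():
        lines.pop()
    if kind == "markdown":
        # Markdown cells are stored as comments: strip the leading "# ".
        text = "\n".join(
            line[2:] if line.startswith("# ") else line.lstrip("#")
            for line in lines
        )
        return {"cell_type": "markdown", "metadata": {}, "source": text}
    return {
        "cell_type": "code",
        "metadata": {},
        "source": "\n".join(lines),
        "outputs": [],            # no stored outputs in the generated file
        "execution_count": None,
    }


def percent_script_to_notebook(source):
    """Convert a '# %%'-delimited script into a minimal nbformat-4 style dict."""
    cells, current, kind = [], None, "code"
    for line in source.splitlines():
        if line.startswith("# %%"):
            if current is not None:
                cells.append(make_cell(kind, current))
            kind = "markdown" if "[markdown]" in line else "code"
            current = []
        elif current is not None:
            current.append(line)
        # Lines before the first marker are ignored in this sketch.
    if current is not None:
        cells.append(make_cell(kind, current))
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical two-cell script, not taken from the repository.
SCRIPT = """\
# %% [markdown]
# # Attention
# A short note.

# %%
x = 1 + 1
print(x)
"""

nb = percent_script_to_notebook(SCRIPT)
notebook_json = json.dumps(nb, indent=1)
```

In practice you would let jupytext do this conversion; the sketch only illustrates why a text-first source diffs cleanly while a notebook with embedded outputs does not.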

Copy-paste prompts

Prompt 1
Walk me through the annotated-transformer notebook and explain the multi-head attention code in plain English, as if I have never studied neural networks.
Prompt 2
Using the annotated-transformer as a reference, show me how to implement a simple Transformer encoder block in Python from scratch.
Prompt 3
Help me modify the annotated-transformer positional encoding to use learned embeddings instead of fixed sine/cosine ones.
Prompt 4
What does the scaled dot-product attention function in the annotated-transformer do, and why is the score divided by the square root of the key dimension?
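As orientation for Prompt 4, here is a rough standard-library sketch of scaled dot-product attention on plain Python lists. It is not the notebook's code, and the function names `softmax` and `scaled_dot_product_attention` are chosen here for illustration. Each query is compared with every key by a dot product, the scores are divided by the square root of the key dimension, a softmax turns them into weights, and the output is the weighted sum of the values. The division keeps scores from growing with the key dimension, which would otherwise push the softmax toward near one-hot weights with very small gradients.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Attention over lists of vectors; a sketch without masking or batching."""
    d_k = len(keys[0])          # key dimension
    scale = math.sqrt(d_k)
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Two identical keys give equal weights, so the output is the mean of the values.
result = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [4.0, 0.0]],
)
```

A tensor-based implementation would express the same steps as matrix products and would usually add an optional mask; this sketch leaves both out to keep the arithmetic visible.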

Verify against the repo before relying on details.