naklecha/llama3-from-scratch

★ 15,243 | Jupyter Notebook | Audience · researcher | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((llama3-from-scratch))
    What it does
      Rebuilds Llama 3
      Step by step
      Plain Python
    Key Concepts
      Tokenizer BPE
      Attention heads
      RMS normalization
      Token embeddings
    Tech Stack
      Python PyTorch
      Jupyter Notebook
      tiktoken
    Requirements
      Meta weight file
      Gated access
      High RAM machine
    Audience
      ML researchers
      LLM learners

Things people build with this

USE CASE 1

Walk through every step of a transformer forward pass using real Llama 3 weights to understand how large language models actually work.

USE CASE 2

Use as a hands-on companion to understand attention heads, RMS normalization, and token embeddings alongside running code.

USE CASE 3

Adapt the notebook to inspect how specific inputs flow through Llama 3 layers for research or debugging transformer internals.

Tech stack

Python, PyTorch, Jupyter Notebook, tiktoken

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires downloading official Llama 3 weights from Meta (gated access) plus a machine with enough RAM or VRAM to load them.
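
How much memory "enough" means can be estimated from the model's shape. The sketch below is a back-of-the-envelope calculation, not something taken from the repository. It uses the layer count, head count, and vocabulary size quoted in this summary. It also assumes the 8B variant's hidden size (4096), key/value head count (8), and feed-forward width (14336), none of which are stated here. It counts weights only and ignores activations and framework overhead.

```python
# Values quoted in this summary's description of the model config.
N_LAYERS = 32
N_HEADS = 32
VOCAB_SIZE = 128256

# Assumptions (not stated in this summary): dimensions of the 8B variant.
DIM = 4096          # hidden size
N_KV_HEADS = 8      # grouped-query attention key/value heads
FFN_HIDDEN = 14336  # feed-forward inner width


def count_parameters():
    """Estimate the total number of weights under the assumptions above."""
    head_dim = DIM // N_HEADS
    kv_dim = N_KV_HEADS * head_dim
    attention = 2 * DIM * DIM + 2 * DIM * kv_dim   # wq and wo, plus wk and wv
    feed_forward = 3 * DIM * FFN_HIDDEN            # gate, up, and down projections
    norms = 2 * DIM                                # two RMS norm weight vectors per layer
    per_layer = attention + feed_forward + norms
    embeddings = 2 * VOCAB_SIZE * DIM              # input embedding + separate output projection
    final_norm = DIM
    return N_LAYERS * per_layer + embeddings + final_norm


def weights_gib(bytes_per_param=2):
    """Memory for the weights alone, in GiB (2 bytes per parameter = 16-bit)."""
    return count_parameters() * bytes_per_param / 2**30


print(f"{count_parameters():,} parameters")
print(f"{weights_gib():.1f} GiB at 16-bit, {weights_gib(4):.1f} GiB at 32-bit")
```

Under these assumptions the estimate comes to about 8.03 billion parameters, or roughly 15 GiB for the weights at 2 bytes per parameter, and twice that at 32-bit precision.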

In plain English

This repository is a long, hand-walked tutorial that re-implements Meta's Llama 3 language model from scratch, one matrix multiplication at a time. Llama 3 is a large language model: software that takes some text and predicts the next piece of text. Rather than wrapping the model in a black-box library call, the author loads Meta's published Llama 3 weights file directly and reconstructs every step the model takes in plain Python, narrating what happens as the shapes of the tensors change.

The README walks through the pipeline a beginner needs in order to follow how such a model works. First it sets up a tokenizer (the piece that splits text into the numeric tokens the model actually processes), borrowing tiktoken to handle the byte-pair encoding rather than writing one. Then it reads the raw model file and inspects its config, which the file itself reports as 32 transformer layers, 32 attention heads, and a vocabulary of 128,256 tokens. From there the notebook converts text to tokens, looks up token embeddings, applies RMS normalization, and goes layer by layer through the transformer block, building queries, keys, values, and outputs for each attention head by hand.

You would read this repository if you already use large language models and want to understand what is actually happening inside one, or if you find it easier to learn from numeric code than from a research paper. It is presented as a Jupyter Notebook and depends on PyTorch and tiktoken, both named in the README. Running it requires downloading the official Llama 3 weights from Meta. The full README is longer than the portion summarized here.
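
The RMS normalization and per-head attention steps described above can be sketched without any dependencies. This is a minimal illustration, not the notebook's code. The function names (`rms_norm`, `attention_head`), the toy 4-dimensional vectors, the hand-picked weight matrices, and the `eps` value are all assumptions made for this example. Rotary position embeddings, which Llama 3 applies to queries and keys, are left out for brevity. The notebook itself works on PyTorch tensors with the real weights.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """Divide x by its root-mean-square, then scale by a learned weight vector."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def matvec(vec, mat):
    """Multiply a row vector (length d_in) by a d_in x d_out matrix given as a list of rows."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(len(mat[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(xs, wq, wk, wv):
    """Causal self-attention for a single head over a list of token vectors."""
    qs = [matvec(x, wq) for x in xs]
    ks = [matvec(x, wk) for x in xs]
    vs = [matvec(x, wv) for x in xs]
    scale = 1.0 / math.sqrt(len(qs[0]))
    outputs = []
    for t, q in enumerate(qs):
        # Causal mask: token t may only attend to positions 0..t.
        scores = [scale * sum(a * b for a, b in zip(q, ks[s])) for s in range(t + 1)]
        weights = softmax(scores)
        outputs.append(
            [sum(w * vs[s][j] for s, w in enumerate(weights)) for j in range(len(vs[0]))]
        )
    return outputs


# Toy inputs (hypothetical): three "token embeddings" of width 4, one head of width 2.
embeddings = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0], [2.0, 2.0, 2.0, 2.0]]
norm_weight = [1.0, 1.0, 1.0, 1.0]
wq = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
wk = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
wv = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]

normed = [rms_norm(x, norm_weight) for x in embeddings]
head_out = attention_head(normed, wq, wk, wv)
print(len(head_out), "tokens x", len(head_out[0]), "head dimensions")
```

Because of the causal mask, the first token can only attend to itself, so its output is just its own value vector. Later tokens receive a weighted mix of the value vectors at and before their position.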

Copy-paste prompts

Prompt 1

In the llama3-from-scratch notebook, what is happening when the code builds queries, keys, and values for each attention head? Walk me through the tensor shapes at each step.

Prompt 2

How does RMS normalization work in this from-scratch Llama 3 implementation and why is it used instead of standard LayerNorm?

Prompt 3

I want to visualize the attention weights for a specific input token in the llama3-from-scratch notebook. How do I extract and plot them?

Prompt 4

Explain what byte-pair encoding does in the tokenizer step of this notebook and show me how to run it on a custom input string using tiktoken.

Prompt 5

How does the notebook load the raw Llama 3 weight file and map tensor names to the transformer layers? Walk me through that setup section.

Verify against the repo before relying on details.