explaingit

nashknight/llm-whiteboard

18 · unknown · Audience · researcher · Complexity · 4/5 · Active · Setup · easy

TLDR

A Chinese-language study guide with short PyTorch snippets for the LLM internals (attention, RoPE, LoRA, MoE, sampling) that come up in whiteboard ML interviews.

Mindmap

mindmap
  root((llm-whiteboard))
    Inputs
      PyTorch snippets
      Markdown chapters
      PDF version
    Outputs
      Whiteboard recall
      Interview practice
    Use Cases
      ML interview prep
      Learn LLM internals
      Reference for live coding
    Topics
      Attention variants
      Transformer blocks
      Loss functions
      Decoding strategies
    Tech Stack
      PyTorch
      Markdown

Things people build with this

USE CASE 1

Practice writing self-attention and multi-head attention from memory for an ML interview

USE CASE 2

Review compact PyTorch reference code for RoPE, RMSNorm, and SwiGLU

USE CASE 3

Study DPO, PPO, and GRPO loss functions one chapter at a time

USE CASE 4

Re-derive LoRA and MoE from short worked examples before an interview
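Use case 1 is the classic whiteboard exercise. The sketch below gives a rough idea of what writing attention from memory involves: single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_head)) V, with an optional causal mask. It is not taken from the repository. It uses plain Python lists instead of PyTorch so it runs with no dependencies, and the helper names (`matmul`, `transpose`, `softmax`, `self_attention`) are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    # Subtract the max for numerical stability; exp(-inf) == 0.0 handles masked slots.
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v, causal=False):
    """Single-head attention: softmax(Q K^T / sqrt(d_head)) V.

    x is (seq_len x d_model); w_q, w_k, w_v are (d_model x d_head).
    Returns the (seq_len x d_head) output and the attention weights.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = 1.0 / math.sqrt(len(w_q[0]))
    scores = matmul(q, transpose(k))
    weights = []
    for i, row in enumerate(scores):
        # Causal mask: position i may only attend to positions j <= i.
        masked = [s * scale if (not causal or j <= i) else float("-inf")
                  for j, s in enumerate(row)]
        weights.append(softmax(masked))
    return matmul(weights, v), weights

# Toy example: 3 tokens, d_model = d_head = 2, identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(x, eye, eye, eye, causal=True)
print(attn[0])  # → [1.0, 0.0, 0.0]  (the first token can only attend to itself)
```

In PyTorch, the same steps reduce to `q @ k.transpose(-2, -1)`, `masked_fill` for the causal mask, and `torch.softmax`. Multi-head variants split `d_model` across several such heads.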

Tech stack

PyTorch · Python · Markdown

Getting it running

Difficulty · easy · Time to first run · 5 min

No install needed, but the README and all chapter text are in Chinese.

In plain English

LLM-Whiteboard is a study guide for people preparing for technical interviews on machine-learning teams in China, where candidates are often asked to write key pieces of a large language model on a whiteboard from memory. The README, written in Chinese, calls this practice "live coding." It says the repository collects short PyTorch implementations of the parts that come up most often in those interviews. The author notes that the code is written for readability and recall rather than production use, and assumes the reader already has some background in how large language models work.

The material comes in two forms, and the README states that both have the same content. The first is one consolidated file, llm_pytorch_live_coding.md, with a matching PDF version that you can read straight through. The second is a chapters folder that splits the same content into smaller files so you can practice one topic at a time. Each chapter file is a short walk-through of one idea, with a PyTorch snippet next to the relevant formulas.

The table of contents is organized into six parts. The first two cover variants of the attention mechanism. One groups them by where the queries, keys, and values come from (self-attention and cross-attention). The other groups them by how the heads are arranged (MHA, MHA with a KV cache, MQA, GQA, and MLA). The third part covers transformer building blocks: LayerNorm, RMSNorm, rotary position embeddings (RoPE), the SwiGLU activation, a full transformer, and a transformer with a KV cache. The fourth covers loss functions used in training and alignment. These include cross-entropy for supervised fine-tuning (SFT) and the DPO, PPO, and GRPO objectives used in preference optimization and RL-based alignment. The fifth covers the decoding strategies a model uses to pick the next token: greedy decoding, beam search, and temperature, top-k, and top-p sampling. The sixth contains two extra modules: LoRA for low-rank fine-tuning and MoE (mixture of experts).

There is no listed license, no installation steps, and no automated tests.
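The fifth part's sampling strategies are short enough to sketch here. The plain-Python version below is not the repository's code, and `softmax`, `greedy`, `filter_probs`, and `sample` are hypothetical names. It shows three steps:

- Temperature rescales the logits before the softmax.
- Top-k keeps only the k most likely tokens.
- Top-p (nucleus) sampling keeps the smallest set of top tokens whose probability mass reaches p.

The sketch applies top-k before top-p and renormalizes after each step. That is one common convention, not the only one.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    # Greedy decoding: always pick the highest-scoring token.
    return max(range(len(logits)), key=lambda i: logits[i])

def filter_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: prob} after temperature, top-k, and top-p filtering."""
    probs = softmax(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]  # top-k: keep only the k most likely tokens
    kept_mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        # top-p: stop once the (renormalized) kept mass reaches top_p
        kept.append(i)
        cumulative += probs[i] / kept_mass
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

def sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=random):
    # Draw one token id from the filtered, renormalized distribution.
    dist = filter_probs(logits, temperature, top_k, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]

logits = [2.0, 1.0, 0.1, -1.0]
print(greedy(logits))                   # → 0
print(filter_probs(logits, top_p=0.5))  # → {0: 1.0}
print(sample(logits, temperature=0.7, top_k=2, rng=random.Random(0)))
```

In PyTorch, the same filtering is usually done on a logits tensor with `torch.topk`, `torch.sort`, and `torch.cumsum`, followed by `torch.multinomial` to draw the token.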

Copy-paste prompts

Prompt 1
Read llm_pytorch_live_coding.md and quiz me chapter by chapter on whiteboard implementations
Prompt 2
Compare the MHA, MQA, GQA, and MLA snippets in LLM-Whiteboard and explain when each is used
Prompt 3
Translate the Chinese README of LLM-Whiteboard into English and summarize the six parts
Prompt 4
Use the LLM-Whiteboard KV-cache transformer chapter to explain the memory cost of long contexts
Prompt 5
Generate flashcards from the LoRA and MoE chapters of LLM-Whiteboard

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.