Practice writing self-attention and multi-head attention from memory for an ML interview
Review compact PyTorch reference code for RoPE, RMSNorm, and SwiGLU
Study DPO, PPO, and GRPO loss functions one chapter at a time
Re-derive LoRA and MoE from short worked examples before an interview
No install needed, but the README and all chapter text are in Chinese.
LLM-Whiteboard is a study guide for people preparing for technical interviews with machine-learning teams in China, where candidates are often asked to write key pieces of a large language model on a whiteboard from memory. The README, written in Chinese, calls this practice live coding and says the repository collects short PyTorch implementations of the parts that come up most often in those interviews. The author notes that the code is meant for readability and recall rather than production use, and assumes the reader already has some background in how large language models work.
The material is provided in two forms. One is a consolidated file, llm_pytorch_live_coding.md, with a matching PDF version, that you can read straight through. The other is a chapters folder that breaks the same content into smaller files so you can practice one topic at a time. The README states that both forms have the same content.
The table of contents is organized into six parts. The first two cover variations of the attention mechanism, grouped by how queries, keys, and values are sourced (self-attention and cross-attention) and by how multiple heads are arranged (MHA, MHA with a KV cache, MQA, GQA, and MLA). The third part covers transformer building blocks: LayerNorm, RMSNorm, rotary position embeddings (RoPE), the SwiGLU activation, a full transformer, and a transformer with a KV cache. The fourth part covers loss functions used in training and alignment, including cross-entropy for supervised fine-tuning and the DPO, PPO, and GRPO objectives used in preference learning. The fifth part covers the decoding strategies a model uses to pick the next token: greedy decoding, beam search, and temperature, top-k, and top-p sampling. The sixth part contains two extra modules, LoRA for low-rank fine-tuning and MoE for mixture of experts.
The repository lists no license, no installation steps, and no automated tests.
Each chapter file is a short walk-through of one idea with a PyTorch snippet next to the relevant formulas.
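As a rough illustration of that formula-plus-snippet format, here is a minimal RMSNorm sketch. It is not taken from the repository: the class layout, the eps default of 1e-6, and the tensor shapes are assumptions chosen for the example. The formula it follows is y = g * x / sqrt(mean(x^2) + eps), where the mean runs over the last dimension and g is a learned per-feature scale.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Root-mean-square normalization over the last dimension (illustrative sketch)."""

    def __init__(self, dim: int, eps: float = 1e-6):  # eps default is an assumption
        super().__init__()
        self.eps = eps
        # Learned per-feature scale g, initialized to 1.
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # sqrt(mean(x^2) + eps), computed per position; no mean subtraction, unlike LayerNorm.
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)


torch.manual_seed(0)
x = torch.randn(2, 4, 8)  # (batch, sequence, features); shapes are arbitrary
y = RMSNorm(8)(x)
```

With the scale still at its initial value of 1, every position in y has a mean square close to 1, which is a quick sanity check to run when reproducing the layer from memory.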
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.