Learn to fine-tune a pre-trained language model on your own task by following step-by-step code examples.
Understand how prompt engineering and chain-of-thought reasoning improve model outputs through interactive experiments.
Explore safety vulnerabilities like jailbreaks and steganography to build more robust AI systems.
Build multimodal applications that process both text and images using practical code walkthroughs.
Installing PyTorch and the Transformers library can be slow; a GPU is optional but recommended for some notebooks.
dive-into-llms is a hands-on programming tutorial series for learning how large language models (LLMs) work in practice. Large language models are the AI systems behind tools like ChatGPT; they are trained on vast amounts of text and can generate, understand, and reason about language. The project bridges the gap between abstract theory and real implementation, targeting students and researchers who want to move from reading about AI to building with it. The tutorials are organized as Jupyter Notebooks, interactive documents that mix explanatory text with runnable code and that you can step through cell by cell. Each chapter covers a distinct topic: fine-tuning a pre-trained model on a specific task, writing effective prompts and using chain-of-thought reasoning, editing what a model "knows", teaching a model to do mathematical reasoning, embedding invisible watermarks into generated text, understanding jailbreak attacks that trick models into ignoring safety guidelines, building multimodal models that handle both text and images, building GUI agents that control software interfaces on your behalf, aligning models for safety using reinforcement learning from human feedback (RLHF), and steganography (hiding secret messages inside generated text). The project originated from courses at Shanghai Jiao Tong University and is free and non-commercial. A companion curriculum co-developed with Huawei's Ascend platform covers the full LLM development pipeline in greater depth. The repository suits computer science students, AI researchers, and developers who want practical experience with language models. The tech stack is Python running in Jupyter Notebooks, using deep learning libraries such as PyTorch and Transformers.
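As a rough illustration of the chain-of-thought prompting mentioned above, the sketch below builds two versions of the same prompt: one that asks for a direct answer and one that appends a step-by-step cue. It is not taken from the repository. The function name `build_prompt`, the `Q:`/`A:` layout, and the sample question are all hypothetical, and no model is called, so you would pass the resulting strings to whichever model the notebooks use.

```python
def build_prompt(question: str, chain_of_thought: bool = False) -> str:
    """Return a plain prompt, or one that asks for step-by-step reasoning.

    Hypothetical helper for illustration only; not part of dive-into-llms.
    """
    prompt = f"Q: {question}\nA:"
    if chain_of_thought:
        # Zero-shot chain-of-thought cue placed at the start of the answer slot.
        prompt += " Let's think step by step."
    return prompt


# Made-up sample question used only to show the two prompt variants.
question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"
direct_prompt = build_prompt(question)
cot_prompt = build_prompt(question, chain_of_thought=True)

print(direct_prompt)
print(cot_prompt)
```

Comparing a model's answers to `direct_prompt` and `cot_prompt` on the same question is one simple way to observe whether the reasoning cue changes the output.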
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.