Train a working language model on your own GPU to understand how ChatGPT-like systems actually work internally.
Fine-tune a small model using LoRA or RLHF techniques to customize its behavior for specific tasks.
Deploy a trained model locally via an OpenAI-compatible API and integrate it into your own applications.
Study the complete training pipeline from raw text to inference without relying on high-level frameworks.
Requires a GPU with sufficient VRAM, a working PyTorch/CUDA setup, and 2+ hours of training time per example.
MiniMind is an educational open-source project that teaches you how to build a small but fully functional large language model (LLM) from scratch. An LLM is the kind of AI that powers tools like ChatGPT: it takes in text and generates coherent responses. The project's goal is to make this technology accessible. Instead of the hundreds of billions of parameters used by industry models, MiniMind trains a model with only 64 million parameters, small enough to train on a single consumer GPU in about two hours.
The project covers the entire training pipeline end-to-end, starting from raw text data. It includes data cleaning, tokenizer training (building the vocabulary used to split text into tokens, the units the model processes), pretraining (learning general language patterns from large text corpora), and supervised fine-tuning (teaching the model to follow instructions). Beyond basic training, it also implements more advanced techniques from scratch: LoRA (a fine-tuning method that reduces memory usage by training small low-rank adapter matrices instead of all weights), RLHF (reinforcement learning from human feedback, used to align AI responses with human preferences), and reasoning chain training. All of these algorithms are written directly in PyTorch without relying on high-level abstraction libraries, so readers can see exactly how each component works.
You would use this repository if you are a student, researcher, or developer who wants to understand how LLMs work internally: not just how to use them, but how they are actually built and trained. It serves as both a working codebase and a learning tutorial. The trained models are small enough to be deployed locally and can be served via an OpenAI-compatible API endpoint, making them usable with existing chat interfaces. A simple Streamlit web interface is also included for testing. The tech stack is Python and PyTorch, with optional support for distributed training across multiple GPUs using DDP (DistributedDataParallel) and DeepSpeed.
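To make the LoRA idea mentioned above concrete, here is a minimal, dependency-free sketch of a low-rank adapter around a frozen weight matrix. It is not MiniMind's implementation: the names `LoRALinear`, `matvec`, `r`, and `alpha` are illustrative assumptions, and plain Python lists stand in for PyTorch tensors so the arithmetic is visible. The frozen weight `W` is left untouched, and only the two small matrices `A` and `B` would be trained.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W plus a low-rank update scaled by alpha / r.

    Hypothetical illustration class, not taken from the MiniMind codebase.
    """

    def __init__(self, W, r=1, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pretrained weight, shape (d_out, d_in)
        # A starts with small random values, shape (r, d_in)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        # B starts at zero, shape (d_out, r), so the adapter is initially a no-op
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # W x
        update = matvec(self.B, matvec(self.A, x))  # B (A x)
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        r, d_in, d_out = len(self.A), len(self.A[0]), len(self.B)
        return r * (d_in + d_out)


# Usage: an 8x8 identity matrix stands in for a pretrained weight.
d = 8
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
layer = LoRALinear(W, r=1, alpha=1.0)
x = [float(i) for i in range(d)]

full_params = d * d                      # 64 weights if W itself were trained
lora_params = layer.trainable_params()   # 16 weights in A and B combined
y = layer.forward(x)                     # equals W x while B is still zero
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and training only has to store gradients and optimizer state for `A` and `B`. In this toy setting that is 16 values instead of 64; the saving grows with the layer size, since the adapter needs `r * (d_in + d_out)` parameters against `d_in * d_out` for the full matrix.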
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.