Train a deep learning model across multiple GPUs without rewriting the training loop.
Run reproducible machine learning experiments with automatic checkpoint saving and metric logging.
Fine-tune an image classifier faster by enabling 16-bit mixed precision with a one-line flag.
Stream large datasets from cloud storage during training using the Lightning Data package.
Multi-GPU training requires CUDA-capable GPUs; CPU-only training works but is slow for large models.
PyTorch Lightning is a Python framework that sits on top of PyTorch, the popular deep learning library, and removes repetitive engineering boilerplate from machine learning projects. Raw PyTorch training loops make developers write the same scaffolding over and over: moving data between devices, tracking metrics, saving checkpoints, distributing work across multiple GPUs, and handling mixed-precision arithmetic. Lightning organizes all of that into a standard structure so researchers can focus on the model itself instead of the infrastructure.
The core idea is to separate "the science" from "the engineering." You define your model inside a class called a LightningModule, which has clear slots for the training step, validation step, and optimizer configuration. You then hand that module to a Trainer object and tell it how many GPUs to use, whether to use 16-bit floating-point for speed, and which experiment-tracking logger to connect. The Trainer handles the rest: the training loop, gradient updates, logging, checkpointing, and multi-GPU distribution. The model code does not change when you scale from one machine to thousands.
The library ships four packages: PyTorch Lightning for model training, Fabric for developers who want finer-grained manual control over distributed training, Lightning Data for streaming large datasets from cloud storage, and Lightning Apps for building end-to-end AI workflows.
You might use it when pre-training a large language model across a GPU cluster, fine-tuning an image classifier, or running reproducible experiments that need consistent logging and checkpoint management. The tech stack is Python and PyTorch. It installs via pip and supports CPU, GPU, and TPU accelerators.
Verify against the repo before relying on details.