huggingface/accelerate

★ 9,704PythonAudience · researcherComplexity · 4/5Setup · hard

Mindmap

mindmap
  root((accelerate))
    What It Does
      Distributed training
      Hardware abstraction
      Minimal code changes
    Hardware Support
      Single GPU
      Multi-GPU cluster
      TPUs
    Key Features
      Mixed precision fp16 bf16
      DeepSpeed integration
      FSDP support
    Workflow
      accelerate config
      accelerate launch
    Used By
      Transformers library
      Stable Diffusion UI

mindmap root((accelerate)) What It Does Distributed training Hardware abstraction Minimal code changes Hardware Support Single GPU Multi-GPU cluster TPUs Key Features Mixed precision fp16 bf16 DeepSpeed integration FSDP support Workflow accelerate config accelerate launch Used By Transformers library Stable Diffusion UI

Click or tap to explore — scroll the page freely

Things people build with this

USE CASE 1

Run the same PyTorch training script on a single GPU, multiple GPUs, or a multi-machine cluster without rewriting the code.

USE CASE 2

Speed up model training using mixed precision (fp16 or bf16) by adding a single configuration flag.

USE CASE 3

Train very large AI models that don't fit in one GPU's memory using DeepSpeed or PyTorch FSDP to split the model across GPUs.

USE CASE 4

Add distributed training to an existing PyTorch training loop by wrapping the model, optimizer, and data loader, about five lines of change.

Tech stack

PythonPyTorchDeepSpeedFSDP

Getting it running

Difficulty · hard Time to first run · 1h+

Requires PyTorch and GPU hardware, multi-GPU or TPU setups need additional system-level drivers and environment configuration.

In plain English

Accelerate is a Python library from Hugging Face that lets you run the same AI model training code across many different hardware setups without rewriting it. If you have written a training loop in PyTorch (the code that repeatedly feeds data to a model and updates its weights), Accelerate handles the extra complexity of spreading that work across multiple GPUs, multiple machines, or specialized chips called TPUs. You add roughly five lines to your existing code, and the same script then works on a single laptop CPU, a single GPU, or a cluster of machines. The core idea is that all the tedious setup code for distributed training, like deciding which machine should print logs or how to combine results from multiple GPUs, gets handled behind the scenes. You keep your training loop exactly as you wrote it. Accelerate wraps the model, the optimizer, and the data loader, then takes care of coordination automatically. It also includes a command-line tool called accelerate config that asks you a few questions about your hardware and saves a configuration file. After that, running accelerate launch my_script.py starts your training with the right settings applied. You do not need to memorize the flags for PyTorch's own distributed launcher. Accelerate supports mixed precision training, which uses lower-precision numbers (like fp16 or bf16) to speed up computation and reduce memory use. It also works with two popular tools for training very large models: DeepSpeed and PyTorch FSDP. Both allow a single large model to be split across multiple GPUs when it would not otherwise fit in memory. This library is aimed at developers who want direct control over their training code and do not want a higher-level framework making decisions for them. Several other well-known projects, including Hugging Face Transformers and the Stable Diffusion web UI, use Accelerate internally as their training backend.

Copy-paste prompts

Prompt 1

I have a PyTorch training loop that runs on one GPU. Show me exactly which lines I need to add to make it run with Accelerate across 4 GPUs on one machine.

Prompt 2

Help me configure Accelerate for bf16 mixed precision training, what does 'accelerate config' ask me, and what does my final launch command look like?

Prompt 3

I want to train a model too large for a single GPU. Walk me through using Accelerate with DeepSpeed ZeRO Stage 3 to split it across 2 GPUs.

Prompt 4

My Accelerate training script works locally but I need to run it on a multi-node cluster. What config changes and launch command flags do I need?

Open on GitHub → Explain another repo

← huggingface on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.