explaingit

compvis/latent-diffusion

14,025 · Jupyter Notebook · Audience · researcher · Complexity · 5/5 · Setup · hard

TLDR

The original 2021 research code for Latent Diffusion Models, the technique that later became Stable Diffusion. It covers text-to-image generation, inpainting, and super-resolution.

Mindmap

mindmap
  root((latent diffusion))
    What it does
      Text to image
      Image inpainting
      Super resolution
    How it works
      Compressed latent space
      Diffusion process
      Retrieval augmented
    Tech stack
      Python PyTorch
      CUDA GPU
      Jupyter notebooks
    Research context
      2021 paper
      LAION-400M data
      Stable Diffusion origin

Things people build with this

USE CASE 1

Generate images from text prompts using the 1.45 billion parameter pre-trained model

USE CASE 2

Fill in masked regions of a photo using the inpainting model

USE CASE 3

Upscale low-resolution images to higher resolution using the super-resolution model

USE CASE 4

Run retrieval-augmented image generation by conditioning on visually similar images from a database
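The retrieval step in use case 4 can be pictured as a nearest-neighbor lookup over image embeddings. The sketch below is a toy illustration, not the repository's code: the three-dimensional vectors, the `database` entries, and the helper names `cosine_similarity` and `retrieve_neighbors` are all made up for the example, and a real system would use learned embeddings and an approximate search index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_neighbors(query, database, k=2):
    """Return the names of the k database entries most similar to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings; real ones would have hundreds of dimensions.
database = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.7, 0.3, 0.1],
    "city_map": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

neighbors = retrieve_neighbors(query, database, k=2)
print(neighbors)  # ['cat_photo', 'dog_photo']
```

The retrieved neighbors would then be passed to the generator as extra conditioning alongside the prompt.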

Tech stack

Python · PyTorch · CUDA · conda · Jupyter Notebook

Getting it running

Difficulty · hard · Time to first run · 1 day+

Requires a CUDA-capable GPU, a conda environment, and separately downloaded multi-gigabyte model checkpoint files. There is no practical CPU fallback.

Check the repository for specific license terms.

In plain English

Latent Diffusion Models (LDM) is a research repository from 2021-2022 that introduced the core technique behind Stable Diffusion: generating high-resolution images by running the diffusion process in a compressed latent space rather than directly on pixels. Because images are first compressed into a much smaller representation, the model can produce detailed images far more efficiently than earlier pixel-space diffusion approaches.

The repository contains the research code and pre-trained weights from the paper "High-Resolution Image Synthesis with Latent Diffusion Models" by researchers at Ludwig Maximilian University of Munich and Heidelberg University. It supports several tasks: text-to-image generation (type a prompt, get an image), class-conditional image synthesis (generate images of specific ImageNet categories), image inpainting (fill in masked regions of a photo), super-resolution, and image-to-image translation.

The largest pre-trained model available has 1.45 billion parameters and was trained on LAION-400M, a large collection of image-text pairs scraped from the web. A web demo of this model was made available on Hugging Face Spaces.

The repository also includes a variant called Retrieval-Augmented Diffusion Models (RDMs), which conditions image generation on visually similar images retrieved from a database such as OpenImages or ArtBench, in addition to a text prompt.

Setup requires a conda environment and separately downloaded model checkpoint files. Several Python scripts handle the different generation tasks, and Jupyter notebooks are included as examples. Sampling speed and image quality can be tuned through flags like ddim_steps and scale.

The repository was published alongside the academic paper and includes a BibTeX citation for use in research. This code predates Stable Diffusion, which is a later refinement of the same underlying technique.
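Two points in the description above lend themselves to a small numeric sketch: how much smaller the latent representation is than the pixel grid, and how the scale and ddim_steps flags plausibly enter sampling. None of the helper names below come from the repository. The downsampling factor of 8 and the 4 latent channels are assumed values, the guidance formula is the standard classifier-free guidance rule, and the evenly spaced timestep schedule is one common choice rather than a confirmed detail of this code.

```python
# Illustrative sketch only; the helper names are hypothetical.


def latent_shape(height, width, factor=8, latent_channels=4):
    """Shape of the compressed representation for an image of the given size.

    `factor` is the autoencoder's spatial downsampling factor and
    `latent_channels` its channel count; both defaults are assumptions.
    """
    return (latent_channels, height // factor, width // factor)


def element_count(shape):
    """Number of values in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance applied elementwise to two noise predictions.

    scale = 1.0 returns the conditional prediction unchanged; larger values
    push the result further toward the prompt-conditioned prediction.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def ddim_timesteps(num_train_steps, ddim_steps):
    """Evenly spaced subset of the training timesteps visited when sampling.

    Fewer steps means faster sampling at some cost in image quality.
    """
    stride = num_train_steps // ddim_steps
    return list(range(0, num_train_steps, stride))


pixel = (3, 256, 256)            # RGB image
latent = latent_shape(256, 256)  # (4, 32, 32) under the assumed settings
ratio = element_count(pixel) / element_count(latent)
print(latent, ratio)             # (4, 32, 32) 48.0

print(guided_noise([0.0, 1.0], [1.0, 3.0], scale=5.0))  # [5.0, 11.0]
print(len(ddim_timesteps(1000, 50)))                    # 50
```

Under these assumptions the denoising network processes 48 times fewer values per image than a pixel-space model would, which is where the efficiency gain comes from.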

Copy-paste prompts

Prompt 1
I want to run the Latent Diffusion text-to-image model locally. Walk me through setting up the conda environment and downloading the checkpoint file.
Prompt 2
How do I use Latent Diffusion Models for image inpainting, replacing a masked area of a photo? Show me the Python script and the key flags to set.
Prompt 3
Explain latent diffusion in plain terms: how does running the diffusion process in a compressed space make image generation faster than pixel-level methods?
Prompt 4
How do I tune image quality versus speed in Latent Diffusion Models using the ddim_steps and scale flags? What values should I try first?

Verify against the repo before relying on details.