compvis/latent-diffusion

★ 14,025 | Jupyter Notebook | Audience · researcher | Complexity · 5/5 | Setup · hard

Mindmap

mindmap
  root((latent diffusion))
    What it does
      Text to image
      Image inpainting
      Super resolution
    How it works
      Compressed latent space
      Diffusion process
      Retrieval augmented
    Tech stack
      Python PyTorch
      CUDA GPU
      Jupyter notebooks
    Research context
      2021 paper
      LAION-400M data
      Stable Diffusion origin


Things people build with this

USE CASE 1

Generate images from text prompts using the pre-trained 1.45-billion-parameter model

USE CASE 2

Fill in masked regions of a photo using the inpainting model

USE CASE 3

Upscale low-resolution images to higher resolution using the super-resolution model

USE CASE 4

Run retrieval-augmented image generation by conditioning on visually similar images from a database

Tech stack

Python, PyTorch, CUDA, conda, Jupyter Notebook

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires a CUDA-capable GPU, a conda environment, and separately downloaded multi-gigabyte model checkpoint files. There is no practical CPU fallback.

Check the repository for specific license terms.

In plain English

Latent Diffusion Models (LDM) is a research repository from 2021-2022 that introduced the core technique behind Stable Diffusion: generating high-resolution images by running the diffusion process in a compressed latent space rather than directly on pixels. By compressing images into a much smaller representation first, the model can produce detailed images far more efficiently than earlier pixel-space diffusion approaches.

The repository contains the research code and pre-trained weights from the paper "High-Resolution Image Synthesis with Latent Diffusion Models" by researchers at Ludwig Maximilian University of Munich and Heidelberg University. It supports several tasks: text-to-image generation (type a prompt, get an image), class-conditional image synthesis (generate images of specific ImageNet categories), image inpainting (fill in masked regions of a photo), super-resolution, and image-to-image translation.

The largest pre-trained model available has 1.45 billion parameters and was trained on LAION-400M, a large collection of image-text pairs scraped from the web. A web demo of this model was made available on Hugging Face Spaces. The repository also includes a variant called Retrieval-Augmented Diffusion Models (RDMs), which conditions image generation on visually similar images retrieved from a database such as OpenImages or ArtBench, in addition to a text prompt.

Setup requires a conda environment and separately downloaded model checkpoint files. Several Python scripts handle the different generation tasks, and Jupyter notebooks are included as examples. Sampling speed and image quality can be tuned through flags like ddim_steps and scale.

The repository was published alongside the academic paper and includes a BibTeX citation for use in research. This code predates Stable Diffusion, which is a later refinement of the same underlying technique.
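To make the "compressed latent space" idea concrete, here is a minimal arithmetic sketch of how much smaller a latent can be than the image it encodes. The downsampling factor of 8 and the 4 latent channels are illustrative assumptions, not values taken from this page, and the helper names are hypothetical; check the repository's model configs for the settings each checkpoint actually uses.

```python
def latent_shape(height, width, downsample_factor, latent_channels):
    """Return (channels, height, width) of the latent for an image of the given size."""
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return (
        latent_channels,
        height // downsample_factor,
        width // downsample_factor,
    )


def compression_ratio(height, width, downsample_factor, latent_channels, image_channels=3):
    """Ratio of values in the pixel image to values in its latent."""
    c, h, w = latent_shape(height, width, downsample_factor, latent_channels)
    return (image_channels * height * width) / (c * h * w)


# Assumed example: a 256x256 RGB image, factor 8, 4 latent channels.
shape = latent_shape(256, 256, downsample_factor=8, latent_channels=4)
ratio = compression_ratio(256, 256, downsample_factor=8, latent_channels=4)
print(shape)  # (4, 32, 32)
print(ratio)  # 48.0
```

Under these assumed settings, each denoising step works on 4,096 values instead of 196,608, which is where the efficiency gain over pixel-space diffusion comes from.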

Copy-paste prompts

Prompt 1

I want to run the Latent Diffusion text-to-image model locally. Walk me through setting up the conda environment and downloading the checkpoint file.

Prompt 2

How do I use Latent Diffusion Models for image inpainting, replacing a masked area of a photo? Show me the Python script and the key flags to set.

Prompt 3

Explain latent diffusion in plain terms: how does running the diffusion process in a compressed space make image generation faster than pixel-level methods?

Prompt 4

How do I tune image quality versus speed in Latent Diffusion Models using the ddim_steps and scale flags? What values should I try first?
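As background for Prompt 4, the sketch below illustrates what the two flags usually control in diffusion samplers. It assumes that scale acts as a classifier-free guidance weight and that ddim_steps selects an evenly strided subset of the training timesteps; both are common conventions, but the function names here are hypothetical and the repository's own sampler code is the authority on the exact behavior.

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: push the prediction away from the unconditional one.

    scale = 0 gives the unconditional prediction, scale = 1 the conditional one,
    and larger values follow the prompt more strongly.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def strided_timesteps(total_steps, sampling_steps):
    """Pick an evenly spaced subset of timesteps (assumed uniform striding)."""
    if not 0 < sampling_steps <= total_steps:
        raise ValueError("sampling_steps must be between 1 and total_steps")
    stride = total_steps // sampling_steps
    return list(range(0, total_steps, stride))


# Toy noise predictions standing in for real model outputs.
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]
print(guided_noise(eps_uncond, eps_cond, scale=5.0))  # [5.0, 11.0]

# Assumed 1000 training timesteps sampled in 50 steps.
steps = strided_timesteps(1000, 50)
print(len(steps), steps[:3], steps[-1])  # 50 [0, 20, 40] 980
```

Fewer sampling steps mean fewer model evaluations and faster generation, usually at some cost in detail; a higher scale tends to trade diversity for closer adherence to the prompt.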


Verify against the repo before relying on details.