compvis/stable-diffusion

Analysis updated 2026-06-20

★ 72,976 | Jupyter Notebook | Audience · researcher | Complexity · 4/5 | License | Setup · hard

Mindmap

mindmap
  root((repo))
    What it Does
      Text to image
      Latent diffusion
      Research artifact
    How it Works
      CLIP text encoder
      Latent compression
      Noise refinement
    Tech Stack
      Python
      PyTorch
      CLIP
    Audience
      ML researchers
      Technical developers
    Use Cases
      Local image generation
      Model experimentation


What do people build with it?

USE CASE 1

Run text-to-image generation locally on a GPU to produce images from written prompts for research or creative experiments.

USE CASE 2

Study how latent diffusion models work by reading and modifying the sampling and training code directly.

USE CASE 3

Experiment with the pretrained model weights to understand how text prompts influence image generation output.

What is it built with?

Python, PyTorch, CLIP, Jupyter Notebook

How does it compare?

	compvis/stable-diffusion	openai/openai-cookbook	microsoft/ai-agents-for-beginners
Stars	72,976	73,284	60,670
Language	Jupyter Notebook	Jupyter Notebook	Jupyter Notebook
Setup difficulty	hard	easy	moderate
Complexity	4/5	2/5	2/5
Audience	researcher	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

Requires a GPU with at least 10GB of VRAM and separately downloaded model weights from Hugging Face.
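A quick way to check the VRAM requirement above before downloading any weights is sketched below. It is a minimal sketch, not part of the repo: the helper names are hypothetical, the 10GB figure is read as decimal gigabytes (an assumption), and the detection step only works if PyTorch with CUDA support is already installed.

```python
# Hypothetical helper names; the 10 GB threshold comes from the requirement above.
MIN_VRAM_GB = 10
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def has_enough_vram(total_bytes, minimum_gb=MIN_VRAM_GB):
    """Return True if the reported GPU memory meets the minimum."""
    return total_bytes >= minimum_gb * BYTES_PER_GB


def detect_vram_bytes():
    """Return total memory of GPU 0 in bytes, or None if it cannot be read."""
    try:
        import torch  # optional dependency; absent on machines without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    vram = detect_vram_bytes()
    if vram is None:
        print("No CUDA GPU detected (or PyTorch is not installed).")
    elif has_enough_vram(vram):
        print(f"GPU 0 reports {vram / BYTES_PER_GB:.1f} GB: meets the minimum.")
    else:
        print(f"GPU 0 reports {vram / BYTES_PER_GB:.1f} GB: below the minimum.")
```

Treat the threshold as a rough guide: drivers often report slightly less memory than the figure printed on the card.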

Commercial use is permitted, but the license includes responsible-use conditions that restrict certain harmful applications.

In plain English

This is the original research repository for Stable Diffusion, an AI model that generates images from text descriptions. You type a written prompt like "a photograph of an astronaut riding a horse" and the model produces a realistic or artistic image matching that description. The core problem it solves is turning natural language into visual output, which has uses in art, design, prototyping, and creative exploration.

The model works using a technique called latent diffusion. Rather than working directly with full-size pixel images, it compresses images into a smaller mathematical representation called a latent space, then applies a diffusion process in that compressed space. Diffusion works by starting from random noise and gradually refining it, guided by a text encoder (specifically CLIP ViT-L/14) that translates your written prompt into numerical signals the model can follow. The result is decoded back into a 512x512 pixel image. This approach is more computationally efficient than operating on raw pixels, allowing the model to run on consumer GPUs with at least 10GB of video memory.

You would use this repository if you are a researcher or technically experienced developer who wants to run text-to-image generation locally, experiment with the model weights, or study how latent diffusion models work. It is not a polished user-facing application; it is a research artifact with command-line scripts and Jupyter Notebooks. End users looking for a friendlier experience would typically use this model through a tool like Hugging Face Diffusers instead.

The tech stack is Python, PyTorch, and CLIP, with the repository organized as Jupyter Notebooks and Python scripts. Model weights are distributed separately via Hugging Face under a license that permits commercial use but includes responsible-use conditions.
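To make the latent compression described in this section concrete, the sketch below works through the arithmetic for a 512x512 image. It assumes the settings commonly cited for Stable Diffusion v1, a downsampling factor of 8 and 4 latent channels. Neither number appears in this summary, so check them against the repo's config before relying on them. The function names are illustrative and not taken from the repo.

```python
# Assumed Stable Diffusion v1 settings (not stated in this summary).
DOWNSAMPLE_FACTOR = 8   # each latent cell covers an 8x8 pixel patch
LATENT_CHANNELS = 4     # channels in the compressed representation


def latent_shape(height, width, factor=DOWNSAMPLE_FACTOR, channels=LATENT_CHANNELS):
    """Return (channels, height, width) of the latent for a given image size."""
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by the factor")
    return (channels, height // factor, width // factor)


def element_count(shape):
    """Multiply out the dimensions of a shape tuple."""
    total = 1
    for dim in shape:
        total *= dim
    return total


pixel_shape = (3, 512, 512)          # RGB image at the model's output size
latent = latent_shape(512, 512)      # (4, 64, 64)

# How many times fewer values the diffusion process has to handle.
compression_ratio = element_count(pixel_shape) / element_count(latent)

print(latent)             # (4, 64, 64)
print(compression_ratio)  # 48.0
```

Under these assumptions the diffusion steps operate on roughly 48 times fewer values than they would in pixel space, which is where the efficiency gain comes from.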

Copy-paste prompts

Prompt 1

Using compvis/stable-diffusion, write a Python script that loads the pretrained weights from Hugging Face and generates a 512x512 image from the prompt 'a sunset over the ocean, oil painting style'.

Prompt 2

How do I run the stable-diffusion sampling script from the command line? Give me the exact command with flags for a basic text-to-image generation.

Prompt 3

Walk me through the latent diffusion architecture in compvis/stable-diffusion: what does each key component do, and how do they connect?

Prompt 4

How do I change the classifier-free guidance scale in stable-diffusion to make the output follow my text prompt more or less strictly?

Frequently asked questions

What is stable-diffusion?

The original research code for Stable Diffusion, an AI model that generates images from text prompts using latent diffusion. It is built for researchers and developers, not casual end users.

What language is stable-diffusion written in?

Mainly Jupyter Notebook. The stack also includes Python, PyTorch, and CLIP.

What license does stable-diffusion use?

Commercial use is permitted, but the license includes responsible-use conditions that restrict certain harmful applications.

How hard is stable-diffusion to set up?

Setup difficulty is rated hard, with a day or more to a first successful run.

Who is stable-diffusion for?

Mainly researchers.


Verify against the repo before relying on details.