cumulo-autumn/streamdiffusion

★ 10,717 | Python | Audience · developer | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((streamdiffusion))
    What it does
      Real-time image gen
      100 fps on GPU
    How it works
      Stream processing
      Frame batching
      Reused computation
    Demos
      Text-to-image live
      Webcam style transfer
    Requirements
      CUDA Nvidia GPU
      Python 3.10
      PyTorch
    Compatible with
      Stable Diffusion
      HuggingFace Diffusers


Things people build with this

USE CASE 1

Build an interactive art tool that generates AI images in real time as a user types or edits a text prompt.

USE CASE 2

Apply live AI visual styles or transformations to a webcam or screen capture feed.

USE CASE 3

Integrate real-time image generation into a live streaming or video effects application.

Tech stack

Python, PyTorch, CUDA, TensorRT, Docker, Stable Diffusion

Getting it running

Difficulty · hard | Time to first run · 1h+

Requires a CUDA-capable Nvidia GPU. The optional TensorRT plugin gives maximum speed but adds significant setup complexity.
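Before installing, it can help to confirm that the machine meets the stated requirements. The following is a minimal preflight sketch, not part of StreamDiffusion itself: the `preflight` function and its report keys are hypothetical names chosen for illustration. It relies only on the standard library plus PyTorch's `torch.cuda.is_available()`, and it degrades gracefully when PyTorch is not installed.

```python
import importlib.util
import sys


def preflight() -> dict:
    """Report whether this environment looks ready for a CUDA + PyTorch setup.

    Hypothetical helper for illustration; StreamDiffusion does not ship it.
    """
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # The project lists Python 3.10 as a requirement.
        "python_is_3_10": sys.version_info[:2] == (3, 10),
        # find_spec returns None when a top-level package is not importable.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken install (missing shared libraries, etc.) counts as "no CUDA".
            report["cuda_available"] = False
    return report


report = preflight()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `cuda_available` prints `False`, the GPU driver or the CUDA build of PyTorch is the first thing to check before attempting the TensorRT step.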

In plain English

StreamDiffusion is a Python library that makes AI image generation fast enough to work in real time. Standard diffusion-based image generators (the kind behind tools like Stable Diffusion) take a fraction of a second to a few seconds per image, which is too slow for interactive applications. StreamDiffusion restructures the generation process so images can be produced at dozens of frames per second on a consumer GPU.

The core idea is to treat image generation as a continuous stream rather than a series of one-off requests. Several technical approaches contribute to the speed: batching frames together, reusing intermediate computation across frames, and applying filters to skip redundant work when consecutive frames are similar. On an Nvidia RTX 4090, the library can generate over 100 frames per second from a text prompt and around 94 frames per second when transforming an input image.

Two interactive demos come with the project. One lets you type a text description and watch the AI generate matching images in real time as you type or tweak the prompt. The other uses a live webcam feed or screen capture and continuously applies an AI style or transformation to whatever the camera sees, updating visually as you move.

Installation requires a CUDA-capable Nvidia GPU, Python 3.10, PyTorch, and an optional TensorRT plugin for maximum speed. Docker support is also included. The library wraps around the Hugging Face Diffusers ecosystem, so any model that works with the standard Stable Diffusion pipeline can be plugged in.

The project comes from a research team and is accompanied by a paper on arXiv. It is designed for developers building interactive creative tools, live video effects, or any application where real-time image generation is needed.
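To make the "skip redundant work when consecutive frames are similar" idea concrete, here is a minimal sketch of a similarity gate in plain Python. It is an illustration under simplifying assumptions, not the library's implementation: `SimilarityGate`, `cosine_similarity`, `fake_generate`, and the fixed `threshold` are all hypothetical names and choices, frames are flat lists of floats rather than image tensors, and the real filter may decide differently (for example, probabilistically) about when to skip.

```python
import math
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]


def cosine_similarity(a: Frame, b: Frame) -> float:
    """Cosine similarity between two equally sized frames."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Two all-zero frames are identical; otherwise treat as dissimilar.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


class SimilarityGate:
    """Reuse the previous output when the new input barely changed.

    Hypothetical wrapper for illustration only.
    """

    def __init__(self, generate: Callable[[Frame], List[float]], threshold: float = 0.98):
        self.generate = generate
        self.threshold = threshold
        self.skipped = 0
        self._last_input: Optional[List[float]] = None
        self._last_output: Optional[List[float]] = None

    def __call__(self, frame: Frame) -> List[float]:
        if (
            self._last_input is not None
            and cosine_similarity(frame, self._last_input) >= self.threshold
        ):
            # Near-duplicate input: skip the expensive call, return the cached result.
            self.skipped += 1
            return self._last_output
        self._last_input = list(frame)
        self._last_output = self.generate(frame)
        return self._last_output


# Stand-in for an expensive image-to-image model call.
calls = []


def fake_generate(frame: Frame) -> List[float]:
    calls.append(list(frame))
    return [value * 2 for value in frame]


gate = SimilarityGate(fake_generate, threshold=0.98)
frames = [[1.0, 0.0, 0.0], [1.0, 0.01, 0.0], [0.0, 1.0, 0.0]]
outputs = [gate(frame) for frame in frames]
print(f"model calls: {len(calls)}, skipped frames: {gate.skipped}")
```

In this toy run the second frame is almost identical to the first, so the gate returns the cached output and the stand-in model runs only twice for three frames. With a static webcam scene, a gate like this lets the GPU idle instead of regenerating the same image.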

Copy-paste prompts

Prompt 1

I want to build a real-time AI image generator using StreamDiffusion. Show me the minimal Python code to start generating frames from a text prompt.

Prompt 2

How do I set up StreamDiffusion with TensorRT for maximum frame rate on an Nvidia GPU?

Prompt 3

Using StreamDiffusion, how do I apply a continuous AI style transformation to a live webcam feed?

Prompt 4

What Stable Diffusion models are compatible with StreamDiffusion, and how do I swap between them?


Verify against the repo before relying on details.