lllyasviel/controlnet

★ 33,858
Python
Audience · researcher
Complexity · 4/5
Setup · hard

Mindmap

mindmap
  root((ControlNet))
    What it does
      Visual image control
      Pose conditioning
      Edge and depth guidance
    How it works
      Locked base model
      Trainable copy
      Zero convolution layers
    Condition types
      Body pose
      Sketch edges
      Depth maps
      Scribbles
    Tech
      Python
      Stable Diffusion
      Gradio
      PyTorch


Things people build with this

USE CASE 1

Generate a character illustration that exactly matches the body pose from a reference photo.

USE CASE 2

Turn a rough pencil sketch into a polished AI-generated image that preserves the sketch's composition and layout.

USE CASE 3

Re-render a scene in a different art style while keeping the depth structure of the original photo intact.

USE CASE 4

Produce consistent product placement across multiple AI-generated images using a depth map as a template.

Tech stack

Python, PyTorch, Stable Diffusion, Gradio

Getting it running

Difficulty · hard
Time to first run · 1h+

Requires a GPU with at least 4GB VRAM and a Stable Diffusion 1.5 model download of approximately 2GB.

License information is not mentioned in the explanation.

In plain English

ControlNet addresses a practical creative problem. With AI image generators such as Stable Diffusion, you can describe what you want in text, but you have very little control over the exact composition, pose, or structure of the result. ControlNet adds a way to guide generation with visual signals such as edge outlines, human body poses, depth maps, or hand-drawn scribbles, so the generated image follows the structure you provide, not just your words.

It works by copying part of the image-generation neural network. One copy is "locked" and stays unchanged, preserving the original model's capability, while the other copy is "trainable" and learns to respond to the extra visual condition. The two copies are connected through "zero convolution" layers: 1x1 convolutions whose weights and biases are initialized to zero. Because they output nothing at the start of training, the combined system initially behaves exactly like the original model. As training continues, these connectors gradually learn to inject the visual condition into the generation process.

You would use ControlNet when you want to generate an image that matches a specific pose, follows the edges of a sketch you drew, mirrors the depth structure of a reference photo, or replicates the layout of a line drawing. Instead of prompting and hoping, you get repeatable control over structure.

The stack is Python, built on top of Stable Diffusion 1.5 (the popular open-source image model), with Gradio providing interactive browser-based demos. Supporting tools include OpenPose for body pose detection, MiDaS for depth estimation, and various edge-detection algorithms. Training can run on consumer GPUs with limited memory.
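The locked copy, trainable copy, and zero convolution idea can be illustrated with a small toy sketch. This is not the repository's code: the real project uses PyTorch layers inside the Stable Diffusion network, while the version below uses only the Python standard library and stands in a single 1x1 linear map for each network block. All names here (`conv1x1`, `controlled_block`, `zero_in`, `zero_out`) are hypothetical and chosen for illustration.

```python
import random


def conv1x1(pixels, weight, bias):
    """Apply a 1x1 convolution: the same linear map on every pixel's channels."""
    return [
        [sum(w * v for w, v in zip(row, px)) + b for row, b in zip(weight, bias)]
        for px in pixels
    ]


def zero_params(channels):
    """Parameters of a zero convolution: all weights and biases start at 0."""
    return [[0.0] * channels for _ in range(channels)], [0.0] * channels


def scaled_identity(channels, scale):
    """Stand-in for a zero convolution after some training (assumed values)."""
    weight = [[scale if i == j else 0.0 for j in range(channels)]
              for i in range(channels)]
    return weight, [0.0] * channels


def random_params(channels, rng):
    weight = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(channels)]
    bias = [rng.uniform(-1, 1) for _ in range(channels)]
    return weight, bias


def add(a, b):
    """Element-wise sum of two feature maps with the same shape."""
    return [[u + v for u, v in zip(pa, pb)] for pa, pb in zip(a, b)]


def controlled_block(x, cond, locked, trainable, zero_in, zero_out):
    """Return (base, y) with y = F_locked(x) + Z_out(F_train(x + Z_in(cond)))."""
    base = conv1x1(x, *locked)                      # locked copy, never updated
    injected = add(x, conv1x1(cond, *zero_in))      # condition enters via zero conv
    branch = conv1x1(conv1x1(injected, *trainable), *zero_out)
    return base, add(base, branch)


rng = random.Random(0)
C, N = 3, 4  # toy sizes: 3 channels, 4 pixels
x = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(N)]
cond = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(N)]

locked = random_params(C, rng)
# The trainable copy starts as a duplicate of the locked weights.
trainable = ([row[:] for row in locked[0]], locked[1][:])

# At the start of training both zero convolutions output exactly zero,
# so the controlled output equals the locked model's output.
base, y_start = controlled_block(
    x, cond, locked, trainable, zero_params(C), zero_params(C))
unchanged_at_start = (y_start == base)

# Once the zero convolutions have moved away from zero (simulated here),
# the condition starts to influence the output.
_, y_later = controlled_block(
    x, cond, locked, trainable, scaled_identity(C, 0.1), scaled_identity(C, 0.1))
changed_after_update = (y_later != base)

print(unchanged_at_start, changed_after_update)
```

The first check reflects the property described above: with zero-initialized connectors, adding the control branch does not disturb the original model. The second check only simulates "training" by hand-setting the connector weights; a real training loop would learn them by gradient descent.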

Copy-paste prompts

Prompt 1

Load ControlNet with an OpenPose condition and generate an image of a person in the exact pose shown in this reference photo.

Prompt 2

How do I use ControlNet's Canny edge detection model to generate an image that follows the outlines of my sketch?

Prompt 3

Set up the ControlNet Gradio demo locally so I can test pose, depth, and scribble conditions interactively in a browser.

Prompt 4

I want to generate product photos where the item always appears in the same position. How do I use a depth map condition with ControlNet?

Prompt 5

What consumer GPU specs do I need to run ControlNet locally, and can it run on a laptop with 8GB VRAM?

Verify against the repo before relying on details.