explaingit

timothybrooks/instruct-pix2pix

6,882 stars · Python · Audience · researcher · Complexity · 4/5 · Setup · hard

TLDR

An AI image editor from UC Berkeley that applies plain-English instructions to photos: say 'add snow' or 'turn him into a cyborg', and it edits the image using Stable Diffusion.

Mindmap

mindmap
  root((repo))
    Model
      Stable Diffusion base
      Fine-tuned editing
      Text instructions
    Interfaces
      Command line
      Gradio web app
      Parameter tuning
    Training data
      454k examples
      GPT-3 captions
      Prompt-to-Prompt pairs
    Requirements
      GPU 18GB+ VRAM
      Python setup
    Research
      UC Berkeley paper
      CLIP filtering


Things people build with this

USE CASE 1

Edit photos with plain-English commands like 'make it look vintage' or 'add a sunset' without manual masking.
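On the command line, a single edit is one invocation of the repo's `edit_cli.py` script. The sketch below builds that invocation as an argument list so the instruction text is passed through safely. The `--input`, `--output` and `--edit` flags follow the usage shown in the repository README; the optional `--steps`, `--cfg-text`, `--cfg-image` and `--seed` flags are assumptions about the script's options, so check `python edit_cli.py --help` before relying on them. `build_edit_command`, `run_edit` and the file paths are hypothetical names chosen for illustration.

```python
import shlex
import subprocess
from typing import List, Optional


def build_edit_command(
    input_path: str,
    output_path: str,
    instruction: str,
    steps: Optional[int] = None,
    cfg_text: Optional[float] = None,
    cfg_image: Optional[float] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Return an argv list for edit_cli.py (hypothetical helper).

    ASSUMPTION: the optional flag names below are believed to match
    edit_cli.py; confirm them with `python edit_cli.py --help`.
    """
    cmd = [
        "python", "edit_cli.py",
        "--input", input_path,
        "--output", output_path,
        "--edit", instruction,
    ]
    # Only add tuning flags the caller actually sets, so the
    # script's own defaults apply otherwise.
    if steps is not None:
        cmd += ["--steps", str(steps)]
    if cfg_text is not None:
        cmd += ["--cfg-text", str(cfg_text)]
    if cfg_image is not None:
        cmd += ["--cfg-image", str(cfg_image)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


def run_edit(cmd: List[str]) -> None:
    # Run from the repo root, with the checkpoint downloaded and a GPU
    # with more than 18 GB of VRAM available.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_edit_command(
        "imgs/example.jpg", "imgs/output.jpg", "turn him into a cyborg", steps=100
    )
    # Print a shell-quoted version for copy-pasting instead of running it.
    print(shlex.join(cmd))
```

Passing a list to `subprocess.run` skips the shell entirely, so an instruction containing spaces, quotes or apostrophes reaches the script unchanged.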

USE CASE 2

Run an interactive browser-based image editor using the included Gradio web app.

USE CASE 3

Download the 454,000-example training dataset of paired before-and-after edits to fine-tune your own model.
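Each example in the dataset pairs a before image and an after image with the instruction that links them. A minimal sketch of such a record, and of the two-stage flow that produces it (a text stage that turns a caption into an instruction plus an edited caption, then an image stage that renders both captions as a matching image pair), is shown below. `EditExample`, its field names and `make_example` are hypothetical and do not describe the dataset's on-disk schema; the stages are passed in as plain functions so the flow can run with toy stand-ins instead of GPT-3 and Stable Diffusion.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


# HYPOTHETICAL record layout: illustrative field names only.
@dataclass(frozen=True)
class EditExample:
    input_caption: str
    instruction: str
    edited_caption: str
    input_image: bytes   # e.g. encoded image bytes of the "before" picture
    edited_image: bytes  # encoded image bytes of the "after" picture


# Stage 1: caption -> (instruction, edited caption). GPT-3 plays this role.
TextStage = Callable[[str], Tuple[str, str]]
# Stage 2: (caption, edited caption) -> (before image, after image).
# Stable Diffusion with Prompt-to-Prompt plays this role.
ImageStage = Callable[[str, str], Tuple[bytes, bytes]]


def make_example(caption: str, text_stage: TextStage, image_stage: ImageStage) -> EditExample:
    """Run both stages on one caption and bundle the result."""
    instruction, edited_caption = text_stage(caption)
    before, after = image_stage(caption, edited_caption)
    return EditExample(caption, instruction, edited_caption, before, after)


if __name__ == "__main__":
    # Toy stand-ins so the data flow runs without any models.
    fake_text = lambda c: ("add snow", c + " in the snow")
    fake_images = lambda a, b: (a.encode(), b.encode())
    ex = make_example("a photo of a cabin", fake_text, fake_images)
    print(ex.instruction, "|", ex.edited_caption)
```

Keeping the two stages as injected functions mirrors the fact that the text pairs are generated first and only afterwards turned into images.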

Tech stack

Python · PyTorch · Stable Diffusion · Gradio

Getting it running

Difficulty · hard · Time to first run · 1h+

Requires a GPU with more than 18 GB of VRAM; there is no CPU or low-VRAM fallback.

In plain English

InstructPix2Pix is a research project from UC Berkeley that lets you edit images by describing the change you want in plain English. You provide an input image and a text instruction such as "turn him into a cyborg" or "add snow," and the model produces a new version of the image with that edit applied. The work was published as an academic paper, and this repository contains the code to run the model and the data used to train it.

The model is built on top of Stable Diffusion, a popular open-source image generation model. Fine-tuning Stable Diffusion on paired examples of images before and after an edit taught the model to follow editing instructions while preserving the parts of the original image that should stay unchanged.

Running the model requires a GPU with more than 18 GB of memory. You can edit a single image from the command line by passing in the image file and your instruction as text. There is also an interactive Gradio web application that lets you upload images and type instructions in a browser. Parameters such as the number of diffusion steps and the guidance strength (separate weights for how closely to follow the text instruction and how closely to stay with the input image) can be adjusted to tune the quality of the result and its faithfulness to the original.

The training dataset consists of around 454,000 examples, each containing an original image, an editing instruction, and the edited result. It was built in two stages: first, GPT-3 was fine-tuned to generate, from an input caption, an editing instruction and a matching edited caption; then Stable Diffusion combined with a technique called Prompt-to-Prompt turned each pair of captions into a pair of images. Two versions of the dataset are available for download: a full random-sample version and a higher-quality version filtered using CLIP scores.
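The guidance strength mentioned above is, per the InstructPix2Pix paper, classifier-free guidance with two separate scales: s_I for the input image and s_T for the text instruction. The denoiser is evaluated three times per step (no conditioning, image only, image plus text), and the predictions are combined as e(∅, ∅) + s_I · (e(c_I, ∅) - e(∅, ∅)) + s_T · (e(c_I, c_T) - e(c_I, ∅)). The sketch applies that combination elementwise to plain Python lists standing in for noise tensors; `dual_cfg` is a hypothetical name, and this is an illustration of the formula rather than the repository's own code.

```python
from typing import List, Sequence


def dual_cfg(
    eps_uncond: Sequence[float],  # e(z_t, ∅, ∅): no image, no text
    eps_img: Sequence[float],     # e(z_t, c_I, ∅): image conditioning only
    eps_full: Sequence[float],    # e(z_t, c_I, c_T): image and text
    s_image: float,               # s_I: pull toward the input image
    s_text: float,                # s_T: pull toward the instruction
) -> List[float]:
    """Combine three noise predictions with separate image and text scales."""
    if not (len(eps_uncond) == len(eps_img) == len(eps_full)):
        raise ValueError("noise predictions must have the same length")
    return [
        u + s_image * (i - u) + s_text * (f - i)
        for u, i, f in zip(eps_uncond, eps_img, eps_full)
    ]


if __name__ == "__main__":
    # Toy one-element "tensors"; real predictions are latent-sized arrays.
    print(dual_cfg([0.0], [1.0], [2.0], s_image=1.5, s_text=7.5))
```

Raising `s_text` pushes the result further toward the instruction, while raising `s_image` keeps it closer to the input photo; with both scales at 1.0 the formula reduces to the fully conditioned prediction. The values 7.5 (text) and 1.5 (image) used above are assumed to match the repo's defaults; confirm them in `edit_cli.py`.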
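The CLIP-filtered version of the dataset keeps only examples whose CLIP-based scores pass a threshold. One score of this kind is directional CLIP similarity: the cosine between the change in image embedding (after minus before) and the change in caption embedding, which is high when the image changed in the direction the captions describe. The sketch computes it on precomputed embedding vectors and does not load CLIP; `directional_similarity` and `keep_example` are hypothetical helpers, and the 0.2 default threshold is an illustrative placeholder, not the value used to build the released dataset.

```python
import math
from typing import List, Sequence


def _sub(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0  # no change in one modality: treat as no agreement
    return dot / (na * nb)


def directional_similarity(
    img_in: Sequence[float], img_out: Sequence[float],
    txt_in: Sequence[float], txt_out: Sequence[float],
) -> float:
    """Cosine between the image-embedding delta and the caption-embedding delta."""
    return _cosine(_sub(img_out, img_in), _sub(txt_out, txt_in))


def keep_example(
    img_in: Sequence[float], img_out: Sequence[float],
    txt_in: Sequence[float], txt_out: Sequence[float],
    threshold: float = 0.2,  # ILLUSTRATIVE placeholder threshold
) -> bool:
    return directional_similarity(img_in, img_out, txt_in, txt_out) >= threshold


if __name__ == "__main__":
    # Image moved the same way as the captions: kept.
    print(keep_example([0, 0], [1, 0], [0, 0], [2, 0]))
    # Image moved opposite to the captions: dropped.
    print(keep_example([0, 0], [1, 0], [0, 0], [-3, 0]))
```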

Copy-paste prompts

Prompt 1
Run instruct-pix2pix on a portrait photo to add a fantasy forest background. What command do I use?
Prompt 2
Set up the Gradio web interface for instruct-pix2pix so I can upload images and type editing instructions in a browser.
Prompt 3
How do I tune the guidance strength and diffusion steps in instruct-pix2pix to make edits more or less faithful to the original?
Prompt 4
Download the filtered high-quality instruct-pix2pix dataset and show me the format of each training example.


Verify against the repo before relying on details.