explaingit

phillipi/pix2pix

10,637 · Lua · Audience: researcher · Complexity: 4/5 · Setup: hard

TLDR

A 2017 research implementation of image-to-image translation using conditional GANs. It trains a model to convert images from one visual style into another, such as maps to satellite photos or sketches to shoes.

Mindmap

mindmap
  root((pix2pix))
    What It Does
      Image translation
      Style conversion
      Paired training
    Tech Stack
      Lua
      Torch
      CUDA
    Example Datasets
      Facades
      Street scenes
      Shoe sketches
      Day to night
    Use Cases
      Map to satellite
      Sketch to photo
      Style transfer
    Audience
      Researchers
      ML practitioners

Things people build with this

USE CASE 1

Train a model to convert architectural label maps into realistic building facade photos using the facades example dataset.

USE CASE 2

Generate nighttime scenes from daytime outdoor photographs by training on matched day/night image pairs.

USE CASE 3

Turn pencil edge sketches of shoes or handbags into photorealistic product images.

USE CASE 4

Test pre-trained pix2pix models on included example datasets without writing any training code.

Tech stack

Lua · Torch · CUDA

Getting it running

Difficulty: hard · Time to first run: 1h+

Effectively GPU-only: training needs an NVIDIA GPU with CUDA, and CPU training is not practical for any real dataset.
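Before starting a long install, it can help to check which prerequisites are already on your PATH. The following is a minimal sketch, not part of the repository: it assumes the Torch launcher is called `th`, that packages are installed with `luarocks`, and that the presence of `nvidia-smi` is a reasonable hint that an NVIDIA driver is installed. The function names `check_prerequisites` and `missing_tools` are hypothetical.

```python
import shutil

# Assumed executable names; adjust them if your installation differs.
REQUIRED_TOOLS = {
    "th": "Torch command-line launcher",
    "luarocks": "Lua package manager used to install Torch packages",
    "nvidia-smi": "NVIDIA driver utility (a hint that a CUDA GPU is usable)",
}


def check_prerequisites(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return a dict mapping each tool name to True if it is found on PATH."""
    return {name: which(name) is not None for name in tools}


def missing_tools(status):
    """Return the sorted names of tools that were not found."""
    return sorted(name for name, found in status.items() if not found)


status = check_prerequisites()
for name, found in status.items():
    label = "found" if found else "MISSING"
    print(f"{name:12s} {label:8s} {REQUIRED_TOOLS[name]}")
```

A missing entry only means the executable is not on PATH. It does not prove the tool is absent, and a found `nvidia-smi` does not guarantee a working CUDA setup.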

In plain English

Pix2pix is a research implementation of image-to-image translation, a technique that trains a computer to convert images from one visual style into another. You give it pairs of matched images, for example a map on one side and the corresponding satellite photo on the other, and the model learns to generate the second type of image from the first. The paper describing this work was published at CVPR 2017.

The approach uses a type of AI architecture called a conditional generative adversarial network. Two neural networks train against each other: one generates images, the other tries to detect whether the generated images look realistic. Over many training steps, the generator improves until its output is difficult to distinguish from real images in the target style.

The repository includes several example datasets you can download and train on directly. These include building facade labels mapped to facade photos, city street annotations mapped to street scene photographs, pencil-edge sketches mapped to shoe or handbag photos, and daytime outdoor scenes mapped to their nighttime equivalents. Pre-trained model weights for these pairs are also available so you can test the results without training from scratch.

This version is written in Lua using the Torch deep learning framework and requires an NVIDIA GPU with CUDA to train at a reasonable speed. The README notes that a newer and more actively maintained Python implementation exists in a companion repository for anyone who prefers that setup.

To use it, you install Torch and two packages, download a dataset, run the training script pointing it at your data folder, and then run the test script to generate translated images. The output is saved as image files and an HTML page for viewing results. Training a basic example like the facades dataset takes roughly two hours on a capable GPU.
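The paired-image idea described above can be illustrated with a small sketch. It assumes each training example is stored as one combined image with the two halves side by side, and it represents an image as a plain list of pixel rows so that it runs without any imaging library. The names `split_pair` and `to_training_pair` are hypothetical, and the `direction` argument only mirrors the idea of choosing which half is the input. This is not the repository's own loading code.

```python
def split_pair(image):
    """Split a side-by-side image (a list of pixel rows) into (left, right) halves."""
    if not image:
        raise ValueError("image has no rows")
    width = len(image[0])
    if any(len(row) != width for row in image):
        raise ValueError("all rows must have the same width")
    if width % 2 != 0:
        raise ValueError("combined width must be even to split into two halves")
    half = width // 2
    left = [row[:half] for row in image]
    right = [row[half:] for row in image]
    return left, right


def to_training_pair(image, direction="AtoB"):
    """Return (source, target), where A is the left half and B is the right half."""
    a, b = split_pair(image)
    if direction == "AtoB":
        return a, b
    if direction == "BtoA":
        return b, a
    raise ValueError("direction must be 'AtoB' or 'BtoA'")


# A tiny 2x4 "image": the left 2x2 block is A, the right 2x2 block is B.
combined = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
source, target = to_training_pair(combined, direction="BtoA")
print(source)  # [[3, 4], [7, 8]]
print(target)  # [[1, 2], [5, 6]]
```

Swapping the direction lets the same paired data train either translation, for example labels to photos or photos to labels.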

Copy-paste prompts

Prompt 1
I have paired training images with edge sketches on one side and shoes on the other. How do I use pix2pix to train a model on these pairs and then run inference to generate shoe photos from new sketches?
Prompt 2
Walk me through installing Torch and the required packages to run pix2pix on my NVIDIA GPU on Linux.
Prompt 3
I want to convert daytime outdoor photos to nighttime versions with pix2pix. What dataset do I use and how do I run the training script?
Prompt 4
How do I run inference using the pre-trained pix2pix facades model to convert building label images into facade photos?
Prompt 5
What is a conditional GAN and how does pix2pix use one to translate images between two visual styles?

Verify against the repo before relying on details.