zhengdian1/uni-edit

Analysis updated 2026-06-24

★ 23 · Python · Audience: researcher · Complexity: 5/5 · License: Apache 2.0 · Setup: hard

Mindmap

mindmap
  root((Uni-Edit))
    Inputs
      Source images
      Edit instructions
      Text prompts
    Outputs
      Generated images
      Edited images
      Image descriptions
    Use Cases
      Run unified image model
      Train on Uni-Edit-148k
      Reproduce paper results
    Tech Stack
      Python
      PyTorch
      flash-attention
      BAGEL


What do people build with it?

USE CASE 1

Run inference for image generation, understanding, or editing from one model

USE CASE 2

Reproduce the paper's results on the Uni-Edit-148k dataset

USE CASE 3

Train a unified vision model on instruction-based editing data

USE CASE 4

Extend the BAGEL or Janus-Pro backbone for your own editing tasks

What is it built with?

Python · PyTorch · CUDA · flash-attention · Hugging Face

How does it compare?

	zhengdian1/uni-edit	aaravkashyap12/advise-project-approach	abu-rayhan-alif/django-saas-kit
Stars	23	23	23
Language	Python	Python	Python
Setup difficulty	hard	easy	moderate
Complexity	5/5	2/5	3/5
Audience	researcher	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty: hard · Time to first run: 1 day+

Needs a GPU, flash-attention, and at least 54 GB of system RAM for the custom step that merges the checkpoint shards into one safetensors file before inference.
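Because the merge step is memory-hungry, it can save time to check the machine before downloading anything. The following is a minimal preflight sketch, not part of the repository: the function names are hypothetical, the 54 figure is the one quoted above, and `flash_attn` is the import name of the flash-attention package. It reads RAM through `os.sysconf`, which is unavailable on Windows, so the check reports "unknown" there.

```python
import importlib.util
import os

# Figure quoted above for the shard-merge step. Comparing against GiB is an
# assumption; it is the stricter reading of "54 GB".
REQUIRED_RAM_GB = 54


def total_ram_gb():
    """Total physical memory in GiB, or None if sysconf cannot report it."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    return pages * page_size / 1024**3


def installed_modules(names=("torch", "flash_attn")):
    """Return the subset of `names` that Python can locate for import."""
    return {n for n in names if importlib.util.find_spec(n) is not None}


def preflight(ram_gb, installed):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    if ram_gb is None:
        problems.append("could not determine system RAM")
    elif ram_gb < REQUIRED_RAM_GB:
        problems.append(
            f"only {ram_gb:.0f} GB RAM, need at least {REQUIRED_RAM_GB} GB"
        )
    for module in ("torch", "flash_attn"):
        if module not in installed:
            problems.append(f"missing Python package: {module}")
    return problems


if __name__ == "__main__":
    for problem in preflight(total_ram_gb(), installed_modules()):
        print("WARNING:", problem)
```

The check does not confirm that a GPU is present. Once PyTorch is installed, `torch.cuda.is_available()` covers that.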

Licensed under Apache 2.0: free to use, modify, and ship commercially, with patent protection, as long as the license notice stays in the code.

In plain English

Uni-Edit is the code release for a research paper from a group at the Chinese University of Hong Kong and collaborators. The project is about teaching one AI model to do three related jobs at once: understand images, generate new images from text, and edit existing images according to written instructions.

Most existing systems train on a mix of separate datasets for each job and have to juggle conflicting goals across several training stages. The authors argue that intelligent image editing, where the instructions can be complex and contain reasoning, is general enough on its own to cover all three skills. So they train on just one task, with one dataset, in one stage. To make that possible they also built an automated pipeline that turns visual question-answering data into rich editing instructions, producing a dataset called Uni-Edit-148k that pairs each instruction with a high-quality edited image.

The repository contains training scripts, inference scripts, and evaluation scripts. It is set up around an existing open model called BAGEL, with a separate Janus-Pro variant tested in the paper. The quick start clones the repo, builds a Python 3.10 conda environment, installs the requirements including flash-attention, and downloads the pretrained checkpoint from Hugging Face. Because the checkpoint uses a custom architecture, you cannot load it through the usual Hugging Face shortcut. You first merge the downloaded shards into one safetensors file using a provided script, which needs at least 54 gigabytes of system memory. After that, one command runs inference for generation, understanding, or editing. The code is released under the Apache 2.0 license.
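The repository ships its own merge script, and its name and options are not reproduced here. Purely as an illustration of what merging shards generally involves, the sketch below combines every `*.safetensors` file in a directory into one file using `load_file` and `save_file` from the `safetensors` library. The function names and directory layout are hypothetical, and the real script may work differently. A naive merge like this holds every tensor in memory at once, which is one plausible reason the step needs so much RAM.

```python
from pathlib import Path


def merge_state_dicts(shards):
    """Combine per-shard {name: tensor} mappings into a single dict.

    Raises ValueError if two shards define the same tensor name, since
    silently overwriting a weight would corrupt the merged checkpoint.
    """
    merged = {}
    for shard in shards:
        duplicates = merged.keys() & shard.keys()
        if duplicates:
            raise ValueError(f"duplicate tensor names: {sorted(duplicates)}")
        merged.update(shard)
    return merged


def merge_safetensors_dir(shard_dir, out_file):
    """Load every *.safetensors shard in shard_dir and write one merged file.

    Hypothetical helper: write out_file outside shard_dir so a rerun does
    not pick up the merged file as another shard.
    """
    # Imported lazily so merge_state_dicts works without these packages.
    from safetensors.torch import load_file, save_file

    shard_paths = sorted(Path(shard_dir).glob("*.safetensors"))
    # load_file places tensors on the CPU by default, so all weights sit in RAM.
    merged = merge_state_dicts(load_file(str(p)) for p in shard_paths)
    save_file(merged, str(out_file))
    return len(merged)
```

Keeping the dictionary logic separate from file I/O makes the duplicate-name check easy to try on plain dicts before pointing the helper at a multi-gigabyte checkpoint.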

Copy-paste prompts

Prompt 1

Walk me through setting up the Python 3.10 conda environment with flash-attention for Uni-Edit

Prompt 2

Download the BAGEL checkpoint from Hugging Face and merge the shards into one safetensors file

Prompt 3

Run inference for instruction-based editing on a sample image with a prompt

Prompt 4

Explain how the Uni-Edit-148k dataset was built from visual question-answering data

Prompt 5

Fine-tune Uni-Edit on my own dataset of edit instructions and target images

Frequently asked questions

What is uni-edit?

Research code for a single unified model that handles image understanding, text-to-image generation, and instruction-based editing, trained on one dataset in one stage.

What language is uni-edit written in?

Mainly Python. The stack also includes PyTorch, CUDA, flash-attention, and Hugging Face.

What license does uni-edit use?

Apache 2.0: free to use, modify, and ship commercially, with patent protection, as long as the license notice stays in the code.

How hard is uni-edit to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is uni-edit for?

Mainly researchers.

Verify against the repo before relying on details.