Run inference for image generation, understanding, or editing from one model
Reproduce the paper's results on the Uni-Edit-148k dataset
Train a unified vision model on instruction-based editing data
Extend the BAGEL or Janus-Pro backbone for your own editing tasks
Requires a GPU, flash-attention, and at least 54 GB of system RAM for the custom safetensors merge step that must run before inference.
Uni-Edit is the code release for a research paper from a group at the Chinese University of Hong Kong and collaborators. The project is about teaching one AI model to do three related jobs at once: understand images, generate new images from text, and edit existing images according to written instructions. Most existing systems train on a mix of separate datasets for each job and have to juggle conflicting goals across several training stages. The authors argue that intelligent image editing, where the instructions can be complex and contain reasoning, is general enough on its own to cover all three skills. So they train on just one task, with one dataset, in one stage.

To make that possible they also built an automated pipeline that turns visual question-answering data into rich editing instructions, producing a dataset called Uni-Edit-148k that pairs each instruction with a high-quality edited image.

The repository contains training scripts, inference scripts, and evaluation scripts. It is set up around an existing open model called BAGEL, with a separate Janus-Pro variant tested in the paper. The quick start clones the repo, builds a Python 3.10 conda environment, installs the requirements including flash-attention, and downloads the pretrained checkpoint from Hugging Face.

Because the checkpoint uses a custom architecture, you cannot load it through the usual Hugging Face shortcut. You first merge the downloaded shards into one safetensors file using a provided script, which needs at least 54 gigabytes of system memory. After that, one command runs inference for generation, understanding, or editing.

The code is released under the Apache 2.0 license.
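The 54 GB memory requirement for the shard merge can be checked before starting a long download. The following is a minimal sketch, not part of the repository: the function names are hypothetical, it reads "54 GB" as 54 GiB (an assumption), and it reports total installed memory rather than memory that is currently free.

```python
import os

# Assumption: "54 GB" is interpreted as 54 GiB (54 * 1024**3 bytes).
# Adjust this constant if the repository means decimal gigabytes.
REQUIRED_BYTES = 54 * 1024**3


def total_ram_bytes():
    """Return total physical RAM in bytes, or None if the platform does not expose it."""
    try:
        # Available on Linux and most Unix-like systems; absent on Windows.
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def enough_ram_for_merge(total_bytes, required_bytes=REQUIRED_BYTES):
    """Return True/False for a known total, or None when the total is unknown."""
    if total_bytes is None:
        return None
    return total_bytes >= required_bytes


if __name__ == "__main__":
    total = total_ram_bytes()
    verdict = enough_ram_for_merge(total)
    if verdict is None:
        print("Could not determine system RAM on this platform.")
    elif verdict:
        print(f"OK: {total / 1024**3:.1f} GiB installed.")
    else:
        print(f"Only {total / 1024**3:.1f} GiB installed; the merge step may fail.")
```

Passing this check does not guarantee the merge succeeds, since other processes may already hold part of that memory.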
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.