explaingit

fudancvl/occlusionformer

16 Python Audience · researcher Complexity · 4/5 Active Setup · hard

TLDR

Inference code and demo for the ICML 2026 OcclusionFormer paper, which composes overlapping objects with explicit Z-order on top of FLUX.

Mindmap

mindmap
  root((OcclusionFormer))
    Inputs
      Layout JSON
      FLUX base model
      OcclusionFormer checkpoint
    Outputs
      Layered image
      Correct Z order
    Use Cases
      Layout to image research
      Occlusion ablations
      Demo via Streamlit
    Tech Stack
      Python
      PyTorch
      FLUX
      Streamlit
      Hugging Face

Things people build with this

USE CASE 1

Reproduce ICML 2026 results on layout-to-image with overlapping boxes

USE CASE 2

Compose a custom scene from a layout JSON with correct front-to-back order

USE CASE 3

Benchmark against the SA-Z dataset with amodal annotations

USE CASE 4

Try the Streamlit demo to explore Z-order conditioned generation
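To make the second use case concrete, here is a minimal sketch of what a layout with overlapping boxes and an explicit Z-order might look like. The schema is an assumption made for illustration: the field names (`prompt`, `instances`, `label`, `bbox`, `z`), the normalized `[x0, y0, x1, y1]` box convention, and the rule that a smaller `z` means closer to the viewer are all hypothetical. Check the example layout JSON files in the repository for the real format.

```python
import json

# Hypothetical schema: these field names and conventions are illustrative
# assumptions, not the repository's actual layout format.
layout = {
    "prompt": "a cat sitting in front of a potted plant",
    "instances": [
        {"label": "potted plant", "bbox": [0.30, 0.20, 0.80, 0.90], "z": 1},
        {"label": "cat", "bbox": [0.10, 0.45, 0.55, 0.95], "z": 0},
    ],
}


def overlap_area(a, b):
    """Intersection area of two [x0, y0, x1, y1] boxes in normalized coordinates."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def front_to_back(instances):
    """Sort instances so the nearest one (smallest z, by assumption) comes first."""
    return sorted(instances, key=lambda inst: inst["z"])


ordered = front_to_back(layout["instances"])
layout_json = json.dumps(layout, indent=2)
```

A nonzero `overlap_area` between two boxes marks exactly the case the paper targets: the boxes share pixels, so the Z-order decides which object is visible there.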

Tech stack

Python PyTorch FLUX Streamlit

Getting it running

Difficulty · hard Time to first run · 1 day+

Needs a Python 3.11 conda environment, the FLUX base model weights, and the OcclusionFormer checkpoint from Hugging Face.
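Because setup has several moving parts, a small preflight check can catch a missing piece before a long model load. The sketch below is not part of the repository; the function name `preflight` and its arguments are hypothetical, and it only checks the three prerequisites listed above plus a layout file.

```python
import sys
from pathlib import Path


def preflight(flux_dir, checkpoint, layout_json, version=None):
    """Return a list of human-readable problems; an empty list means ready.

    `version` defaults to the running interpreter's (major, minor) pair and
    can be overridden for testing.
    """
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version != (3, 11):
        problems.append(f"Python 3.11 expected, found {version[0]}.{version[1]}")
    required = [
        ("FLUX base model", flux_dir),
        ("OcclusionFormer checkpoint", checkpoint),
        ("layout JSON", layout_json),
    ]
    for name, path in required:
        if not Path(path).exists():
            problems.append(f"{name} not found: {path}")
    return problems
```

Calling it with your local paths and printing the returned list gives a quick readout of what still needs downloading.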

In plain English

OcclusionFormer is a research project from Fudan University that accompanies a paper accepted at the ICML 2026 machine learning conference. It tackles a specific problem in image generation: when you tell an AI model to draw a scene by giving it a bounding box for each object, and those boxes overlap, current methods often blend the textures together or get the order wrong, so an object that should be behind ends up looking like it is in front.

The authors propose handling the front-to-back order, which they call Z-order, as an explicit step in the model. According to the README, the approach has three pieces: each object instance is generated separately, the instances are then composed using a method borrowed from volume rendering that decides how much of each layer shows through, and finally a queried alignment step keeps each object in its correct spatial position. Alongside the model, the team is releasing a dataset called SA-Z, which adds occlusion order and amodal annotations (information about the parts of objects hidden behind other objects) to layout data.

This repository is the inference and demo package. It contains the model code, a Streamlit web demo, a command-line inference script, example layout JSON files, and a requirements file. The model weights and the SA-Z dataset are hosted on Hugging Face, and the paper itself is on arXiv. To run it, the README walks through creating a Python 3.11 conda environment, installing the requirements, downloading the checkpoint, and either starting the Streamlit demo or calling the CLI script with paths to a base FLUX model, the OcclusionFormer checkpoint, and a layout JSON. One open task remains: organizing the amodal annotations on Hugging Face.
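The volume-rendering idea mentioned above can be illustrated with the standard front-to-back compositing rule, in which layer i contributes with weight alpha_i * T_i and the transmittance T_i is the product of (1 - alpha_j) over all layers j in front of it. This is a generic sketch for a single pixel, not the authors' implementation: the README only says the composition is borrowed from volume rendering, so the exact formulation and where it is applied inside the model are assumptions.

```python
def composite_front_to_back(layers):
    """Blend per-instance layers at one pixel, nearest layer first.

    Each layer is (color, alpha), with color an (r, g, b) tuple and alpha in
    [0, 1]. Returns the blended color and the remaining transmittance, which
    is the fraction of the background that would still show through.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # nothing has blocked the view yet
    for color, alpha in layers:
        weight = alpha * transmittance
        for c in range(3):
            pixel[c] += weight * color[c]
        # Layers behind this one are attenuated by its opacity.
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance
```

With a half-transparent red layer in front of an opaque blue one, `composite_front_to_back([((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 1.0)])` returns the color `(0.5, 0.0, 0.5)` and a transmittance of `0.0`. Making the front layer fully opaque hides the blue layer entirely, which is the behavior a correct Z-order needs in overlapping regions.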

Copy-paste prompts

Prompt 1
Walk me through setting up a Python 3.11 conda env for OcclusionFormer and downloading the FLUX base plus checkpoint
Prompt 2
Show me how to write a layout JSON with overlapping bounding boxes and run the CLI inference script
Prompt 3
Help me run the Streamlit demo and understand the volume rendering composition step
Prompt 4
Explain how SA-Z amodal annotations are used during training versus inference in OcclusionFormer

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.