explaingit

cp-cp/liveedit

Analysis updated 2026-05-18

Stars · 59 | Language · Python | Audience · researcher | Complexity · 5/5 | Setup · hard

TLDR

An ECCV 2026 research codebase that edits video in real time using a diffusion model, processing footage chunk-by-chunk from a text instruction while keeping unchanged regions intact.

Mindmap

mindmap
  root((LiveEdit))
    What it does
      Streaming video editing
      Chunk-by-chunk inference
      Text-driven changes
    Key Ideas
      Source preservation
      Mask Cache reuse
      Three-stage training
    Built On
      Wan2.1 base model
      Self-Forcing codebase
    Setup
      Linux and NVIDIA GPU
      HuggingFace weights
      Conda environment

What do people build with it?

USE CASE 1

Run text-driven video editing on a short clip using a pre-trained LiveEdit checkpoint, providing a source video and text instruction in a JSON file

USE CASE 2

Reproduce the streaming video editing results from the ECCV 2026 LiveEdit paper using the official training and inference scripts

USE CASE 3

Experiment with the AR-Oriented Mask Cache for efficient chunk-level computation reuse and visualize which regions are being recalculated
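
The reuse idea behind use case 3 can be illustrated with a toy cache. This is only a sketch under stated assumptions, not LiveEdit's implementation: `RegionCache`, `process_chunk`, and the per-region boolean mask are hypothetical names, and how the real AR-Oriented Mask Cache derives its mask or what it stores is not described on this page. The sketch keeps the last result for each region and recomputes only the regions flagged in the mask.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class RegionCache:
    """Toy cache (hypothetical): reuse each region's last result unless flagged."""

    def __init__(self, compute: Callable[[str], str]) -> None:
        self._compute = compute            # stand-in for the expensive model call
        self._cached: Dict[int, str] = {}  # region index -> last computed result

    def process_chunk(
        self, regions: Sequence[str], recompute_mask: Sequence[bool]
    ) -> Tuple[List[str], List[bool]]:
        """Return (outputs, recomputed) for one chunk.

        A region is recomputed when its mask entry is True or when nothing
        has been cached for it yet; otherwise the cached result is reused.
        """
        if len(regions) != len(recompute_mask):
            raise ValueError("regions and recompute_mask must have equal length")
        outputs: List[str] = []
        recomputed: List[bool] = []
        for index, (region, needs_update) in enumerate(zip(regions, recompute_mask)):
            if needs_update or index not in self._cached:
                self._cached[index] = self._compute(region)
                recomputed.append(True)
            else:
                recomputed.append(False)
            outputs.append(self._cached[index])
        return outputs, recomputed
```

The `recomputed` list returned for each chunk is the kind of information a visualization of recalculated regions would draw on. In this toy version a reused region returns its stale cached result, so the mask must flag every region whose content actually changed.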

What is it built with?

Python · PyTorch · CUDA · Wan2.1 · Flash Attention · Hugging Face

How does it compare?

| | cp-cp/liveedit | zhw040803-glitch/uav-gps-dqn-detection | 0xh4ku/manga-pdf-to-epub |
| --- | --- | --- | --- |
| Stars | 59 | 59 | 60 |
| Language | Python | Python | Python |
| Setup difficulty | hard | moderate | moderate |
| Complexity | 5/5 | 3/5 | 2/5 |
| Audience | researcher | researcher | general |

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1h+

Requires an NVIDIA GPU with CUDA. Both the Wan2.1 base model and the LiveEdit checkpoint must be downloaded from Hugging Face before running inference.
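
Besides the downloads, inference takes a small JSON file naming the source video and the text instruction. The sketch below shows one way to generate such a file. The key names `video_path` and `instruction`, the list-of-entries layout, and the file names are assumptions made for illustration; the real schema is defined by the repo and should be checked there.

```python
import json
from pathlib import Path


def build_edit_request(video_path: str, instruction: str) -> str:
    """Return JSON text describing one edit job.

    HYPOTHETICAL schema: the keys "video_path" and "instruction" and the
    list wrapper are illustrative guesses, not the repo's documented format.
    """
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    entry = {"video_path": video_path, "instruction": instruction}
    return json.dumps([entry], indent=2, ensure_ascii=False)


def save_edit_request(json_path: str, video_path: str, instruction: str) -> Path:
    """Write the request to disk so it can be passed to an inference script."""
    path = Path(json_path)
    path.write_text(build_edit_request(video_path, instruction), encoding="utf-8")
    return path
```

For example, `save_edit_request("edit.json", "clips/fruit.mp4", "change the red currants to deep black grapes")` would write a one-entry file; both paths here are placeholders.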

In plain English

LiveEdit is an academic research project (accepted to ECCV 2026) from Tsinghua University and HKUST that approaches video editing differently from most AI video tools. Where typical AI video editing requires the entire clip to be processed at once before showing any results, LiveEdit edits video in small, overlapping chunks processed one after another, similar to how a livestream works. This allows it to begin showing edited output much sooner than batch approaches.

The workflow takes two inputs: a source video and a text instruction describing what to change, such as "change the red currants to deep black grapes." The model keeps untouched parts of the frame (backgrounds, people, objects not mentioned in the instruction) as close to the original as possible while applying the transformation only to relevant regions. A mask cache optimization skips recalculation for regions that have not changed between chunks, reducing the computation required per chunk.

LiveEdit is built on top of an existing video generation model called Wan2.1. The training procedure has three stages: first it teaches the model to edit video well in the standard offline whole-clip setting, then it adapts it to process chunks sequentially, then it applies a distillation step to compress the number of denoising steps required per chunk, which is what produces real-time-oriented performance.

Running inference requires downloading the Wan2.1 base model weights and the LiveEdit checkpoint from Hugging Face, writing a small JSON file specifying your source video and instruction, then running a shell script. Training requires multiple NVIDIA GPUs and additional setup for dataset paths.

This project is primarily for AI researchers studying video editing, diffusion models, or streaming inference. General users would need significant technical setup and GPU hardware to run it.
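
The chunk-by-chunk processing described above can be sketched in a few lines. This is a simplified illustration, not the project's code: `iter_overlapping_chunks`, `edit_stream`, the `edit_chunk` callback (a stand-in for the diffusion model), and the default `chunk_size` and `overlap` values are all hypothetical. It also assumes that overlapping frames are simply emitted once, by the earlier chunk, whereas the real model may use the overlap differently.

```python
from typing import Callable, Iterator, List, Sequence


def iter_overlapping_chunks(
    frames: Sequence, chunk_size: int, overlap: int
) -> Iterator[Sequence]:
    """Yield consecutive chunks; each shares `overlap` frames with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap
    start = 0
    while start < len(frames):
        yield frames[start:start + chunk_size]
        if start + chunk_size >= len(frames):
            break  # this chunk already reached the last frame
        start += stride


def edit_stream(
    frames: Sequence,
    edit_chunk: Callable[[Sequence], List],
    chunk_size: int = 8,  # illustrative value, not taken from the repo
    overlap: int = 2,     # illustrative value, not taken from the repo
) -> Iterator:
    """Edit chunk by chunk, yielding each frame as soon as its chunk is done."""
    first = True
    for chunk in iter_overlapping_chunks(frames, chunk_size, overlap):
        edited = edit_chunk(chunk)
        # Frames in the overlap were already emitted by the previous chunk.
        new_frames = edited if first else edited[overlap:]
        first = False
        yield from new_frames
```

Because `edit_stream` is a generator, a caller can display the first edited frames after one chunk has been processed instead of waiting for the whole clip, which is the latency benefit the streaming design aims for.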

Copy-paste prompts

Prompt 1
Set up LiveEdit for inference: create the conda env, download the Wan2.1 base model and LiveEdit checkpoint from HuggingFace, and run the default inference script on the test video
Prompt 2
I want to edit a video to change a specific object. Show me the exact JSON format for the data_path input file and the inference command with correct arguments
Prompt 3
Explain the three-stage LiveEdit training pipeline: what does each stage do, what scripts run each stage, and what checkpoints does each produce?
Prompt 4
How does the AR-Oriented Mask Cache work in LiveEdit and how do I enable --save_mask to visualize which regions are being fully recomputed vs reused?

Frequently asked questions

What is liveedit?

An ECCV 2026 research codebase that edits video in real time using a diffusion model, processing footage chunk-by-chunk from a text instruction while keeping unchanged regions intact.

What language is liveedit written in?

Mainly Python. The stack also includes PyTorch and CUDA.

How hard is liveedit to set up?

Setup difficulty is rated hard, with roughly 1h+ to a first successful run.

Who is liveedit for?

Mainly researchers.


Verify against the repo before relying on details.