Run the included drawer or fridge demo to reconstruct how objects move when they temporarily leave the frame.
Use the codebase as a baseline for CVPR 2026 comparisons in 4D scene reconstruction.
Apply the Gauss-Newton primitive fitting pipeline to a custom video dataset.
Requires an NVIDIA GPU with CUDA, plus model checkpoints downloaded via the included install script.
4D Primitive-Mache is a research codebase accompanying a paper accepted as an oral presentation at CVPR 2026, a major academic conference in computer vision. The project addresses persistent 4D scene reconstruction: building a model of a physical scene that tracks how objects move and change over time across video frames, rather than capturing a single static snapshot.

The core idea is to represent scenes using geometric primitives: simple shapes such as ellipsoids or similar building blocks that can be positioned, oriented, and deformed. The paper proposes a method for fitting these primitives to video footage and tracking them over time, even when objects temporarily disappear from view (for example, when a drawer closes and hides its contents). This property, called object permanence in the demo configurations, is what the word "persistent" in the title refers to.

The codebase is organized into three main parts. The frontend handles geometry estimation and object segmentation using two external models, Pi3 and SAM 2. The core module handles the mathematical optimization, specifically a Gauss-Newton solver that fits the primitives to the observed data. The object mapper handles motion tracking and assembles what the authors call 4D replay, a time-extended representation of the scene that can be replayed or inspected after reconstruction.

Running the system requires an NVIDIA graphics card, CUDA, and the PyTorch deep learning library. An install script sets up the environment, downloads model checkpoints, and configures paths automatically. Demo configurations are included for a robot arm dataset and two object-permanence scenarios (a drawer and a fridge).

The README is technical and assumes familiarity with computer vision research. There is no graphical interface; results are visualized using an external tool called Rerun.
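To give a feel for what Gauss-Newton primitive fitting means, here is a minimal sketch in NumPy. It is not the repository's solver and makes several assumptions: the primitive is a plain sphere (center plus radius) rather than whatever primitive the paper uses, the data is a synthetic noise-free point cloud, and the function name `fit_sphere_gauss_newton` is hypothetical. It only illustrates the general loop of computing residuals, building a Jacobian, and solving a linearized least-squares step.

```python
import numpy as np


def fit_sphere_gauss_newton(points, center0, radius0, iters=50, tol=1e-10):
    """Fit a sphere to 3D points by minimizing sum_i (||p_i - c|| - R)^2.

    Hypothetical illustration only; not taken from the 4D Primitive-Mache code.
    """
    points = np.asarray(points, dtype=float)
    # Parameter vector: [cx, cy, cz, R]
    params = np.concatenate([np.asarray(center0, dtype=float), [float(radius0)]])
    for _ in range(iters):
        center, radius = params[:3], params[3]
        diff = points - center                    # (N, 3) offsets from the center
        dist = np.linalg.norm(diff, axis=1)       # (N,) distances to the center
        dist = np.maximum(dist, 1e-12)            # guard against division by zero
        residual = dist - radius                  # signed distance to the surface
        # Jacobian of each residual w.r.t. [center, radius]:
        #   d r_i / d c = -(p_i - c) / ||p_i - c||,   d r_i / d R = -1
        jac = np.hstack([-diff / dist[:, None], -np.ones((len(points), 1))])
        # Gauss-Newton step: least-squares solution of jac @ delta = -residual
        delta, *_ = np.linalg.lstsq(jac, -residual, rcond=None)
        params = params + delta
        if np.linalg.norm(delta) < tol:
            break
    return params[:3], params[3]


# Synthetic example (assumed data): points sampled on a known sphere.
rng = np.random.default_rng(0)
directions = rng.normal(size=(200, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
true_center, true_radius = np.array([1.0, -2.0, 0.5]), 2.0
cloud = true_center + true_radius * directions

# Initialize from the centroid and the mean distance to it.
init_center = cloud.mean(axis=0)
init_radius = np.linalg.norm(cloud - init_center, axis=1).mean()
est_center, est_radius = fit_sphere_gauss_newton(cloud, init_center, init_radius)
print("estimated center:", est_center, "estimated radius:", est_radius)
```

A real pipeline of the kind described above would presumably fit many primitives with richer parameters (orientation, deformation) to depth and segmentation outputs, and would run on the GPU through PyTorch; the sphere case is used here only because its residual and Jacobian are short enough to read at a glance.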
Verify against the repo before relying on details.