explaingit

apple/ml-velox

14 · Audience · researcher · Complexity · 4/5 · Active · Setup · hard

TLDR

Apple research project page for Velox, a CVPR 2026 paper on learning compact tokens for 4D objects from spacetime point clouds. The README links to the arXiv paper and the project site.

Mindmap

mindmap
  root((ml-velox))
    Inputs
      Spacetime color point cloud
    Outputs
      Dynamic tokens
      4D surface
      3D Gaussians
    Use Cases
      Video to 4D generation
      3D tracking over time
      Image to 4D cloth simulation
    Tech Stack
      Python
      PyTorch
      3D Gaussians
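
The Inputs branch above names a spacetime color point cloud. The README does not specify a data layout, so the sketch below assumes one: every point is a row (x, y, z, t, r, g, b), per-frame clouds of any size are stacked, and time is rescaled to [0, 1]. The helper `to_spacetime_cloud` is hypothetical and is not part of the Velox repository.

```python
import numpy as np

def to_spacetime_cloud(frames, timestamps):
    """Stack per-frame colored point clouds into one (N, 7) spacetime array.

    Hypothetical layout (an assumption, not taken from the Velox repo):
    columns are x, y, z, t, r, g, b, with t rescaled to [0, 1].
    frames: list of (N_i, 6) arrays with columns x, y, z, r, g, b.
    timestamps: one capture time per frame.
    """
    if len(frames) != len(timestamps):
        raise ValueError("need exactly one timestamp per frame")
    t = np.asarray(timestamps, dtype=np.float64)
    span = t.max() - t.min()
    # Avoid dividing by zero when every frame shares the same timestamp.
    t_norm = (t - t.min()) / span if span > 0 else np.zeros_like(t)
    rows = []
    for pts, ti in zip(frames, t_norm):
        pts = np.asarray(pts, dtype=np.float64)
        time_col = np.full((pts.shape[0], 1), ti)
        # Place the time column between the xyz and rgb columns.
        rows.append(np.hstack([pts[:, :3], time_col, pts[:, 3:6]]))
    return np.vstack(rows)

# Usage: three frames with different point counts (sparse, uneven input).
rng = np.random.default_rng(0)
frames = [rng.random((100, 6)), rng.random((80, 6)), rng.random((120, 6))]
cloud = to_spacetime_cloud(frames, timestamps=[0.0, 0.5, 1.0])
print(cloud.shape)  # (300, 7)
```

Storing time as an ordinary column keeps the cloud a single flat array, so an encoder can treat space and time uniformly; whether Velox normalizes time this way is not stated in the README.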

Things people build with this

USE CASE 1

Read the paper and cite Velox for 4D representation learning work

USE CASE 2

Watch the video results on the project site to evaluate the method

USE CASE 3

Reference the encoder plus dual-decoder design for your own 4D token research

Tech stack

Python · PyTorch · 3D Gaussians

Getting it running

Difficulty · hard · Time to first run · 1 day+

The README is a paper landing page with no install steps or code overview, so running the method means either waiting for documentation or working out the setup from the repository yourself.

Released under a custom Apple license (the LICENSE file in the repo), with a separate license for the sample data. Check both before use.

In plain English

This repository is Apple's official companion page for the research paper Velox: Learning Representations of 4D Geometry and Appearance, which is being presented at CVPR 2026. CVPR is the main yearly conference for computer vision research. The README is short and mostly points to the paper on arXiv and to the project website hosted on Apple's GitHub Pages.

The research deals with 4D objects. Here, 4D means a 3D object plus time, so think of a moving, deforming shape rather than a still statue. The authors describe a method that takes a messy moving point cloud (a swarm of colored 3D points captured over time) and learns to compress it into a small set of tokens that still carry the object's shape and appearance.

The authors set three goals for these tokens: they should be descriptive enough to reconstruct geometry and color, compact enough to be efficient in later use, and easy to build from sparse input.

The abstract in the README describes the training setup at a high level. A single encoder turns the spacetime point cloud into the tokens, and two decoders read those tokens during training. One decoder reconstructs the object's time-varying surface, which teaches the tokens to capture shape. The other decoder produces 3D Gaussians, a widely used representation for rendering scenes, which teaches the tokens to capture appearance and color.

To show the tokens are useful in practice, the paper applies them to three downstream tasks: turning a video into a 4D model, tracking a 3D scene over time, and an image-to-4D pipeline used for cloth simulation. The README reports strong results on all three and points readers to the project website for video examples.

The rest of the README is housekeeping. It lists the repository license and a separate license for the sample data, credits other open-source projects used in the codebase, and provides the BibTeX citation. There are no install instructions, no code overview, and no usage examples in the README itself.
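
The encoder plus dual-decoder setup can be sketched at the level of array shapes. This is a hypothetical toy, not the Velox code. The README does not say how the encoder is built, so the sketch assumes a Perceiver-style design in which a fixed number of learned latent queries cross-attend to the embedded points. The names (`ToyVeloxLikeModel`, `encode`, `decode_surface`, `decode_gaussians`), the signed-distance output of the surface decoder, the 14-number Gaussian parameterization, the way time is fed to each decoder, and all sizes are illustrative assumptions. Weights are random and nothing is trained; NumPy is used so the sketch runs without a GPU.

```python
import numpy as np

rng = np.random.default_rng(0)

def _init(*shape):
    """Small random weights; a trained model would learn these."""
    return rng.normal(0.0, 0.1, size=shape)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, wq, wk, wv):
    """Single-head scaled dot-product attention: queries attend to context."""
    q, k, v = queries @ wq, context @ wk, context @ wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v

class ToyVeloxLikeModel:
    """Hypothetical shape-level sketch, NOT the Velox implementation.

    Encoder: K learned latent queries cross-attend to embedded points.
    Decoders: one scores (x, y, z, t) queries against the tokens (surface),
    the other emits G Gaussians per token for a given time (appearance).
    """
    GAUSS_PARAMS = 14  # mean 3 + log-scale 3 + quaternion 4 + opacity 1 + rgb 3

    def __init__(self, num_tokens=16, dim=32, gaussians_per_token=4):
        self.point_embed = _init(7, dim)       # (x, y, z, t, r, g, b) -> dim
        self.latents = _init(num_tokens, dim)  # token queries
        self.enc_qkv = [_init(dim, dim) for _ in range(3)]
        self.query_embed = _init(4, dim)       # (x, y, z, t) -> dim
        self.dec_qkv = [_init(dim, dim) for _ in range(3)]
        self.surface_head = _init(dim, 1)
        self.gauss_head = _init(dim + 1, gaussians_per_token * self.GAUSS_PARAMS)

    def encode(self, cloud):
        feats = cloud @ self.point_embed
        return cross_attention(self.latents, feats, *self.enc_qkv)  # (K, dim)

    def decode_surface(self, tokens, queries):
        q = queries @ self.query_embed
        h = cross_attention(q, tokens, *self.dec_qkv)
        return (h @ self.surface_head)[:, 0]  # one scalar (e.g. SDF) per query

    def decode_gaussians(self, tokens, t):
        time_col = np.full((tokens.shape[0], 1), t)
        raw = np.hstack([tokens, time_col]) @ self.gauss_head
        raw = raw.reshape(-1, self.GAUSS_PARAMS)  # (K * G, 14)
        quats = raw[:, 6:10]
        quats = quats / (np.linalg.norm(quats, axis=1, keepdims=True) + 1e-8)
        return {
            "means": raw[:, 0:3],
            "scales": np.exp(raw[:, 3:6]),                   # keep scales positive
            "rotations": quats,                              # unit quaternions
            "opacity": 1.0 / (1.0 + np.exp(-raw[:, 10:11])),  # sigmoid -> (0, 1)
            "rgb": 1.0 / (1.0 + np.exp(-raw[:, 11:14])),
        }

model = ToyVeloxLikeModel()
cloud = rng.random((500, 7))                             # stand-in spacetime cloud
tokens = model.encode(cloud)                             # compact token set
sdf = model.decode_surface(tokens, rng.random((64, 4)))  # surface queries at (x, y, z, t)
gauss = model.decode_gaussians(tokens, t=0.5)            # Gaussians at one time step
print(tokens.shape, sdf.shape, gauss["means"].shape)     # (16, 32) (64,) (64, 3)
```

The key property the sketch illustrates is that 500 input points become a fixed set of 16 tokens, and both decoders read only those tokens. Training would also need a loss on each decoder's output (for example a surface reconstruction loss and a rendering loss on the Gaussians), which the README does not specify.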

Copy-paste prompts

Prompt 1
Summarize the Velox CVPR 2026 paper architecture in terms of encoder, 4D surface decoder, and Gaussian decoder
Prompt 2
Compare Velox tokens to 4D Gaussian Splatting representations for video-to-4D generation
Prompt 3
Outline how to reimplement the Velox encoder for spacetime color point clouds in PyTorch
Prompt 4
Explain how the cloth simulation downstream task uses image-to-4D from Velox tokens

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.