mvp18/3dconsistency-metrics

★ 12 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((3dconsistency-metrics))
    Research Problem
      Models accept noise as input
      Metrics mislead evaluation
      Disagree with human ratings
    Benchmark
      SysCON3D dataset
      Gaussian noise inputs
      Mixed unrelated scenes
      Clean baselines
    Methods
      Learned model metrics
      COLMAP classical metrics
      Human evaluation site
    Tech Stack
      Python
      CUDA GPU
      Gradio
      Hugging Face

Things people build with this

USE CASE 1

Test whether a 3D reconstruction model hallucinated structure from random noise using the SysCON3D benchmark.

USE CASE 2

Compare learned 3D metrics against classical COLMAP-based metrics to see which better matches human perception of scene consistency.

USE CASE 3

Run the interactive Gradio demo to upload your own images and compare how VGGT, MASt3R, DUSt3R, and Fast3R reconstruct them.

USE CASE 4

Score any folder of images for 3D consistency before using them as training or evaluation data for a computer vision model.
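For use case 1, a folder of pure-noise frames is the simplest negative control. The sketch below writes such a folder using only the standard library. The function name `write_noise_frames`, the image size, and the mean and standard deviation are illustrative assumptions, not the settings used to build SysCON3D, and whether the repository's scripts accept PPM files directly is not confirmed here (convert to PNG or JPEG if they do not).

```python
import random
from pathlib import Path


def write_noise_frames(out_dir, n_frames=4, width=64, height=64,
                       mean=127.5, std=50.0, seed=0):
    """Write n_frames RGB images of i.i.d. Gaussian noise as binary PPM files.

    mean, std, and the image size are illustrative assumptions, not the
    parameters of the SysCON3D benchmark.
    """
    rng = random.Random(seed)  # seeded so the frames are reproducible
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(n_frames):
        # Sample every channel value independently, then clip to 0..255.
        pixels = bytes(
            min(255, max(0, round(rng.gauss(mean, std))))
            for _ in range(width * height * 3)
        )
        path = out / f"noise_{i:03d}.ppm"
        # P6 is binary PPM: a short text header followed by raw RGB bytes.
        header = f"P6\n{width} {height}\n255\n".encode("ascii")
        path.write_bytes(header + pixels)
        paths.append(path)
    return paths
```

A model or metric that reports a consistent scene for a folder like this is accepting noise as input, which is the failure the benchmark is designed to expose.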

Tech stack

Python · CUDA · Gradio · Hugging Face · COLMAP

Getting it running

Difficulty · hard | Time to first run · 1h+

Requires Python 3.10 or 3.11 and a CUDA-capable GPU. Model checkpoints download automatically from Hugging Face on first run.
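Before installing anything, a quick preflight check can save time. This is a minimal sketch using only the standard library. Both function names are hypothetical, and looking for `nvidia-smi` on the PATH is only a heuristic: it suggests an NVIDIA driver is present but does not prove that the installed deep learning framework can use the GPU.

```python
import shutil
import sys


def python_version_supported(version_info=sys.version_info):
    """Return True for Python 3.10 or 3.11, the versions listed above."""
    return tuple(version_info[:2]) in {(3, 10), (3, 11)}


def nvidia_driver_visible():
    """Heuristic: nvidia-smi on PATH suggests an NVIDIA driver is installed."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print("Python 3.10/3.11:", python_version_supported())
    print("nvidia-smi found:", nvidia_driver_visible())
```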

In plain English

This repository contains the code and benchmark data from a research project at Johns Hopkins University that asks a specific question: when AI models reconstruct a 3D scene from multiple photos, can those models be trusted? The paper's short answer is often no, and this code exists to measure and expose that problem.

The core finding is that several widely used 3D reconstruction models, including VGGT, MASt3R, DUSt3R, and Fast3R, will confidently produce 3D geometry even when fed pure random noise as input. This is a serious reliability problem. Evaluation tools built on top of these models inherit the flaw, so they can report that a set of images forms a consistent 3D scene when it is in fact nonsense.

To study this, the researchers built SysCON3D, a controlled benchmark dataset with several categories of broken input: pure Gaussian noise, mixed scenes that combine unrelated images, single outlier frames, and patched corruptions, alongside clean working scenes as a baseline. The dataset is hosted on Hugging Face, and the code here downloads and evaluates it. There is also a human evaluation site where people rated scene consistency, giving the researchers a way to check whether automated metrics agree with human perception.

As an alternative to the flawed learned metrics, the code also provides COLMAP-based evaluation. COLMAP is a classical geometry tool that uses feature matching and geometric reconstruction rather than learned neural networks, and the paper shows that these classical metrics correlate up to four times better with human judgments than the existing learned approach.

Practically, the repository includes scripts for running the interactive comparison demo (a Gradio web app where you can upload images and see how different models reconstruct them), generating benchmark assets, and running the full suite of metrics on any folder of images. It requires Python 3.10 or 3.11 plus a GPU with the appropriate CUDA setup. Model checkpoints are not bundled; they download automatically from Hugging Face at runtime.
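To make "correlates with human judgments" concrete, the sketch below computes a rank correlation between per-scene human ratings and a metric's scores. This page does not say which correlation statistic the paper uses, so Spearman correlation is an assumption here, and the helper names and toy numbers are invented for illustration. They are not results from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Toy numbers, invented for illustration only (not results from the paper).
human = [1, 2, 3, 4, 5]                   # human consistency rating per scene
metric_a = [0.1, 0.4, 0.35, 0.8, 0.9]     # mostly agrees with the humans
metric_b = [0.9, 0.2, 0.8, 0.1, 0.5]      # largely disagrees

print("metric_a vs human:", spearman(human, metric_a))
print("metric_b vs human:", spearman(human, metric_b))
```

A metric whose scores rise and fall with the human ratings gets a correlation near 1, while one that ranks scenes in an unrelated or opposite order lands near 0 or below.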

Copy-paste prompts

Prompt 1

Using mvp18/3dconsistency-metrics, run the SysCON3D benchmark to check whether DUSt3R accepts random Gaussian noise as valid 3D geometry and compare its score against the COLMAP baseline.

Prompt 2

Set up the Gradio comparison demo from 3dconsistency-metrics on a GPU machine, upload 10 photos of my office, and show me the consistency scores from VGGT vs COLMAP-based metrics side by side.

Prompt 3

Explain the four SysCON3D corruption categories (Gaussian noise, mixed scenes, outlier frames, patched corruptions) and what each one reveals about a 3D reconstruction model's reliability.

Verify against the repo before relying on details.