explaingit

facebookresearch/vggt

13,089 · Python
TLDR

VGGT, which stands for Visual Geometry Grounded Transformer, is a research project from Meta AI and the University of Oxford that won the Best Paper Award at the CVPR 2025 computer vision conference.

In plain English

VGGT, which stands for Visual Geometry Grounded Transformer, is a research project from Meta AI and the University of Oxford that won the Best Paper Award at the CVPR 2025 computer vision conference. The project addresses a specific problem: given a set of photographs of the same place or object, recover the 3D structure of the scene and the position and orientation of each camera that took the photos.

Traditionally, reconstructing a 3D scene from multiple images is a slow, multi-step process involving many separate algorithms. VGGT replaces that pipeline with a single neural network that takes in one or more images and directly outputs the camera parameters, a depth value for each pixel, and a map of 3D points in space, all within a few seconds. It works whether you give it a single image or hundreds of images of the same scene.

The model's outputs are standard quantities used in computer graphics and robotics: camera intrinsic and extrinsic matrices (which describe lens properties and camera placement, respectively), depth maps (how far each pixel is from the camera), and point clouds (collections of 3D coordinates representing the scene geometry). These can be fed directly into other tools for 3D visualization or for creating photorealistic 3D scenes.

Getting started requires cloning the repository and installing a small set of Python dependencies. The pretrained model weights download automatically from Hugging Face on first use. A version of the model licensed for commercial use is also available under a separate license, and access to it requires an approved application.

The repository includes training code for fine-tuning the model on custom datasets, evaluation scripts, and tools to export results in a standard format used by other 3D reconstruction pipelines. An interactive demo is available on Hugging Face Spaces for trying the model without any local setup.
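The output quantities described above are linked by the pinhole camera model: a depth map plus a camera's intrinsic and extrinsic matrices is enough to place every pixel in 3D. The sketch below illustrates that relationship with plain NumPy. It is not code from the VGGT repository; the function name `unproject_depth` is hypothetical, and the world-to-camera `[R | t]` extrinsic convention is an assumption made for this example.

```python
import numpy as np


def unproject_depth(depth, K, extrinsic):
    """Lift a depth map to world-space 3D points (hypothetical helper).

    depth:     (H, W) depth along the camera z-axis
    K:         (3, 3) intrinsic matrix (focal lengths and principal point)
    extrinsic: (3, 4) [R | t], assumed to map world coords to camera coords
    """
    H, W = depth.shape
    # Pixel grid: u is the column index, v is the row index.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)

    # Back-project each pixel to a camera-space ray with z = 1, then scale by depth.
    rays = pixels @ np.linalg.inv(K).T
    cam_points = rays * depth[..., None]

    # Invert x_cam = R @ x_world + t, i.e. x_world = R^T @ (x_cam - t).
    R, t = extrinsic[:, :3], extrinsic[:, 3]
    world_points = (cam_points - t) @ R
    return world_points.reshape(-1, 3)


# Toy inputs: a 3x3 image, focal length 2, principal point at the center pixel,
# and a camera shifted by one unit along z.
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
extrinsic = np.hstack([np.eye(3), np.array([[0.0], [0.0], [1.0]])])
points = unproject_depth(np.full((3, 3), 4.0), K, extrinsic)
print(points.shape)  # one 3D point per pixel
```

Each row of the result is one 3D point, so stacking the results from several cameras gives a combined point cloud of the scene. Check the repository for the conventions the model actually uses before applying this to real predictions.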

Verify against the repo before relying on details.