nianticlabs/monodepth2

4,486 | Jupyter Notebook | Audience · researcher | Complexity · 4/5 | License | Setup · hard

TLDR

A research codebase by Niantic for estimating depth in photographs using only a regular camera and no depth sensor. Its models are trained on video without any labeled depth data.

Mindmap

mindmap
  root((monodepth2))
    What it does
      Depth from video
      Self-supervised training
      No depth sensor needed
    Pretrained models
      Monocular video
      Stereo pairs
      Mono plus stereo
    Training
      KITTI dataset
      Custom datasets
      175 GB data required
    Tech
      Python
      PyTorch

Things people build with this

USE CASE 1

Run depth estimation on your own photos using a pretrained model without writing any training code.

USE CASE 2

Train a custom depth estimation model on new video by writing a dataloader class following the provided template.

USE CASE 3

Benchmark monocular depth estimation on the KITTI dataset and compare results against published numbers.
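For the benchmarking use case, the error metrics usually reported for depth estimation on KITTI can be sketched in a few lines. The version below is a dependency-free illustration, not the repo's evaluation script: the function name `depth_errors` and the flat-list inputs are assumptions made for this example, and any cropping, depth capping, or scale alignment that a real evaluation applies is left out.

```python
import math


def depth_errors(gt, pred):
    """Common depth error metrics for paired ground-truth / predicted depths.

    Hypothetical helper for illustration only. Both inputs are equal-length
    sequences of strictly positive depths (for example, in metres).
    """
    if len(gt) != len(pred) or len(gt) == 0:
        raise ValueError("gt and pred must be non-empty and the same length")
    n = len(gt)
    pairs = list(zip(gt, pred))

    # Absolute relative error: mean(|gt - pred| / gt)
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    # Squared relative error: mean((gt - pred)^2 / gt)
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    # Root mean squared error in linear and in log space
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n
    )
    # Threshold accuracy: fraction of points with max(gt/pred, pred/gt) < 1.25^k
    ratios = [max(g / p, p / g) for g, p in pairs]
    a1 = sum(r < 1.25 for r in ratios) / n
    a2 = sum(r < 1.25 ** 2 for r in ratios) / n
    a3 = sum(r < 1.25 ** 3 for r in ratios) / n

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "a1": a1,
        "a2": a2,
        "a3": a3,
    }


if __name__ == "__main__":
    # Toy values, not real KITTI measurements.
    print(depth_errors([10.0, 20.0], [9.0, 22.0]))
```

Lower is better for the four error terms, and higher is better for the three threshold accuracies. `abs_rel` and `sq_rel` divide by the true depth, so the same absolute error counts for more on a nearby object than on a distant one.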

Tech stack

Python, PyTorch

Getting it running

Difficulty · hard | Time to first run · 30 min

Training requires the 175 GB KITTI dataset; inference with pretrained models works without it.
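Because inference needs only a pretrained model, a first run can be a single command. The sketch below assembles that command in Python. The script name `test_simple.py`, the `--image_path` and `--model_name` flags, and the `<mode>_<resolution>` naming of pretrained models are recalled from the repo's README and should be checked against the current version; `build_inference_command` itself is a hypothetical helper and not part of monodepth2.

```python
import shlex

# Training modes and resolutions described for the pretrained models.
MODES = ("mono", "stereo", "mono+stereo")
RESOLUTIONS = ("640x192", "1024x320")


def build_inference_command(image_path, mode="mono+stereo", resolution="640x192"):
    """Return the argv list for a single-image depth prediction.

    Hypothetical helper: the script and flag names are assumptions taken
    from the README and may differ in the version you have checked out.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {MODES}")
    if resolution not in RESOLUTIONS:
        raise ValueError(
            f"unknown resolution {resolution!r}, expected one of {RESOLUTIONS}"
        )
    model_name = f"{mode}_{resolution}"
    return [
        "python", "test_simple.py",
        "--image_path", image_path,
        "--model_name", model_name,
    ]


if __name__ == "__main__":
    cmd = build_inference_command("assets/test_image.jpg")
    # Print a shell-ready version of the command for copy-pasting.
    print(shlex.join(cmd))
```

To execute the command rather than print it, the list can be passed to `subprocess.run(cmd, check=True)` from the root of a cloned copy of the repo.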

Non-commercial use only: you may use it for research and personal projects, but not in commercial products.

In plain English

Monodepth2 is a research codebase from Niantic that estimates how far away things are in a photograph using only a regular camera. Normally, recovering depth from images requires a depth sensor, a stereo camera rig with two lenses, or knowledge of the camera's exact movement between frames. This project's approach, described in a paper published at ICCV 2019, learns to estimate depth from ordinary video without any depth sensor data during training. It uses the structure of video frames over time as the self-supervision signal instead.

The code is written in Python using PyTorch. It provides pretrained models you can download and use immediately, as well as scripts for training your own models on new data. To test it on a single photo, you run a short command that points at an image file and names the pretrained model to use. The pretrained models vary by resolution (640x192 or 1024x320) and by the kind of data they were trained on: monocular video only, stereo pairs only, or a combination of both.

Training with the default setup requires the KITTI dataset, a large autonomous driving dataset captured from a car driving in and around Karlsruhe, Germany. The dataset is about 175 GB, and the README includes specific commands for downloading and preparing it. Training is handled by a single train.py script with flags for the model architecture, batch size, learning rate, and other parameters. The training code can also be pointed at a custom dataset if you write a dataloader class following the provided example.

Evaluation scripts produce standard depth estimation metrics against ground-truth measurements from KITTI's LiDAR sensor. The README includes a table of benchmark numbers for each pretrained model.

The code is released for non-commercial use only, as stated in the license file. It comes from Niantic, the company behind Pokémon GO, which has research interests in augmented reality and 3D understanding of real-world environments.
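Since training on video relies on neighboring frames rather than depth labels, a custom dataloader has to hand the trainer each target frame together with its neighbors. The sketch below shows only that indexing step, under loud assumptions: the class name `FrameTripletIndex`, the one-folder-per-video layout, the ten-digit file names, and the `frame_ids` default are all hypothetical, and the repo's own example class defines the interface that train.py actually expects.

```python
from pathlib import Path


class FrameTripletIndex:
    """Map a sample index to the paths of a target frame and its neighbours.

    Hypothetical sketch: assumes one folder per video with frames named
    0000000000.jpg, 0000000001.jpg, and so on. It only resolves paths and
    does not load any images.
    """

    def __init__(self, video_dir, num_frames, frame_ids=(0, -1, 1), ext=".jpg"):
        self.video_dir = Path(video_dir)
        self.frame_ids = tuple(frame_ids)
        self.ext = ext
        # Skip target frames whose neighbours would fall outside the video.
        first = max(0, -min(self.frame_ids))
        last = num_frames - 1 - max(0, max(self.frame_ids))
        self.targets = list(range(first, last + 1))

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        t = self.targets[index]
        # Key each path by its offset from the target frame (0 is the target).
        return {
            offset: self.video_dir / f"{t + offset:010d}{self.ext}"
            for offset in self.frame_ids
        }


if __name__ == "__main__":
    index = FrameTripletIndex("my_video", num_frames=5)
    for sample in index:
        print({offset: path.name for offset, path in sample.items()})
```

In a PyTorch pipeline, a class like this would typically subclass `torch.utils.data.Dataset` and return image tensors and camera intrinsics instead of paths. The first and last frames are dropped as targets here because they lack a previous or next neighbor.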

Copy-paste prompts

Prompt 1
Using monodepth2 pretrained models, run depth estimation on my own images and visualize the depth map in Python.
Prompt 2
I want to fine-tune monodepth2 on a custom driving dataset. Show me which methods the dataloader class needs to implement.
Prompt 3
Compare the mono, stereo, and mono+stereo monodepth2 pretrained models and explain when to choose each one.
Prompt 4
Evaluate a monodepth2 model on the KITTI dataset and help me interpret the abs_rel and sq_rel depth error metrics.
Verify against the repo before relying on details.