Run depth estimation on your own photos using a pretrained model without writing any training code.
Train a custom depth estimation model on new video by writing a dataloader class following the provided template.
Benchmark monocular depth estimation on the KITTI dataset and compare results against published numbers.
Training with the default setup requires the 175 GB KITTI dataset; inference with pretrained models works without it.
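For the inference case, here is a small Python helper that assembles the single-image command. It assumes the README's `test_simple.py` entry point with `--image_path` and `--model_name` flags, the example image path `assets/test_image.jpg`, and model names of the form `<training data>_<resolution>` (for example `mono+stereo_640x192`). Check these against the repository before relying on them. The helper names are hypothetical, and the helper only builds the argument list.

```python
from __future__ import annotations

import shlex

# Variants described for the pretrained models (naming scheme assumed).
RESOLUTIONS = ("640x192", "1024x320")
TRAINING_DATA = ("mono", "stereo", "mono+stereo")


def pretrained_model_name(training_data: str, resolution: str) -> str:
    """Compose a model name such as 'mono+stereo_640x192' (hypothetical helper)."""
    if training_data not in TRAINING_DATA:
        raise ValueError(f"unknown training data {training_data!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution {resolution!r}")
    return f"{training_data}_{resolution}"


def build_inference_command(image_path: str, training_data: str = "mono+stereo",
                            resolution: str = "640x192") -> list[str]:
    """Build the argv for predicting depth on one image with a pretrained model."""
    return [
        "python", "test_simple.py",
        "--image_path", image_path,
        "--model_name", pretrained_model_name(training_data, resolution),
    ]


if __name__ == "__main__":
    cmd = build_inference_command("assets/test_image.jpg")
    print(shlex.join(cmd))
    # To run it from the repository root: subprocess.run(cmd, check=True)
```

Validating the name up front gives a clear error for unsupported resolution or training-data combinations instead of a failed model lookup at run time.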
Monodepth2 is a research codebase from Niantic that estimates how far away things are in a photograph taken with an ordinary single camera. Recovering depth from images usually calls for a dedicated depth sensor, a stereo rig with two lenses, or precise knowledge of how the camera moved between frames. This project's approach, described in a paper published at ICCV 2019, instead learns depth from ordinary video without any depth-sensor data during training. The supervision comes from the video itself: the model predicts depth for one frame and the camera motion to a neighboring frame, uses both to reconstruct the first frame from the second, and is trained to make that reconstruction match the real image.

The code is written in Python using PyTorch. It provides pretrained models that you can download and use immediately, as well as scripts for training your own models on new data. To test it on a single photo, you run a short command that points at an image file and names the pretrained model to use. The pretrained models vary by input resolution (640x192 or 1024x320) and by the data they were trained on: monocular video only, stereo pairs only, or a combination of both.

Training with the default setup uses the KITTI dataset, a large autonomous-driving dataset recorded from a car driving around German cities. The dataset takes up about 175 GB, and the README includes specific commands for downloading and preparing it. Training is handled by a single train.py script with flags for the model architecture, batch size, learning rate, and other parameters. The training code can also be pointed at a custom dataset if you write a dataloader class following the provided example.

Evaluation scripts compute standard depth-estimation metrics against ground-truth depth from KITTI's LIDAR sensor, and the README includes a table of benchmark numbers for each pretrained model.

The code is released for non-commercial use only, as stated in the license file. Niantic, the company behind Pokémon Go, has research interests in augmented reality and in 3D understanding of real-world environments.
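The self-supervision idea, reconstructing one frame from a neighboring frame using predicted depth and camera motion, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the repository's code: grayscale frames of equal size, known intrinsics `K`, and a 4x4 pose `T` that maps target-camera points into the source camera. All function names are hypothetical. The per-pixel error here is plain L1; the paper combines L1 with SSIM and takes a per-pixel minimum over several source frames, which `min_reprojection_error` imitates.

```python
import numpy as np


def backproject(depth, K_inv):
    """Lift every target pixel (u, v) to the 3D point D(u, v) * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # row (v) and column (u) index grids
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)
    return (K_inv @ pix) * depth.reshape(1, -1)  # rays scaled by depth, shape (3, H*W)


def project(points, K, T):
    """Move 3D points into the source camera with the 4x4 pose T, then project with K."""
    homogeneous = np.vstack([points, np.ones((1, points.shape[1]))])
    cam = (T @ homogeneous)[:3]
    pix = K @ cam
    return pix[:2] / np.clip(pix[2:3], 1e-7, None)  # (2, H*W) source pixel coords


def bilinear_sample(img, coords):
    """Bilinearly sample a grayscale image at float (u, v) coords, clamped to the border.

    Assumes coords has one entry per pixel of an image the same size as img.
    """
    h, w = img.shape
    u = np.clip(coords[0], 0, w - 1)
    v = np.clip(coords[1], 0, h - 1)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = np.minimum(u0 + 1, w - 1), np.minimum(v0 + 1, h - 1)
    du, dv = u - u0, v - v0
    top = img[v0, u0] * (1 - du) + img[v0, u1] * du
    bottom = img[v1, u0] * (1 - du) + img[v1, u1] * du
    return (top * (1 - dv) + bottom * dv).reshape(h, w)


def reconstruct_target(source, depth, K, T):
    """Synthesize the target frame by sampling the source where each target pixel lands."""
    coords = project(backproject(depth, np.linalg.inv(K)), K, T)
    return bilinear_sample(source, coords)


def min_reprojection_error(target, reconstructions):
    """Per-pixel L1 photometric error, minimized over all reconstructed source frames."""
    errors = np.stack([np.abs(target - r) for r in reconstructions])
    return errors.min(axis=0)


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
    depth = np.full((5, 5), 10.0)  # constant depth of 10 units
    source = np.tile(np.arange(5, dtype=float) * 10.0, (5, 1))
    T = np.eye(4)
    T[0, 3] = 0.1  # 0.1 * focal 100 / depth 10 = 1-pixel horizontal shift
    target = source[:, [1, 2, 3, 4, 4]]
    print(min_reprojection_error(target, [reconstruct_target(source, depth, K, T)]).max())
```

With the correct depth and a 1-pixel camera shift, the warped source reproduces the target, so the photometric error is near zero; a wrong depth or pose would leave a residual, which is exactly what training pushes down.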
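The evaluation step can be illustrated with the metrics commonly reported on KITTI: absolute and squared relative error, RMSE, log RMSE, and the accuracies δ < 1.25, 1.25², 1.25³. This is a hedged NumPy sketch, not the repository's evaluation script. The 1e-3 to 80 m valid-depth range and the median scaling follow common practice and are assumptions here; median scaling is needed for video-only models because monocular training recovers depth only up to an unknown scale. The published protocol also crops the image, which is omitted. Function names are hypothetical.

```python
import numpy as np


def compute_depth_errors(gt, pred):
    """Standard depth metrics on matching 1-D arrays of valid ground truth and predictions."""
    thresh = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": float(np.mean(np.abs(gt - pred) / gt)),
        "sq_rel": float(np.mean((gt - pred) ** 2 / gt)),
        "rmse": float(np.sqrt(np.mean((gt - pred) ** 2))),
        "rmse_log": float(np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))),
        "a1": float(np.mean(thresh < 1.25)),
        "a2": float(np.mean(thresh < 1.25 ** 2)),
        "a3": float(np.mean(thresh < 1.25 ** 3)),
    }


def evaluate_depth(gt_depth, pred_depth, min_depth=1e-3, max_depth=80.0,
                   median_scaling=True):
    """Mask out pixels without valid LIDAR depth, optionally rescale, then score."""
    mask = (gt_depth > min_depth) & (gt_depth < max_depth)  # sparse LIDAR: zeros are missing
    gt = gt_depth[mask]
    pred = pred_depth[mask]
    if median_scaling:
        # Align the unknown monocular scale to the ground truth.
        pred = pred * np.median(gt) / np.median(pred)
    pred = np.clip(pred, min_depth, max_depth)
    return compute_depth_errors(gt, pred)


if __name__ == "__main__":
    gt = np.array([[0.0, 5.0, 10.0], [20.0, 40.0, 100.0]])
    pred = 2.0 * gt  # correct shape, wrong scale
    print(evaluate_depth(gt, pred))
    print(evaluate_depth(gt, pred, median_scaling=False))
```

The example shows why scaling matters: a prediction that is right up to a factor of two scores perfectly after median scaling, but has an absolute relative error of 1.0 without it.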
Verify against the repo before relying on details.