Turn a flat photo into a 3D cinematic video with zoom, swing, circle, and dolly zoom camera movements
Generate a 3D mesh file from a 2D image for use in graphics or animation projects
Apply the method from the CVPR 2020 3D photo inpainting paper to your own images
Try the technique in-browser using the linked Google Colab notebook without installing anything
Requires Linux and compatible Python and PyTorch versions; a setup script downloads the pretrained model weights.
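The dolly zoom listed above combines a camera move with a zoom. Under a simple pinhole camera model (an assumption for this sketch, not a description of the repository's rendering code), an object of width W at distance d appears f·W/d pixels wide. Keeping the subject's on-screen size fixed while the camera moves therefore means scaling the focal length f in proportion to d. The helpers `dolly_zoom_track` and `projected_size` are hypothetical names used only for illustration.

```python
from typing import List, Tuple


def projected_size(focal_px: float, size: float, depth: float) -> float:
    """Width in pixels of an object of width `size` at distance `depth` (pinhole model)."""
    return focal_px * size / depth


def dolly_zoom_track(f0: float, d0: float, dolly: float, frames: int) -> List[Tuple[float, float]]:
    """Return (camera_offset, focal_length) pairs for a dolly zoom.

    The camera starts at distance d0 from the subject with focal length f0
    (in pixels) and moves forward by `dolly` over `frames` frames. The focal
    length is scaled by d / d0 so that f / d, and therefore the subject's
    on-screen size, stays constant.
    """
    if frames < 1:
        raise ValueError("need at least one frame")
    track = []
    for i in range(frames):
        t = i / (frames - 1) if frames > 1 else 0.0
        offset = t * dolly          # distance travelled toward the subject
        d = d0 - offset             # remaining distance to the subject
        if d <= 0:
            raise ValueError("camera would reach or pass the subject")
        track.append((offset, f0 * d / d0))
    return track


if __name__ == "__main__":
    # Subject 1 m wide at 2 m; background object 1 m wide at 10 m.
    for offset, f in dolly_zoom_track(f0=500.0, d0=2.0, dolly=1.0, frames=3):
        subject = projected_size(f, 1.0, 2.0 - offset)
        background = projected_size(f, 1.0, 10.0 - offset)
        print(f"offset={offset:.1f} f={f:.0f} subject={subject:.1f}px background={background:.1f}px")
```

With a 1 m subject at 2 m, f0 = 500 px, and a 1 m dolly over three frames, the subject stays 250 px wide in every frame while the background object shrinks from 50 px to roughly 27.8 px. That mismatch between foreground and background is what gives the effect its characteristic warping.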
This repository contains the code for a research paper published at CVPR 2020 (a major computer vision conference). The project takes a single ordinary photo and converts it into a short 3D video in which the camera appears to move through the scene, creating a sense of depth that the original flat image lacks.

The underlying technique works in two steps. First, the code estimates how far each part of the image is from the camera, producing a depth map. Second, it fills in the parts of the scene that were hidden behind foreground objects in the original view. This fill-in step, called inpainting, lets the system render the scene from slightly different viewpoints without leaving obvious holes. The result is a layered 3D representation that can be displayed with standard graphics tools.

Running the code on a photo saves several video files, each showing a different camera movement: zooming in, swinging side to side, moving in a circle, and a dolly zoom. Optionally, it can also save a 3D mesh of the scene. Processing typically takes two to three minutes per image, depending on the hardware.

Setup requires Linux and compatible versions of Python and PyTorch. A setup script downloads the pretrained model weights. The README also links a Google Colab notebook for trying the method in a browser without installing anything locally.

The code is released under the MIT license. The README includes a citation block for the original paper and credits several open-source projects the code builds on, including MiDaS for depth estimation and EdgeConnect for image inpainting.
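The need for the inpainting step can be seen with a simple forward warp. The sketch below assumes a camera that translates sideways by a baseline b, so a pixel at depth z shifts horizontally by f·b/z. Nearer pixels shift further and uncover background that the original photo never showed. This is a generic one-row illustration built around a hypothetical `warp_scanline` helper, not the repository's layered-depth implementation; target pixels left as `None` are the holes an inpainting step would have to fill.

```python
from typing import List, Optional


def warp_scanline(colors: List[str], depths: List[float],
                  focal: float, baseline: float) -> List[Optional[str]]:
    """Forward-warp one image row to a camera shifted right by `baseline`.

    Each pixel moves left by its disparity focal * baseline / depth.
    When several pixels land on the same target, the nearest one wins
    (a z-buffer). Target pixels that nothing lands on are returned as
    None: these are holes that an inpainting step would need to fill.
    """
    width = len(colors)
    out: List[Optional[str]] = [None] * width
    zbuf = [float("inf")] * width
    for x, (color, z) in enumerate(zip(colors, depths)):
        disparity = focal * baseline / z   # nearer pixels shift further
        nx = round(x - disparity)
        if 0 <= nx < width and z < zbuf[nx]:
            zbuf[nx] = z
            out[nx] = color
    return out


if __name__ == "__main__":
    # Background at depth 10 with a two-pixel foreground object at depth 5.
    colors = ["b0", "b1", "b2", "b3", "b4", "F5", "F6", "b7", "b8", "b9"]
    depths = [10, 10, 10, 10, 10, 5, 5, 10, 10, 10]
    row = warp_scanline(colors, depths, focal=10.0, baseline=1.0)
    print(row)
    print("holes at", [i for i, c in enumerate(row) if c is None])
```

In this example the foreground object moves two pixels while the background moves one, so a gap opens at index 5, just to the right of the object. Index 9 is empty only because nothing lies beyond the original image border.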
Verify against the repo before relying on details.