Match points across 10+ photos of a scene simultaneously to get cleaner, globally consistent correspondences for 3D reconstruction.
Use pre-trained outdoor or indoor models to get pixel-level matches with confidence scores without training from scratch.
Replace pairwise feature matching in an existing 3D reconstruction pipeline with MV-RoMa's multi-view consistent approach.
Test point matching quality on your own images using the included demo script right after setup.
Requires an NVIDIA GPU, PyTorch, and the UFM library installed as a separate dependency before the included demo script will run.
MV-RoMa is a Python library and research project from a group of computer vision researchers, presented at a major academic conference called CVPR in 2026. The goal is to find matching points between photographs, which is a core step in building 3D models from ordinary images. When you take several photos of the same object or scene from different angles, software can reconstruct a 3D model by figuring out which spot in one photo corresponds to which spot in another.

Most existing tools compare two photos at a time. MV-RoMa does this with multiple photos simultaneously, keeping matches consistent across the whole set rather than treating each pair independently. The result is cleaner point tracks, meaning a single real-world location can be reliably followed across many photos.

The library comes with pre-trained neural network weights for outdoor scenes (trained on a dataset called MegaDepth) and for indoor scenes. You give the model one source image and several target images, and it returns a map showing where each pixel in the source lands in each target, along with a confidence score for each prediction.

Running the project requires a computer with a compatible NVIDIA GPU, Python 3.10 or later, and the PyTorch deep learning framework. Setup involves installing several dependencies including a separate library called UFM. A demo script is included so you can test the model on your own images right after setup.

This is a research tool intended for computer vision engineers and researchers working on 3D reconstruction pipelines. It is not a consumer product, and using it effectively requires familiarity with deep learning and image processing concepts.
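To make the idea of per-pixel maps, confidence scores, and point tracks concrete, here is a minimal sketch of how such output could be turned into tracks. It does not use MV-RoMa's actual API: the function name build_tracks, the nested-list data layout, and the threshold and min_views parameters are all assumptions made for illustration. The real model most likely returns PyTorch tensors, where the same filtering would be done with array operations.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# Hypothetical layout (not MV-RoMa's real output format):
# warp[y][x] is the (x, y) location in a target image for source pixel (x, y),
# and conf[y][x] is the confidence of that prediction in [0, 1].
WarpMap = List[List[Point]]
ConfMap = List[List[float]]


def build_tracks(
    warps: List[WarpMap],
    confidences: List[ConfMap],
    threshold: float = 0.5,
    min_views: int = 2,
) -> Dict[Tuple[int, int], Dict[int, Point]]:
    """Collect, per source pixel, the target locations that pass the threshold."""
    if len(warps) != len(confidences):
        raise ValueError("need one confidence map per warp map")
    tracks: Dict[Tuple[int, int], Dict[int, Point]] = {}
    for target_idx, (warp, conf) in enumerate(zip(warps, confidences)):
        for y, (warp_row, conf_row) in enumerate(zip(warp, conf)):
            for x, (target_xy, score) in enumerate(zip(warp_row, conf_row)):
                if score >= threshold:
                    tracks.setdefault((x, y), {})[target_idx] = target_xy
    # Keep only source pixels matched confidently in enough target images.
    return {src: obs for src, obs in tracks.items() if len(obs) >= min_views}


# Toy data: a 1x2 source image matched against two target images.
warps = [
    [[(10.0, 5.0), (11.0, 5.0)]],
    [[(20.5, 7.0), (21.5, 7.0)]],
]
confidences = [
    [[0.9, 0.2]],
    [[0.8, 0.95]],
]
tracks = build_tracks(warps, confidences, threshold=0.5, min_views=2)
print(tracks)  # {(0, 0): {0: (10.0, 5.0), 1: (20.5, 7.0)}}
```

In this toy case only source pixel (0, 0) is confidently matched in both targets, so it is the only track kept. Source pixel (1, 0) passes the threshold in just one target and is dropped. Suitable threshold values depend on the model and the scene, so the defaults here are placeholders.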
Verify against the repo before relying on details.