DINOv2 is a computer vision model from Meta AI Research that learns visual features from images without labels or human annotations. Most image recognition systems need large amounts of labeled training data, with humans tagging each photo. DINOv2 was instead trained on 142 million images with a self-supervised approach: it learns from patterns in the images themselves rather than from labels.

The result is a model that produces rich, general-purpose image features. These features describe the content of an image as a compact numerical vector that other, simpler systems can use for tasks such as image classification, object detection, depth estimation, and recognizing content in video. Because the features are general, they transfer well to new tasks without retraining the whole model from scratch.

Four model sizes are available, ranging from 21 million to 1.1 billion parameters. Larger models are more capable but require more computing resources. All models can be loaded in a few lines of Python using PyTorch. The repository also includes specialized variants for biological microscopy imaging (Cell-DINO) and medical X-ray analysis (XRay-DINO), each trained on domain-specific image data.

The repository contains the pretrained model weights and the Python code to load and use them, along with Jupyter notebooks demonstrating tasks such as depth estimation and image segmentation. The same team has since released a follow-on project, DINOv3, which continues this line of research.
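The "few lines of Python" mentioned above can be sketched roughly as follows. This is a minimal illustration, not the repository's own example. The hub entry-point names and the intermediate parameter counts are taken from the upstream README as best recalled and are not stated in this summary, so treat them as assumptions to check against the repo. `pick_model`, `round_to_patch_multiple`, and `extract_features` are hypothetical helper names introduced only for this sketch.

```python
"""Sketch: choose a DINOv2 backbone, load it via torch.hub, extract features."""

# Assumed hub entry-point names with approximate parameter counts.
# Only the 21M and 1.1B endpoints appear in the summary above; the rest
# should be verified against the facebookresearch/dinov2 README.
DINOV2_MODELS = {
    "dinov2_vits14": 21_000_000,
    "dinov2_vitb14": 86_000_000,
    "dinov2_vitl14": 300_000_000,
    "dinov2_vitg14": 1_100_000_000,
}

PATCH_SIZE = 14  # the "14" in the model names: ViT patch size in pixels


def pick_model(max_params: int) -> str:
    """Return the largest listed model that fits a parameter budget."""
    fitting = {name: n for name, n in DINOV2_MODELS.items() if n <= max_params}
    if not fitting:
        raise ValueError("no DINOv2 model fits the given parameter budget")
    return max(fitting, key=fitting.get)


def round_to_patch_multiple(size: int) -> int:
    """Round an image side length down to a multiple of the patch size."""
    return max(PATCH_SIZE, (size // PATCH_SIZE) * PATCH_SIZE)


def extract_features(name: str, height: int = 224, width: int = 224):
    """Load a backbone and embed one random image-shaped tensor.

    Requires PyTorch, plus network access on the first call because
    torch.hub downloads the code and pretrained weights.
    """
    import torch  # imported lazily so the helpers above work without PyTorch

    model = torch.hub.load("facebookresearch/dinov2", name)
    model.eval()

    # Input sides must be multiples of the patch size.
    h = round_to_patch_multiple(height)
    w = round_to_patch_multiple(width)
    image = torch.randn(1, 3, h, w)  # stand-in for a normalized RGB batch

    with torch.no_grad():
        return model(image)  # one global feature vector per image
```

A call such as `extract_features(pick_model(100_000_000))` would select the largest backbone under 100 million parameters and return its features for a dummy input. In real use, the random tensor would be replaced by a resized and normalized image, and the resulting vector would be passed to a small task-specific head such as a linear classifier.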
Verify against the repo before relying on details.