Search a photo library by typing a plain-English description instead of using tags.
Classify images into categories without collecting any labeled training examples for those categories.
Load and compare dozens of pre-trained CLIP variants through a single consistent Python interface.
Train a new CLIP model from scratch on a custom large-scale image-text dataset.
Training large models requires a distributed compute cluster; inference-only use needs only a pip install and a few lines of Python.
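The text-search use case listed above comes down to ranking images by how close their embeddings are to the embedding of the query text. The sketch below shows only that ranking step, in plain Python. The three-dimensional vectors, the file names, and the helper names `cosine` and `search` are hypothetical stand-ins for illustration; in practice the vectors would come from a CLIP model's image and text encoders and be much larger.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, library, top_k=2):
    """Return the names of the top_k images most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in library.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Hypothetical precomputed image embeddings (assumption: toy 3-d vectors).
library = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text query such as "a dog playing".
query = [0.2, 0.95, 0.05]

results = search(query, library)
print(results)
```

Because the image embeddings can be computed once and stored, only the query text needs to be encoded at search time.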
OpenCLIP is an open-source implementation of CLIP, a type of AI model originally created by OpenAI. CLIP (Contrastive Language-Image Pre-training) is trained to understand the relationship between images and text, so it can compare a photo to a description and judge how well they match. This enables a range of applications: searching a collection of images using plain-text queries, classifying images into categories without labeled training examples for every category, and building systems that understand visual and written content together.

This repository provides code both to use pre-trained CLIP models and to train new ones from scratch. The project has trained dozens of models on publicly available large-scale image-text datasets, including LAION-400M (400 million image-text pairs), LAION-2B (2 billion pairs), and DataComp-1B. The best of these models reach zero-shot accuracy above 80% on ImageNet, a standard image recognition benchmark, without any ImageNet-specific training examples.

Models are available as a Python package (open_clip_torch, installable via pip) and can be loaded with a few lines of code. The project also makes it straightforward to load the original OpenAI CLIP weights alongside community-trained alternatives such as SigLIP and DFN models, all through the same interface.

The codebase supports training on single machines and on distributed computing clusters. The current main branch is in the middle of a significant refactor that adds support for a newer distributed training approach called FSDP2. The previous stable training pipeline remains available on the v3 branch. Users who only need to load and run pre-trained models for inference are unaffected by the training-side changes.
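To make the zero-shot classification described above concrete, the following sketch scores one image embedding against one text embedding per candidate label and turns the similarities into probabilities with a softmax. It is an illustration under stated assumptions, not the repository's code: the vectors are hypothetical toy values, the helper names are made up for this example, and the scale factor of 100 is an assumed temperature rather than a value taken from any particular model.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_probs(image_vec, label_vecs, scale=100.0):
    """Softmax over scaled cosine similarities between image and labels."""
    img = normalize(image_vec)
    logits = [
        scale * sum(a * b for a, b in zip(img, normalize(t)))
        for t in label_vecs
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Candidate categories phrased as text prompts; no labeled images needed.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Hypothetical text embeddings, one per label (assumption: toy 3-d vectors).
label_vecs = [[0.9, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.9]]

# Hypothetical embedding of the image to classify.
image_vec = [0.8, 0.3, 0.1]

probs = zero_shot_probs(image_vec, label_vecs)
best = labels[probs.index(max(probs))]
print(best)
```

Adding a new category only requires embedding one more text prompt, which is why no category-specific training examples are needed.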
Verify against the repo before relying on details.