explaingit

carriex6/cvpr2026_similarity_as_evidence

18 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

TLDR

Official CVPR 2026 code that uses BiomedCLIP image similarity to estimate uncertainty in active learning for medical image classification, helping identify which unlabeled scans are most worth sending to a human expert for labeling.

Mindmap

mindmap
  root((similarity-as-evidence))
    Core method
      BiomedCLIP similarity
      Vacuity measure
      Dissonance measure
      Uncertainty ranking
    Active learning
      Label selection
      Category balance
      Small labeled sets
    Datasets
      Brain tumors
      Skin conditions
      Lung tissue
      Retinal scans
    Tech Stack
      Python
      BiomedCLIP
      PyTorch

Things people build with this

USE CASE 1

Apply Similarity-as-Evidence active learning to a medical image dataset to rank which unlabeled scans would benefit most from expert labeling.

USE CASE 2

Compare vacuity and dissonance uncertainty scores against standard active learning baselines across ten benchmark medical imaging datasets.

USE CASE 3

Adapt BiomedCLIP similarity as an uncertainty proxy in your own active learning pipeline by modifying the provided acquisition function code.
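
For use case 3, one possible starting point is to turn similarity to the labeled set into per-class evidence. The sketch below assumes image embeddings have already been extracted (for example with BiomedCLIP) and are available as plain lists of floats. The function names `cosine` and `class_evidence`, the `scale` parameter, and the choice to clip negative similarities to zero are illustrative assumptions, not the repository's actual code.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_evidence(query, labeled, num_classes, scale=10.0):
    """Map similarity to labeled examples into non-negative per-class evidence.

    query:   embedding of one unlabeled image
    labeled: list of (embedding, class_index) pairs
    scale:   hypothetical factor controlling how much evidence a similarity of 1.0 is worth
    """
    sims = defaultdict(list)
    for emb, label in labeled:
        sims[label].append(cosine(query, emb))

    evidence = []
    for k in range(num_classes):
        if sims[k]:
            mean_sim = sum(sims[k]) / len(sims[k])
            # Assumption: negative similarity counts as no evidence for the class.
            evidence.append(scale * max(mean_sim, 0.0))
        else:
            # No labeled examples for this class yet, so no evidence.
            evidence.append(0.0)
    return evidence


# Toy usage with 2-D embeddings standing in for real BiomedCLIP features.
labeled = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
print(class_evidence([1.0, 0.0], labeled, num_classes=2))
```

The per-class evidence list is the only thing later steps need, so the encoder and the similarity-to-evidence mapping can be swapped without touching the rest of a pipeline.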

Tech stack

Python · BiomedCLIP · PyTorch

Getting it running

Difficulty · hard | Time to first run · 1 day+

Medical image datasets must be obtained and formatted separately by following an external data guide. A GPU is recommended for BiomedCLIP embedding extraction.

No license information is provided in this repository.

In plain English

This repository contains the official code for a research paper titled "Similarity-as-Evidence," published at the CVPR 2026 computer vision conference. The paper addresses a problem in AI systems used to classify medical images: models trained on limited labeled data tend to be overconfident in their predictions, meaning they give high certainty scores even when they should not. This overconfidence makes it harder to know which images would be most useful to have a human expert label next.

The method works within a framework called active learning, where an AI system is given a small set of labeled examples and then selects additional examples for a human to label, trying to choose the ones that will improve the model the most. The key idea here is using a biomedical AI model, called BiomedCLIP, to measure how similar an unlabeled image is to the already-labeled examples. That similarity score is then used as evidence about how uncertain the model should be, rather than trusting the model's own confidence estimate, which tends to be inflated.

The uncertainty is broken down into two components. Vacuity measures how little evidence the model has about an image overall. Dissonance measures how much the evidence points in conflicting directions, for instance when an image looks similar to examples from two different disease categories at once. The system combines these two measures with adjustable weights and uses the result to rank which unlabeled images to ask an expert to label next. It also tries to keep the selected images balanced across different disease categories to avoid spending all the labeling budget on one type of case.

The code supports ten medical image datasets covering brain tumors, breast ultrasound, skin conditions, knee X-rays, lung tissue, and retinal scans, among others. The repository does not include the medical images themselves and points to a separate data guide for how to obtain and format them. Installation requires a standard Python dependency install and an optional script to download the model weights.
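
The vacuity, dissonance, weighting, and balancing steps described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: it assumes the standard subjective-logic definitions (Dirichlet parameters equal to evidence plus one, vacuity equal to the number of classes divided by the Dirichlet strength, dissonance built from pairwise belief balance), which the summary does not confirm the paper uses. All function names, the weights `w_vac` and `w_dis`, and the round-robin balancing over the class with the most evidence are hypothetical.

```python
def beliefs_and_vacuity(evidence):
    """Subjective-logic beliefs and vacuity from non-negative per-class evidence."""
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes  # Dirichlet strength with alpha_k = e_k + 1
    beliefs = [e / strength for e in evidence]
    vacuity = num_classes / strength        # 1.0 when there is no evidence at all
    return beliefs, vacuity


def dissonance(beliefs):
    """Conflict between class beliefs: high when several classes hold similar mass."""
    total = 0.0
    for k, b_k in enumerate(beliefs):
        others = [b_j for j, b_j in enumerate(beliefs) if j != k]
        denom = sum(others)
        if b_k <= 0.0 or denom <= 0.0:
            continue  # no conflict possible without mass on both sides
        # Balance is 1 when two beliefs are equal and 0 when one of them is zero.
        balance = sum(b_j * (1.0 - abs(b_j - b_k) / (b_j + b_k)) for b_j in others)
        total += b_k * balance / denom
    return total


def uncertainty_score(evidence, w_vac=0.5, w_dis=0.5):
    """Weighted sum of vacuity and dissonance (weights are illustrative defaults)."""
    beliefs, vacuity = beliefs_and_vacuity(evidence)
    return w_vac * vacuity + w_dis * dissonance(beliefs)


def select_balanced(evidence_by_id, budget, w_vac=0.5, w_dis=0.5):
    """Pick up to `budget` items, cycling over classes so no class dominates.

    Assumption: each item is assigned to the class with the most evidence,
    and classes take turns contributing their most uncertain remaining item.
    """
    groups = {}
    for item_id, ev in evidence_by_id.items():
        pseudo_label = max(range(len(ev)), key=lambda k: ev[k])
        score = uncertainty_score(ev, w_vac, w_dis)
        groups.setdefault(pseudo_label, []).append((score, item_id))

    # One queue per class, most uncertain item first.
    queues = [sorted(items, reverse=True) for _, items in sorted(groups.items())]
    picked = []
    while len(picked) < budget and any(queues):
        for queue in queues:
            if queue and len(picked) < budget:
                picked.append(queue.pop(0)[1])
    return picked


# Toy pool: evidence for two classes per unlabeled image.
pool = {"a": [18.0, 0.0], "b": [9.0, 9.0], "c": [0.0, 18.0], "d": [1.0, 10.0]}
print(select_balanced(pool, budget=3))
```

In this toy pool, image "b" has plenty of evidence but it is split evenly between two classes, so its dissonance is high even though its vacuity is low. Raising `w_vac` relative to `w_dis` would instead favor images that resemble nothing in the labeled set.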

Copy-paste prompts

Prompt 1
I cloned carriex6/cvpr2026_similarity_as_evidence and installed the Python dependencies. How do I run the active learning experiment on the skin condition dataset, and where do I download the required data and model weights?
Prompt 2
Explain the difference between vacuity and dissonance in the Similarity-as-Evidence method. How does the code combine the two uncertainty components with adjustable weights to produce a final ranking of unlabeled images?
Prompt 3
I want to apply the similarity-as-evidence approach to my own breast histology dataset that is not in the ten supported ones. What data format does it expect and which scripts do I need to modify to add a new dataset?
Prompt 4
Walk me through what happens step by step when the active learning loop in carriex6/cvpr2026_similarity_as_evidence selects the next batch of images to label, from computing BiomedCLIP embeddings to the final balanced category sampling.

Verify against the repo before relying on details.