explaingit

leaplabthu/va-adapter

12 | Python | Audience · researcher | Complexity · 5/5 | Active | Setup · hard

TLDR

Official MICCAI 2026 code for the Vision-Action Adapter, a small module that plugs into ultrasound foundation models to guide an echocardiography probe to the right view.

Mindmap

mindmap
  root((VA-Adapter))
    Inputs
      2D ultrasound frames
      Past probe movements
      Foundation model weights
    Outputs
      Predicted probe actions
      Trained adapter checkpoints
      Validation MAE logs
    Use Cases
      Probe guidance research
      Adapter training on EchoCLIP
      Adapter training on BiomedCLIP
      Adapter training on USFM
    Tech Stack
      Python
      PyTorch
      timm
      open_clip_torch

Things people build with this

USE CASE 1

Train the VA-Adapter on EchoCLIP, BiomedCLIP, or USFM to guide echocardiography probes

USE CASE 2

Reproduce the MICCAI 2026 paper numbers on a 1.31M-sample ultrasound dataset

USE CASE 3

Adapt the VA-Adapter wrapper pattern to another medical imaging foundation model

Tech stack

Python · PyTorch · timm · open_clip_torch · CUDA

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires Python 3.8+, PyTorch 2.1+, 8-GPU distributed training, and separate downloads of BiomedCLIP and USFM pretrained weights.
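Before starting a long multi-GPU run, it can help to confirm the environment matches the stated requirements. The script below is a minimal sketch using only the standard library. It assumes the versions quoted in this summary (PyTorch 2.1 or newer, timm 1.0.15, open_clip_torch 2.32.0); treating the timm and open_clip_torch versions as exact pins is an assumption, and the helper names (`parse_version`, `satisfies`, `check_environment`) are hypothetical and not part of the repository.

```python
import sys
from importlib import metadata

# Versions as quoted in this summary; verify against the repo's README.
# Treating timm and open_clip_torch as exact pins is an assumption.
REQUIREMENTS = {
    "torch": ("min", "2.1"),
    "timm": ("exact", "1.0.15"),
    "open_clip_torch": ("exact", "2.32.0"),
}


def parse_version(text):
    """Turn '2.1.0+cu121' into (2, 1, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def satisfies(installed, rule, wanted):
    """Check an installed version string against an 'exact' or 'min' rule."""
    have, want = parse_version(installed), parse_version(wanted)
    if rule == "exact":
        return have == want
    # Tuple comparison: (2, 1, 0) >= (2, 1) is True, (2, 0, 9) >= (2, 1) is False.
    return have >= want


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or newer is required")
    for name, (rule, wanted) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not satisfies(installed, rule, wanted):
            problems.append(f"{name} {installed} found, expected {rule} {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks OK"]:
        print(line)
```

The script only inspects installed package metadata, so it does not confirm that CUDA or all eight GPUs are visible to PyTorch.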

In plain English

VA-Adapter is the official code release for a medical imaging research paper accepted at MICCAI 2026. The paper is about echocardiography, the use of ultrasound to look at the heart. Performing a good echocardiogram is hard, and there are not enough trained sonographers to meet demand, so the authors want to build software that helps guide the probe to the right position and angle.

The difficulty, the paper explains, is that every person looks different on ultrasound. The flat 2D images coming off the probe vary from patient to patient, and the underlying 3D shape of the heart varies too. The authors start from large ultrasound foundation models that have already been trained on huge amounts of ultrasound data, since these models are good at reading 2D ultrasound images. The catch is that these foundation models do not understand the 3D layout of a specific patient's heart.

The contribution is a small add-on module called the Vision-Action Adapter, or VA-Adapter. It is inserted into the image encoder of an existing foundation model and learns from past pairs of images and probe movements during a session. In effect, it lets the model build up a sense of the individual patient's anatomy while keeping most of the foundation model frozen. The paper reports that this approach outperforms strong baseline probe guidance models while training around 33 times fewer parameters, on a dataset of more than 1.31 million samples.

The repository supplies adapter wrappers for three ultrasound foundation models: EchoCLIP, BiomedCLIP, and USFM. The code lives under a models folder, with a separate file for each backbone and a shared sequence model. The README also includes the exact training commands for all three, using PyTorch distributed training across eight GPUs with a batch size of 256 for five epochs.

To run the code you need Python 3.8 or newer, PyTorch 2.1 or newer, timm 1.0.15, and open_clip_torch 2.32.0, plus einops, scipy, matplotlib, and tqdm. For BiomedCLIP and USFM you also need to download the official pretrained weights separately and point the training script at them. Logs and the best checkpoint, selected by the lowest validation mean absolute error (MAE), are written to a logs directory of your choice.
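The "best checkpoint by validation mean absolute error" rule can be sketched in a few lines of plain Python. This is an illustrative sketch, not the repository's training loop: the function and class names are hypothetical, the per-epoch MAE values in the usage lines are made up, and the number of components in each action vector is left open.

```python
def mean_absolute_error(predicted, target):
    """Average absolute difference over every action component of every sample."""
    total, count = 0.0, 0
    for pred_row, true_row in zip(predicted, target):
        for p, t in zip(pred_row, true_row):
            total += abs(p - t)
            count += 1
    return total / count


class BestCheckpointTracker:
    """Remember the epoch with the lowest validation MAE seen so far."""

    def __init__(self):
        self.best_mae = float("inf")
        self.best_epoch = None

    def update(self, epoch, val_mae):
        """Return True when this epoch should overwrite the saved best checkpoint."""
        if val_mae < self.best_mae:
            self.best_mae, self.best_epoch = val_mae, epoch
            return True
        return False


# Usage with made-up validation MAE values for five epochs.
tracker = BestCheckpointTracker()
for epoch, val_mae in enumerate([0.52, 0.47, 0.49, 0.44, 0.45]):
    if tracker.update(epoch, val_mae):
        pass  # a real loop would save the adapter weights here
```

Lower MAE is better, so the tracker overwrites the saved checkpoint only when an epoch strictly improves on the previous best.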

Copy-paste prompts

Prompt 1
Set up the VA-Adapter training command for EchoCLIP on 8 GPUs with batch size 256 for 5 epochs
Prompt 2
Explain how the Vision-Action Adapter is inserted into the image encoder of a frozen ultrasound foundation model
Prompt 3
Help me download the BiomedCLIP and USFM pretrained weights and point the VA-Adapter training script at them
Prompt 4
Compare the parameter counts and MAE results of VA-Adapter across the three supported backbones

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.