Train the VA-Adapter on EchoCLIP, BiomedCLIP, or USFM to guide echocardiography probes
Reproduce the MICCAI 2026 paper numbers on a 1.31M-sample ultrasound dataset
Adapt the VA-Adapter wrapper pattern to another medical imaging foundation model
Requires Python 3.8+, PyTorch 2.1+, 8-GPU distributed training, and separate downloads of BiomedCLIP and USFM pretrained weights.
VA-Adapter is the official code release for a medical imaging research paper accepted at MICCAI 2026. The paper is about echocardiography, the use of ultrasound to image the heart. Performing a good echocardiogram is hard, and there are not enough trained sonographers to meet demand, so the authors want to build software that helps guide the probe to the right position and angle.

The difficulty, the paper explains, is that every patient looks different on ultrasound: the flat 2D images coming off the probe vary from patient to patient, and so does the underlying 3D shape of the heart. The authors start from large ultrasound foundation models that have already been trained on huge amounts of ultrasound data, since these models are good at reading 2D ultrasound images. The catch is that these foundation models do not understand the 3D layout of a specific patient's heart.

The contribution is a small add-on module called the Vision-Action Adapter, or VA-Adapter. It is inserted into the image encoder of an existing foundation model and learns from the pairs of images and probe movements already seen during a session. In effect, it lets the model build up a sense of the individual patient's anatomy while keeping most of the foundation model frozen. The paper reports that this approach outperforms strong baseline probe guidance models while training around 33 times fewer parameters, on a dataset of more than 1.31 million samples.

The repository supplies adapter wrappers for three ultrasound foundation models: EchoCLIP, BiomedCLIP, and USFM. The code lives under a models folder, with a separate file for each backbone and a shared sequence model. The README also includes the exact training commands for all three, using PyTorch distributed training across eight GPUs with a batch size of 256 over five epochs.

To run the code you need Python 3.8 or newer, PyTorch 2.1 or newer, timm 1.0.15, and open_clip_torch 2.32.0, plus einops, scipy, matplotlib, and tqdm.
For BiomedCLIP and USFM you also need to download the official pretrained weights separately and point the training script at them. Logs and the best checkpoint, selected by lowest validation mean absolute error (MAE), are written to a logs directory of your choosing.
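
The frozen-backbone-plus-adapter idea described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the repository's implementation: the class names, the toy backbone, the 6-dimensional action vector, and the bottleneck design are all assumptions made for the example. It also simplifies in two ways: the adapter sits after the encoder rather than inside it, and the whole backbone is frozen rather than most of it.

```python
import torch
from torch import nn


class HistoryAdapter(nn.Module):
    """Hypothetical adapter: a residual bottleneck conditioned on past
    (image feature, probe action) pairs from the same session."""

    def __init__(self, dim: int, action_dim: int, hidden: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.hist = nn.Linear(dim + action_dim, hidden)
        self.up = nn.Linear(hidden, dim)
        # Zero-init the last layer so the adapter starts as an identity map.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, feat, past_feats, past_actions):
        # feat: (B, D); past_feats: (B, T, D); past_actions: (B, T, A)
        history = torch.cat([past_feats, past_actions], dim=-1)
        context = self.hist(history).mean(dim=1)  # average over the T past steps
        return feat + self.up(torch.relu(self.down(feat) + context))


class AdaptedEncoder(nn.Module):
    """Wraps a frozen image encoder with a trainable adapter and action head."""

    def __init__(self, backbone: nn.Module, dim: int, action_dim: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # the backbone is never updated
        self.adapter = HistoryAdapter(dim, action_dim)
        self.head = nn.Linear(dim, action_dim)

    def forward(self, image, past_images, past_actions):
        b, t = past_images.shape[:2]
        with torch.no_grad():
            feat = self.backbone(image)
            past = self.backbone(past_images.flatten(0, 1)).view(b, t, -1)
        return self.head(self.adapter(feat, past, past_actions))


def count_params(module: nn.Module, trainable_only: bool = False) -> int:
    return sum(
        p.numel()
        for p in module.parameters()
        if p.requires_grad or not trainable_only
    )


torch.manual_seed(0)
# Toy stand-in for a pretrained ultrasound image encoder (assumption).
backbone = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.GELU(), nn.Linear(64, 64)
)
model = AdaptedEncoder(backbone, dim=64, action_dim=6)

image = torch.randn(2, 3, 32, 32)           # current frame
past_images = torch.randn(2, 4, 3, 32, 32)  # 4 earlier frames per sample
past_actions = torch.randn(2, 4, 6)         # probe movement taken at each frame

pred = model(image, past_images, past_actions)
trainable = count_params(model, trainable_only=True)
total = count_params(model)
print(tuple(pred.shape), trainable, total)
```

Only the adapter and the head receive gradients, which is where the reduction in trained parameters comes from in a setup like this. The real wrappers for EchoCLIP, BiomedCLIP, and USFM should be checked in the repository's models folder.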
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.