Run a fast vision-language model on an iPhone or iPad using the included demo iOS app
Fine-tune a FastVLM variant on your own image dataset using the LLaVA-based training pipeline
Export a FastVLM model for Apple Silicon to power an on-device image question-answering feature in a macOS app
Benchmark FastVLM against other vision-language models to validate speed gains for a production use case
Requires downloading pretrained weights and setting up a Python environment. On-device iOS/macOS deployment also requires the Apple Silicon export path.
FastVLM is a research project from Apple that makes vision-language models faster at understanding images. Specifically, it addresses the bottleneck that occurs when a model has to encode a high-resolution photo before it can say anything about it. The project introduces a new image encoder called FastViTHD that produces fewer intermediate tokens, so the language model has less input to process and can start generating a response much sooner.

The reported speed improvements are large. The smallest variant of FastVLM produces its first token up to 85 times faster than a comparable model, and the larger 7-billion-parameter version is nearly 8 times faster than competing approaches, all while matching or exceeding their accuracy scores. These results were published at CVPR 2025, a major computer vision conference.

The models ship in three sizes: 0.5B, 1.5B, and 7B, where the number refers to the count of parameters in the language part of the model. Pretrained weights are available for download, and running inference on a standard computer requires only a few setup commands and a Python script. The repository also includes a dedicated export path for running the models on Apple Silicon devices (iPhones, iPads, and Macs), along with a demo iOS app that shows the model working on a mobile device.

This is primarily a research release aimed at developers and researchers who want to experiment with fast vision-language models, fine-tune their own variants, or understand the technical approach described in the paper. The training pipeline builds on the existing LLaVA codebase, so anyone already familiar with that project will find the workflow recognizable.
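The speed claims above concern how soon a model starts responding, often called time to first token. For anyone benchmarking FastVLM against other models, a minimal sketch of that measurement might look like the following. It is not code from the FastVLM repository: `time_to_first_token` and `fake_stream` are hypothetical names, and the sketch assumes the model's output can be wrapped in a lazy Python iterator that does its image encoding and prefill work when the first token is requested.

```python
import time
from typing import Iterable, Iterator, Tuple


def time_to_first_token(stream: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, token_count) for one streamed response.

    Assumes `stream` is lazy, so the cost of encoding the image and
    prefilling the prompt is paid when the first token is pulled.
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # First token has arrived: record the latency once.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    if ttft is None:
        raise ValueError("stream produced no tokens")
    return ttft, total, count


def fake_stream(prefill_s: float, n_tokens: int) -> Iterator[str]:
    """Hypothetical stand-in for a model: sleep to mimic prefill, then emit tokens."""
    time.sleep(prefill_s)
    for i in range(n_tokens):
        yield f"tok{i}"


if __name__ == "__main__":
    # Two simulated models that differ only in prefill cost.
    slow_ttft, _, _ = time_to_first_token(fake_stream(0.20, 5))
    fast_ttft, _, _ = time_to_first_token(fake_stream(0.02, 5))
    print(f"slow: {slow_ttft:.3f}s  fast: {fast_ttft:.3f}s")
```

To benchmark real models, `fake_stream` would be replaced by whatever streaming interface each model exposes. A fair comparison would also use the same images, prompts, and hardware for every model, and average over several runs.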
Verify against the repo before relying on details.