Run real-time object segmentation on a laptop without a high-end GPU by swapping in MobileSAM weights instead of the original SAM.
Drop MobileSAM into an existing SAM project by replacing only the model weights; no other code changes are needed.
Build a Gradio web demo that lets users click on uploaded photos to segment any object using MobileSAM.
Use MobileSAMv2 for automatic mask generation on photos without providing any manual prompts, replacing the slow grid-search approach.
Requires Python 3.8+, PyTorch 1.7+, and optionally a CUDA GPU. CPU-only works, but inference is slower than the 12 ms GPU benchmark.
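The requirements above can be checked before installing anything else. The following is a minimal sketch, not part of the repo: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and the only external calls used are `torch.__version__` and `torch.cuda.is_available()`.

```python
import re
import sys


def parse_version(text):
    """Turn a version string such as '2.1.0+cu118' into (2, 1, 0)."""
    core = text.split("+")[0]          # drop local build suffixes like '+cu118'
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:            # pad '1.7' to (1, 7, 0)
        numbers.append(0)
    return tuple(numbers)


def meets_minimum(version_text, minimum):
    """Compare numerically, so '1.13' correctly counts as newer than '1.7'."""
    return parse_version(version_text) >= tuple(minimum)


def check_environment():
    """Report whether Python and PyTorch meet the stated minimums."""
    report = {"python_ok": sys.version_info >= (3, 8), "torch_ok": False, "device": None}
    try:
        import torch
    except ImportError:
        return report                  # PyTorch is not installed
    report["torch_ok"] = meets_minimum(torch.__version__, (1, 7))
    # A CUDA GPU is optional; fall back to the slower CPU path without one.
    report["device"] = "cuda" if torch.cuda.is_available() else "cpu"
    return report
```

Calling `check_environment()` returns a small dictionary. A `device` of `"cpu"` means the model should still run, just slower than the GPU figure quoted above.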
SAM (Segment Anything Model) is a model from Meta AI that can identify and outline any object in an image when you give it a hint, such as clicking on a point or drawing a box. The original SAM is accurate but large and slow, with around 600 million parameters in total and a run time of about 456 milliseconds per image. MobileSAM is a lighter version designed to run on devices with limited computing power, including phones and laptops.

The core change in MobileSAM is a swap of the image encoder. In the original SAM, the image encoder alone is a large vision transformer with 611 million parameters that takes 452 milliseconds to process one image. MobileSAM replaces it with a compact model called Tiny-ViT that has only 5 million parameters and runs in about 8 milliseconds. The rest of the pipeline, including the mask decoder and the way you provide prompts, stays identical. This means existing projects that already use SAM can switch to MobileSAM by changing only the model weights, with no other code changes required.

On a single GPU, MobileSAM processes an image in about 12 milliseconds end to end, compared to 456 milliseconds for the original SAM. The model was trained on a single GPU using roughly 100,000 images (about 1 percent of the original SAM training set) in under a day. The README compares MobileSAM to another lightweight alternative called FastSAM, showing that MobileSAM is about seven times smaller and five times faster, and produces masks that match the original SAM much more closely.

A follow-up project called MobileSAMv2 is also described briefly. It changes how the model generates masks when no prompt is given, replacing a slow grid-search approach with one that finds objects first and then uses them as prompts.

Installation requires Python 3.8 or later, PyTorch 1.7 or later, and optionally a CUDA-enabled GPU. A Gradio-based demo can be run locally after installation, and a public demo is available on Hugging Face.
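The click-to-segment workflow described above can be sketched as follows. This is an illustration under stated assumptions, not the repo's own script. As far as I recall from the upstream README, the package is imported as `mobile_sam`, the registry key is `"vit_t"`, and the predictor calls mirror SAM's `SamPredictor`. The checkpoint path, the file name `photo.jpg`, and the helper names `load_predictor`, `segment_at_point`, and `pick_best_mask` are hypothetical.

```python
def load_predictor(checkpoint="./weights/mobile_sam.pt", model_type="vit_t"):
    """Build a predictor backed by MobileSAM weights (checkpoint path is an assumption)."""
    # Imported here so the helpers below stay usable without these packages.
    import torch
    from mobile_sam import sam_model_registry, SamPredictor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = sam_model_registry[model_type](checkpoint=checkpoint)
    model.to(device=device)
    model.eval()
    return SamPredictor(model)


def pick_best_mask(masks, scores):
    """Return the mask with the highest predicted quality score, plus that score."""
    best = max(range(len(scores)), key=lambda i: float(scores[i]))
    return masks[best], float(scores[best])


def segment_at_point(predictor, image, x, y):
    """Segment the object under one foreground click at pixel (x, y).

    `image` is an RGB array of shape (height, width, 3).
    """
    import numpy as np

    predictor.set_image(image)               # runs the image encoder once
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),     # one click, given as (x, y)
        point_labels=np.array([1]),          # 1 marks a foreground point
        multimask_output=True,               # ask for several candidate masks
    )
    return pick_best_mask(masks, scores)


# Example usage (requires the installed package, downloaded weights, and an image):
#   import numpy as np
#   from PIL import Image
#   predictor = load_predictor()
#   image = np.array(Image.open("photo.jpg").convert("RGB"))
#   mask, score = segment_at_point(predictor, image, x=320, y=240)
```

Because the encoder runs once in `set_image`, several clicks on the same photo can reuse the cached embedding and only repeat the `predict` call.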
Verify against the repo before relying on details.