chaoningzhang/mobilesam

★ 5,752 | Jupyter Notebook | Audience · researcher | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((MobileSAM))
    Core Change
      Tiny-ViT encoder
      5M parameters
      8ms per image
    vs Original SAM
      611M parameters
      452ms per image
      Same mask decoder
    Prompts Supported
      Point click
      Bounding box
      Automatic masks
    Setup
      Python 3.8 plus
      PyTorch 1.7 plus
      Optional GPU
    Demo
      Gradio local demo
      HuggingFace online


Things people build with this

USE CASE 1

Run real-time object segmentation on a laptop without a high-end GPU by swapping in MobileSAM weights instead of the original SAM.

USE CASE 2

Drop MobileSAM into an existing SAM project by replacing only the model weights; no other code changes are needed.

USE CASE 3

Build a Gradio web demo that lets users click on uploaded photos to segment any object using MobileSAM.

USE CASE 4

Use MobileSAMv2 for automatic mask generation on photos without providing any manual prompts, replacing the slow grid-search approach.

Tech stack

Python | PyTorch | CUDA | Gradio | Jupyter Notebook

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires Python 3.8+, PyTorch 1.7+, and optionally a CUDA GPU. CPU-only works, but inference is slower than the roughly 12 ms per image benchmarked on a GPU.
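As a rough illustration of a first run, the sketch below loads MobileSAM and segments the object under a single clicked point. It assumes the `mobile_sam` package exposes `sam_model_registry` and `SamPredictor` the same way the original `segment_anything` package does, that the Tiny-ViT model is registered under the key `"vit_t"`, and that the checkpoint sits at `./weights/mobile_sam.pt`; all three should be checked against the repo. `best_mask_index` and `segment_at_point` are hypothetical helper names introduced here, and OpenCV is assumed only for reading the image.

```python
from typing import Sequence


def best_mask_index(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate mask."""
    if len(scores) == 0:
        raise ValueError("scores must not be empty")
    return max(range(len(scores)), key=lambda i: scores[i])


def segment_at_point(image_path: str, x: int, y: int,
                     checkpoint: str = "./weights/mobile_sam.pt"):
    """Segment the object under pixel (x, y) and return one mask.

    Hypothetical helper. The checkpoint path and the "vit_t" registry key
    are assumptions to verify against the MobileSAM repo.
    """
    # Heavy imports stay inside the function so this file can be imported
    # even before PyTorch and MobileSAM are installed.
    import cv2
    import numpy as np
    import torch
    from mobile_sam import SamPredictor, sam_model_registry

    # Fall back to CPU when no CUDA GPU is available (slower, but it works).
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = sam_model_registry["vit_t"](checkpoint=checkpoint)
    model.to(device=device)
    model.eval()

    # OpenCV loads images as BGR; the predictor expects RGB by default.
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    predictor = SamPredictor(model)
    predictor.set_image(image)

    # Label 1 marks the point as foreground; several candidate masks
    # come back along with a quality score for each.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
    return masks[best_mask_index(scores.tolist())]
```

A call such as `segment_at_point("photo.jpg", 320, 240)` (the file name is a placeholder) should return a boolean array with the same height and width as the image. A bounding-box prompt would follow the same pattern, passing `box=` to `predict` instead of the point arguments.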

In plain English

SAM (Segment Anything Model) is a model from Meta AI that can identify and outline any object in an image when you give it a hint, such as clicking on a point or drawing a box. The original SAM is accurate but large and slow: it has around 600 million parameters and takes about 456 milliseconds per image. MobileSAM is a lighter version designed to run on devices with limited computing power, including phones and laptops.

The core change in MobileSAM is a swap of the image encoder. The original SAM uses a large vision transformer encoder with 611 million parameters that takes 452 milliseconds to process one image. MobileSAM replaces it with a compact model called Tiny-ViT that has only 5 million parameters and runs in about 8 milliseconds. The rest of the pipeline, including the mask decoder and the way you provide prompts, stays identical. This means existing projects that already use SAM can switch to MobileSAM by changing only the model weights, with no other code changes required.

On a single GPU, the full MobileSAM pipeline (encoder plus decoder) processes an image in about 12 milliseconds, compared to 456 milliseconds for the original SAM. The model was trained on a single GPU using roughly 100,000 images (about 1 percent of the original SAM training set) in under a day. The README compares MobileSAM to another lightweight alternative called FastSAM, showing that MobileSAM is about seven times smaller and five times faster, and that its masks match the original SAM's much more closely than FastSAM's do.

A follow-up project called MobileSAMv2 is also described briefly. It changes how the model generates masks when no prompt is given, replacing a slow grid-search approach with one that finds objects first and then uses them as prompts.

Installation requires Python 3.8 or later, PyTorch 1.7 or later, and optionally a CUDA-enabled GPU. A Gradio-based demo can be run locally after installation, and a public demo is available on Hugging Face.
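The benchmark figures quoted above can be turned into speedup ratios with a few lines of arithmetic. The sketch below uses only numbers from the text; the dictionary keys and the `ratio` helper are illustrative names, not part of MobileSAM.

```python
# Figures quoted in the text: milliseconds per image and millions of parameters.
SAM = {"encoder_ms": 452, "total_ms": 456, "encoder_params_m": 611}
MOBILE_SAM = {"encoder_ms": 8, "total_ms": 12, "encoder_params_m": 5}


def ratio(before: float, after: float) -> float:
    """How many times smaller or faster `after` is than `before`."""
    return before / after


encoder_speedup = ratio(SAM["encoder_ms"], MOBILE_SAM["encoder_ms"])
total_speedup = ratio(SAM["total_ms"], MOBILE_SAM["total_ms"])
param_reduction = ratio(SAM["encoder_params_m"], MOBILE_SAM["encoder_params_m"])

# Time spent outside the image encoder (prompt handling and mask decoding).
sam_rest_ms = SAM["total_ms"] - SAM["encoder_ms"]
mobile_rest_ms = MOBILE_SAM["total_ms"] - MOBILE_SAM["encoder_ms"]

print(f"encoder speedup: {encoder_speedup:.1f}x")              # 56.5x
print(f"pipeline speedup: {total_speedup:.1f}x")               # 38.0x
print(f"encoder parameter reduction: {param_reduction:.1f}x")  # 122.2x
print(f"non-encoder time: {sam_rest_ms} ms vs {mobile_rest_ms} ms")  # 4 ms vs 4 ms
```

The leftover 4 ms is the same for both models, which is consistent with the statement that the mask decoder and prompt handling are unchanged. It also explains why the whole-pipeline speedup (38x) is lower than the encoder-only speedup (56.5x): the fixed decoder cost becomes a larger share of the total once the encoder is fast.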

Copy-paste prompts

Prompt 1

Install MobileSAM and write a Python script that loads the model, takes an image path, and segments an object at a point coordinate I specify.

Prompt 2

How do I replace the image encoder in my existing SAM pipeline with MobileSAM Tiny-ViT weights? Show me exactly which lines of code to change.

Prompt 3

Run the MobileSAM Gradio demo locally so I can upload photos and click to segment objects. Give me the installation and launch commands.

Prompt 4

Write a Python script using MobileSAM that takes a bounding box as input and returns the segmentation mask saved as a PNG file.

Prompt 5

What is the difference between MobileSAM and FastSAM, and when should I use each for on-device image segmentation?


Verify against the repo before relying on details.