explaingit

facebookresearch/segment-anything

54,180 | Jupyter Notebook | Audience · developer | Complexity · 3/5 | Stale | License | Setup · moderate

TLDR

AI model that cuts out any object in an image from a click or box prompt, with no per-object-type training needed.

Mindmap

mindmap
  root((repo))
    What it does
      Segment objects in images
      Zero-shot segmentation
      Auto-mask entire images
    How it works
      Vision Transformer encoder
      Mask decoder
      Prompt-based output
    Use cases
      Photo editing
      Medical imaging
      Satellite analysis
      Robotics perception
    Tech stack
      Python
      PyTorch
      Jupyter Notebooks
    Audience
      Computer vision researchers
      Image processing developers

Things people build with this

USE CASE 1

Build a photo editor that lets users select objects by clicking or drawing a box around them.

USE CASE 2

Analyze medical images to automatically isolate organs or tumors for diagnosis.

USE CASE 3

Process satellite imagery to detect and extract buildings, roads, or land features.

USE CASE 4

Add object segmentation to a robot's vision system so it can isolate and interact with items in its environment.

Tech stack

Python | PyTorch | torchvision | Vision Transformer | ONNX

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires PyTorch installation and downloading pre-trained model weights; ONNX export optional.
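As a rough sketch of that setup step, the snippet below loads a model from a locally downloaded checkpoint. It assumes the `segment_anything` package is installed and that the weights sit in a local `checkpoints/` directory (a hypothetical location). The checkpoint file names are the ones the repository README listed at the time of writing, so treat them as assumptions and verify them against the repo. `checkpoint_path` and `load_sam` are illustrative helpers, not part of the library.

```python
from pathlib import Path

# Checkpoint file names as published in the repository README at the time of
# writing. These are assumptions here: verify them before relying on them.
CHECKPOINTS = {
    "vit_h": "sam_vit_h_4b8939.pth",  # largest model
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",  # smallest model
}


def checkpoint_path(model_type, weights_dir="checkpoints"):
    """Return the expected local path of the weights for a model type."""
    if model_type not in CHECKPOINTS:
        raise ValueError(f"unknown model type: {model_type!r}")
    return Path(weights_dir) / CHECKPOINTS[model_type]


def load_sam(model_type="vit_b", weights_dir="checkpoints", device="cpu"):
    """Build a SAM model from a checkpoint that was downloaded beforehand."""
    # Imported lazily so the path helper above works without the package.
    from segment_anything import sam_model_registry

    path = checkpoint_path(model_type, weights_dir)
    if not path.is_file():
        raise FileNotFoundError(f"download the weights first: {path}")
    sam = sam_model_registry[model_type](checkpoint=str(path))
    sam.to(device)  # e.g. "cuda" when a GPU is available
    return sam
```

Calling `load_sam("vit_b")` would then return a model ready to hand to the predictor or mask generator classes.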

Use freely for any purpose, including commercial use, as long as you keep the copyright notice and license text.

In plain English

Segment Anything Model (SAM) is an AI model from Meta's research team that can pick out and cut out any object in an image, even objects it was never specifically trained to recognize. Traditional image segmentation tools require training on labeled examples of the exact type of object you want to detect. SAM works differently: it accepts a simple prompt, such as a point click or a bounding box drawn around an object, and generates a precise mask (a pixel-level outline) of the corresponding object. Text prompts are explored in the accompanying paper, but the released code does not include them. SAM can also generate masks for every distinct object in an image without any prompt at all.

Under the hood, SAM was trained on a dataset of 11 million images and over 1 billion annotated masks, giving it broad visual knowledge. The architecture uses a Vision Transformer (a type of neural network designed for image understanding) to encode each image into a representation that a lightweight mask decoder then combines with the prompt to produce masks. The model is available in three sizes with different accuracy and speed tradeoffs. The mask decoder can also be exported to ONNX, a standard format for running models outside Python, including in web browsers.

You would use SAM if you are a computer vision researcher or developer who needs flexible, zero-shot image segmentation for tasks like photo editing, medical imaging, satellite image analysis, robotics perception, or any other application that needs to isolate objects in images. The tech stack is Python with PyTorch and torchvision, with example Jupyter Notebooks included. A newer version, SAM 2, extends these capabilities to video.
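The prompted and prompt-free modes described above can be sketched roughly as follows. This assumes `sam` is a model already built through the package's `sam_model_registry`, and that `image_rgb` is an H x W x 3 `uint8` NumPy array in RGB order. `pick_best_mask`, `segment_at_point`, and `segment_everything` are hypothetical helper names chosen for illustration; only `SamPredictor` and `SamAutomaticMaskGenerator` come from the library.

```python
def pick_best_mask(scores):
    """Return the index of the highest-scoring candidate mask."""
    return max(range(len(scores)), key=lambda i: scores[i])


def segment_at_point(sam, image_rgb, x, y):
    """Return a boolean mask for the object under pixel (x, y)."""
    # Lazy imports: these need numpy and the segment_anything package.
    import numpy as np
    from segment_anything import SamPredictor

    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)  # runs the image encoder once
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),  # 1 marks a foreground click
        multimask_output=True,       # several candidates for ambiguous clicks
    )
    return masks[pick_best_mask(scores.tolist())]


def segment_everything(sam, image_rgb):
    """Return one record per object found, with no prompt supplied."""
    from segment_anything import SamAutomaticMaskGenerator

    generator = SamAutomaticMaskGenerator(sam)
    # Each record is a dict with keys such as "segmentation", "area", "bbox".
    return generator.generate(image_rgb)
```

Because the image embedding is computed once in `set_image`, an interactive tool would normally keep one predictor per image and call `predict` again for each new click, rather than rebuilding the predictor as this sketch does.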

Copy-paste prompts

Prompt 1
How do I use Segment Anything Model to mask objects in my images with a single click?
Prompt 2
Show me how to integrate SAM into a Python script to automatically segment all objects in a batch of images.
Prompt 3
How can I export SAM's mask decoder to ONNX format to run it in a web browser?
Prompt 4
What's the difference between SAM's three model sizes, and which one should I use for real-time applications?
Prompt 5
How do I use SAM 2 to segment and track objects across video frames?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.