explaingit

facebookresearch/sam2

Analysis updated 2026-06-21

19,144 stars | Jupyter Notebook | Audience · researcher | Complexity · 4/5 | Setup · hard

TLDR

Meta's AI model that identifies and outlines any object in photos or videos and, in video, tracks it frame by frame. Useful for video editing, labeling AI training data, medical imaging, and building object-aware apps.

Mindmap

mindmap
  root((SAM 2))
    What it does
      Object segmentation
      Video object tracking
    Tech stack
      Python
      PyTorch
      CUDA GPU
      Jupyter Notebook
    Input types
      Photos
      Videos
      Click or box prompts
    Use cases
      Video editing
      AI dataset labeling
      Medical imaging
      Object detection apps

What do people build with it?

USE CASE 1

Cut out a moving subject from a video clip by clicking on it once and having SAM 2 trace it across every frame.

USE CASE 2

Label objects in video datasets for training your own AI model by using SAM 2 to auto-generate segmentation masks.

USE CASE 3

Analyze medical scan images by isolating regions of interest with point or box prompts.

USE CASE 4

Build an app that detects and highlights specific objects in a live video feed.
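
The first use case (click once, track across every frame) can be sketched roughly as follows. This is a hedged illustration, not code from the repository: `build_sam2_video_predictor`, `init_state`, `add_new_points_or_box`, and `propagate_in_video` follow the usage shown in the SAM 2 README at the time of writing, while `collect_tracks` and `track_from_click` are hypothetical helper names, and the config and checkpoint paths are whatever you downloaded. The heavy imports sit inside the function so the small helper can be used without SAM 2 installed.

```python
def collect_tracks(propagation, to_mask=lambda logits: logits):
    """Group per-frame results into {frame_idx: {obj_id: mask}}.

    `propagation` is any iterable of (frame_idx, obj_ids, masks) tuples,
    where masks[i] belongs to obj_ids[i].
    """
    tracks = {}
    for frame_idx, obj_ids, masks in propagation:
        tracks[frame_idx] = {
            obj_id: to_mask(masks[i]) for i, obj_id in enumerate(obj_ids)
        }
    return tracks


def track_from_click(video_path, x, y, model_cfg, checkpoint, device="cuda"):
    """Track the object under one click on frame 0 through the whole clip.

    Assumes the sam2 package, PyTorch, NumPy, and a checkpoint are available.
    """
    import numpy as np
    import torch
    from sam2.build_sam import build_sam2_video_predictor

    predictor = build_sam2_video_predictor(model_cfg, checkpoint, device=device)
    with torch.inference_mode():
        state = predictor.init_state(video_path)
        # One foreground click (label 1) on the first frame, object id 1.
        predictor.add_new_points_or_box(
            state,
            frame_idx=0,
            obj_id=1,
            points=np.array([[x, y]], dtype=np.float32),
            labels=np.array([1], dtype=np.int32),
        )
        # Threshold mask logits at 0 and move them off the GPU frame by frame.
        return collect_tracks(
            predictor.propagate_in_video(state),
            to_mask=lambda logits: (logits > 0.0).cpu().numpy(),
        )
```

Converting each mask to a NumPy array inside the loop, rather than after propagation finishes, is a design choice meant to keep GPU memory use flat on long clips.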

What is it built with?

Python, PyTorch, CUDA, Jupyter Notebook

How does it compare?

                 | facebookresearch/sam2 | qwenlm/qwen3-vl  | nirdiamant/agents-towards-production
Stars            | 19,144                | 19,159           | 19,124
Language         | Jupyter Notebook      | Jupyter Notebook | Jupyter Notebook
Setup difficulty | hard                  | moderate         | moderate
Complexity       | 4/5                   | 3/5              | 4/5
Audience         | researcher            | developer        | developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1h+

Requires Python 3.10+, PyTorch 2.5.1+, and a GPU. CPU-only inference is extremely slow for video.
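
The requirements above can be checked before installing anything heavy. The following is a minimal sketch and not part of the SAM 2 repository: `parse_version`, `meets_minimum`, `pick_device`, and `check_environment` are hypothetical helper names, and the device preference order (CUDA, then Apple's MPS backend, then CPU) is an assumption rather than documented SAM 2 behavior.

```python
import importlib.util
import re
import sys

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 5, 1)


def parse_version(text):
    """Turn a string like '2.5.1+cu121' into (2, 5, 1), ignoring suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version_text, minimum):
    """True when the parsed version is at least the required tuple."""
    return parse_version(version_text) >= minimum


def pick_device(cuda_available, mps_available):
    """Prefer CUDA, then MPS, then CPU (an assumed ordering)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def check_environment():
    """Report whether Python and PyTorch satisfy the stated minimums."""
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so no device can be chosen yet.
        report.update(torch_ok=False, device=None)
        return report
    import torch

    mps = getattr(torch.backends, "mps", None)
    report["torch_ok"] = meets_minimum(torch.__version__, MIN_TORCH)
    report["device"] = pick_device(
        torch.cuda.is_available(), bool(mps and mps.is_available())
    )
    return report
```

Calling `check_environment()` returns a small dictionary with `python_ok`, `torch_ok`, and `device` keys. A `device` of `"cpu"` is a signal to expect very slow video processing.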

In plain English

SAM 2 (Segment Anything Model 2) is an AI model from Meta's research lab that identifies and outlines any object in a photo or video, a task called "segmentation." You indicate an object by clicking a point on it or drawing a box around it, and the model traces that object's boundary. The key upgrade over the original SAM is video support: SAM 2 tracks the object frame by frame across the entire clip, even as the object moves or is partially hidden.

Under the hood, it uses a transformer architecture, the same family of neural networks behind modern language models, plus a "streaming memory" system that lets it remember where an object was in previous frames so it can keep tracking it in later ones. Meta also released a large new video segmentation dataset (SA-V) that was used to train the model. Four size variants are available (tiny, small, base plus, large), and the model can be compiled for faster video processing.

You'd use this when you need to isolate objects in photos or videos: cutting out subjects for video editing, training other AI models that need labeled object data, analyzing medical scans, or building apps that need to "understand" where things are in an image. It requires Python 3.10 or higher, PyTorch 2.5.1 or higher, and a GPU. Usage examples are provided as Jupyter notebooks.
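
To make the prompting step concrete, here is a hedged sketch of segmenting a single image from clicks. The `build_sam2` and `SAM2ImagePredictor` calls follow the usage shown in the repository's README at the time of writing. The helper names `build_point_prompt` and `segment_from_clicks` are hypothetical, and the config and checkpoint paths you pass in should be checked against the repo. The heavy imports are kept inside the function so the prompt helper works without SAM 2 installed.

```python
def build_point_prompt(clicks):
    """Convert (x, y, is_foreground) clicks into coordinate and label lists.

    Label 1 marks a point on the object; label 0 marks a point to exclude.
    """
    coords = [[float(x), float(y)] for x, y, _ in clicks]
    labels = [1 if is_foreground else 0 for _, _, is_foreground in clicks]
    return coords, labels


def segment_from_clicks(image, clicks, model_cfg, checkpoint, device="cuda"):
    """Return candidate masks and their scores for one RGB image array.

    Assumes the sam2 package, PyTorch, NumPy, and a checkpoint are available.
    """
    import numpy as np
    import torch
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    coords, labels = build_point_prompt(clicks)
    predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint, device=device))
    with torch.inference_mode():
        predictor.set_image(image)
        # multimask_output=True asks for several candidate masks per prompt.
        masks, scores, _ = predictor.predict(
            point_coords=np.array(coords, dtype=np.float32),
            point_labels=np.array(labels, dtype=np.int32),
            multimask_output=True,
        )
    return masks, scores
```

With several candidates returned, a common choice is to keep the mask with the highest score, or to add a second click and predict again when the first result covers too much or too little.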

Copy-paste prompts

Prompt 1
Using SAM 2's Python API, write code that takes a video file, lets me click on an object in frame 0, and exports a mask overlay video tracking that object across all frames.
Prompt 2
How do I run SAM 2 on an Apple Silicon Mac without a CUDA GPU? Which PyTorch backend should I use, and which model size is fastest?
Prompt 3
Write a Python script using SAM 2 to segment all objects in a folder of images, saving each object mask as a separate PNG for a training dataset.
Prompt 4
Walk me through running the SAM 2 Jupyter notebook examples so I can interactively test object segmentation on my own photos.

Frequently asked questions

What is sam2?

Meta's AI model that identifies and outlines any object in photos or videos and, in video, tracks it frame by frame. Useful for video editing, labeling AI training data, medical imaging, and building object-aware apps.

What language is sam2 written in?

GitHub lists Jupyter Notebook as the primary language. The stack also includes Python, PyTorch, and CUDA.

How hard is sam2 to set up?

Setup difficulty is rated hard, with an hour or more to a first successful run.

Who is sam2 for?

Mainly researchers.


Verify against the repo before relying on details.