Cut out subjects from video clips for editing by automatically tracing object boundaries frame-by-frame.
Label objects in photos and videos to create training datasets for other AI models.
Analyze medical scans by automatically segmenting organs or tumors to assist diagnosis.
Build apps that understand where objects are in images by detecting and outlining them automatically.
Setup requires a CUDA-capable GPU and a working PyTorch installation; model weights must be downloaded separately, and compiling the model is an optional step for faster inference.
SAM 2 (Segment Anything Model 2) is an AI model from Meta's research lab that identifies and outlines objects in photos and videos, a task called "segmentation." You indicate an object by clicking a point on it or drawing a box around it, and the model traces that object's boundary. The key upgrade over the original SAM is video support: it tracks the object frame by frame across the entire clip, even as the object moves or is partially hidden.

Under the hood, it uses a transformer architecture, the same family of neural networks behind modern language models, plus a "streaming memory" system that retains information about the object from previous frames so it can keep tracking it in later ones. Meta also released a large new video segmentation dataset (SA-V) that was used to train the model. Four size variants are available (tiny, small, base plus, large), and the model can be compiled for faster video processing.

You'd use this when you need to isolate objects in photos or videos: cutting out subjects for video editing, labeling object data to train other AI models, analyzing medical scans, or building apps that need to "understand" where things are in an image. It requires Python 3.10 or higher, PyTorch 2.5.1 or higher, and a GPU. Usage examples are provided as Jupyter notebooks.
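The version requirements above can be checked before installing anything else. The following is a minimal sketch, not part of the SAM 2 repository: the helper names (parse_version, meets_minimum, check_environment) and the constants are hypothetical, and the only PyTorch calls used are torch.__version__ and torch.cuda.is_available().

```python
import re
import sys

# Minimums taken from the summary above; the names are illustrative only.
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 5, 1)


def parse_version(text):
    """Turn a string such as '2.5.1+cu121' into the tuple (2, 5, 1).

    Only the leading numeric release is read; local or pre-release
    suffixes (e.g. '+cu121', 'rc1') are ignored, which is an assumption
    that is good enough for a rough pre-install check.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", str(text))
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version_text, minimum):
    """Return True if version_text is at least the minimum version tuple."""
    return parse_version(version_text) >= minimum


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10 or higher is required")
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if not meets_minimum(torch.__version__, MIN_TORCH):
        problems.append("PyTorch 2.5.1 or higher is required")
    if not torch.cuda.is_available():
        problems.append("no CUDA device is visible to PyTorch")
    return problems
```

Calling check_environment() and printing each returned string gives a quick readout of what is missing; the repository's own install instructions remain the authority on exact requirements.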
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.