mlx-sam is an Apple Silicon port of Meta's SAM 2.1, a model that cuts objects out of images and tracks them through video. The point of the project is to run this work locally on a Mac using Apple's MLX framework, with Python 3.14 and no PyTorch in the runtime path. PyTorch is only an optional extra, used for converting weights and for comparing results against the official model.

The basic workflow matches upstream SAM2. You load a video, click on an object in a frame, and the model produces a mask for that object and follows it through the rest of the clip. Clicks can be positive (this is part of the object) or negative (this is not). Corrections can be added on any frame and for multiple objects. Propagation can run forward from frame zero, backward from any frame, or in both directions from a middle frame, which builds bidirectional results around an edit point. Box prompts also work.

The Python API mirrors the names in the official SAM2 codebase, so methods such as from_pretrained, init_state, add_new_points_or_box, propagate_in_video, and reset_state behave the way an existing SAM2 user would expect. A stream_in_video helper also yields per-frame events for UI or worker use and can emit one final stacked mask tensor at the end.

Performance has a few knobs. The default image size is 1024, matching SAM2; lower values trade mask quality for speed and memory. Memory tensors can be kept in float16. An opt-in precompute_image_features mode caches image features once during init_state, which makes repeated propagation and correction passes faster at the cost of extra upfront work. A separate benchmark script also offers a preview temporal-downsampling mode that runs the model only on every k-th frame and interpolates the rest. Beyond the core library, the repo ships several extras.
These include a local browser demo on port 7861, launched with mlx-sam-app; an mlx-sam-convert command that turns Hugging Face SAM2.1 checkpoints (tiny, small, base-plus, large) into MLX safetensors; and a feature regression script that compares MLX outputs against PyTorch fixtures. The README reports low-level numerical differences of around 1e-5, and the model catalog section reports benchmarks on an M2 Max with 32 GB of unified memory.
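The forward, backward, and bidirectional propagation modes described above differ only in which frames are visited and in what order. This sketch assumes the upstream SAM2 semantics, where a forward pass runs from the start frame to the last frame and a reverse pass runs from the start frame back to frame 0, both including the start frame. The helpers propagation_order and merge_passes are hypothetical illustrations, not mlx-sam functions.

```python
def propagation_order(num_frames: int, edit_frame: int, direction: str = "both") -> dict[str, list[int]]:
    """Frame indices visited by each pass (hypothetical helper).

    Forward: edit_frame .. last frame. Backward: edit_frame .. 0.
    Both passes include edit_frame, matching upstream SAM2's behaviour.
    """
    if not 0 <= edit_frame < num_frames:
        raise ValueError("edit_frame out of range")
    forward = list(range(edit_frame, num_frames))
    backward = list(range(edit_frame, -1, -1))
    if direction == "forward":
        return {"forward": forward}
    if direction == "backward":
        return {"backward": backward}
    if direction == "both":
        return {"forward": forward, "backward": backward}
    raise ValueError(f"unknown direction: {direction!r}")


def merge_passes(results_by_pass: dict[str, dict[int, object]]) -> dict[int, object]:
    """Merge per-frame outputs from several passes into one time-ordered dict.

    The edit frame appears in every pass; the first pass that produced a
    frame wins, later passes do not overwrite it.
    """
    merged: dict[int, object] = {}
    for per_frame in results_by_pass.values():
        for idx, out in per_frame.items():
            merged.setdefault(idx, out)
    return dict(sorted(merged.items()))
```

Upstream SAM2's propagate_in_video takes start_frame_idx and reverse arguments, so a bidirectional pass is two calls whose outputs are merged; whether mlx-sam exposes the same keyword arguments is an assumption.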
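Why a smaller image size saves time and memory can be shown with a little arithmetic. Upstream SAM2 produces a 64x64 image embedding from a 1024x1024 input, an effective stride of 16. This sketch assumes mlx-sam keeps that stride, and it only gives rough scaling ratios: per-token work scales linearly with the token count, global attention scales roughly quadratically, and real speedups depend on how much of the backbone uses windowed rather than global attention.

```python
STRIDE = 16  # assumed: upstream SAM2's embedding stride (1024 px -> 64x64 grid)


def embedding_grid(image_size: int, stride: int = STRIDE) -> tuple[int, int]:
    """Return (side, tokens) of the image embedding grid for a square input.

    This sketch only accepts multiples of the stride; the real model may
    impose stricter size constraints.
    """
    if image_size % stride:
        raise ValueError(f"image_size should be a multiple of {stride}")
    side = image_size // stride
    return side, side * side


def relative_cost(image_size: int, reference: int = 1024) -> dict:
    """Rough cost ratios of image_size relative to the 1024 default."""
    _, tokens = embedding_grid(image_size)
    _, ref_tokens = embedding_grid(reference)
    ratio = tokens / ref_tokens
    return {
        "tokens": tokens,
        "linear_in_tokens": ratio,         # e.g. per-token MLPs, feature storage
        "quadratic_in_tokens": ratio ** 2,  # e.g. global attention over all tokens
    }
```

For example, dropping from 1024 to 512 cuts the grid from 64x64 to 32x32, so per-token costs fall to a quarter and fully global attention to a sixteenth; mask detail drops with the coarser grid.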
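The effect of keeping memory tensors in float16 is easy to estimate. The sketch below assumes upstream SAM2's spatial memory layout at the default 1024 input (64 channels on a 64x64 grid per stored frame and object) and ignores smaller items such as object pointers. How many frames mlx-sam actually retains is not stated here, so num_frames_kept is left as a parameter.

```python
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}


def memory_bank_bytes(
    num_frames_kept: int,
    num_objects: int,
    dtype: str = "float32",
    channels: int = 64,  # assumed: upstream SAM2 memory dimension
    grid: int = 64,      # assumed: 64x64 grid for a 1024 input
) -> int:
    """Bytes for spatial memory features: frames x objects x C x H x W."""
    elements = num_frames_kept * num_objects * channels * grid * grid
    return elements * BYTES_PER_ELEMENT[dtype]


def mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / 2**20
```

Under these assumptions each stored frame costs 1 MiB per object in float32 and 0.5 MiB in float16, so 300 retained frames for 2 objects drop from 600 MiB to 300 MiB.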
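The trade-off behind the opt-in precompute_image_features mode can be modelled by counting encoder calls. As far as we know, upstream SAM2's video predictor caches features only for the most recently used frame, so every new propagation pass re-encodes each frame. This sketch contrasts that with encoding every frame once up front. The FeatureStore class is hypothetical, and the encode callable stands in for the image encoder.

```python
from collections.abc import Callable


class FeatureStore:
    """Hypothetical model of lazy vs. precomputed image features."""

    def __init__(self, encode: Callable[[int], object], num_frames: int,
                 precompute: bool = False):
        self._encode = encode
        self._num_frames = num_frames
        self.precompute = precompute
        self._cache: dict[int, object] = {}
        self.encoder_calls = 0
        if precompute:
            # Upfront cost: encode every frame once, before any prompt.
            for idx in range(num_frames):
                self._cache[idx] = self._run(idx)

    def _run(self, idx: int) -> object:
        self.encoder_calls += 1
        return self._encode(idx)

    def get(self, idx: int) -> object:
        if not 0 <= idx < self._num_frames:
            raise IndexError(idx)
        if idx in self._cache:
            return self._cache[idx]
        feats = self._run(idx)
        # Lazy mode: keep only the most recent frame, so later passes re-encode.
        self._cache = {idx: feats}
        return feats
```

With 10 frames and three propagation or correction passes, the lazy store calls the encoder 30 times while the precomputed store calls it 10 times; the price is the initial encoding pass and holding every frame's features in memory.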
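The preview temporal-downsampling mode runs the model only on every k-th frame and interpolates the rest. The source does not say how the interpolation is done, so this sketch assumes linear blending of per-pixel mask logits between neighbouring keyframes, with the last frame always treated as a keyframe so the tail of the clip is bracketed. Both helpers are hypothetical.

```python
from bisect import bisect_left


def keyframes(num_frames: int, k: int) -> list[int]:
    """Frames the model actually runs on: every k-th frame plus the last one."""
    if k < 1:
        raise ValueError("k must be >= 1")
    idx = list(range(0, num_frames, k))
    if idx and idx[-1] != num_frames - 1:
        idx.append(num_frames - 1)
    return idx


def interpolate(key_outputs: dict[int, list[float]], num_frames: int) -> list[list[float]]:
    """Fill in skipped frames by linearly blending flattened mask logits."""
    keys = sorted(key_outputs)
    if not keys or keys[0] != 0 or keys[-1] != num_frames - 1:
        raise ValueError("keyframes must include the first and last frame")
    frames = []
    for f in range(num_frames):
        pos = bisect_left(keys, f)  # index of the first keyframe >= f
        if keys[pos] == f:
            frames.append(list(key_outputs[f]))
            continue
        lo, hi = keys[pos - 1], keys[pos]
        t = (f - lo) / (hi - lo)  # 0 at the left keyframe, 1 at the right one
        frames.append([(1 - t) * a + t * b
                       for a, b in zip(key_outputs[lo], key_outputs[hi])])
    return frames
```

A binary mask would then come from thresholding each interpolated logit at 0. Linear blending is cheap but cannot follow fast motion between keyframes, which fits a preview rather than final output.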
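A converter like mlx-sam-convert has to rename parameters and adapt tensor layouts; its exact steps aren't described here. One layout change any PyTorch-to-MLX convolution port needs is that PyTorch Conv2d weights are stored as (out, in, kh, kw) while MLX Conv2d weights are (out, kh, kw, in). This pure-Python sketch shows that permutation on nested lists; the function name is hypothetical.

```python
def oihw_to_ohwi(w: list) -> list:
    """Permute a 4-D nested list from PyTorch Conv2d layout (out, in, kh, kw)
    to MLX Conv2d layout (out, kh, kw, in): result[o][y][x][i] == w[o][i][y][x]."""
    out_c, in_c = len(w), len(w[0])
    kh, kw = len(w[0][0]), len(w[0][0][0])
    return [[[[w[o][i][y][x] for i in range(in_c)]
              for x in range(kw)]
             for y in range(kh)]
            for o in range(out_c)]
```

With real arrays this is a single transpose with axes (0, 2, 3, 1). Transposed convolutions store the in/out channels in the opposite order in PyTorch, so they would need a different permutation.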
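The feature regression script compares MLX outputs with PyTorch fixtures, and the README reports differences of around 1e-5. A check of that kind usually reduces to a maximum absolute difference against a tolerance. This is a minimal pure-Python sketch with hypothetical function names and an assumed default tolerance; a real script would more likely operate on NumPy or MLX arrays.

```python
import math


def max_abs_diff(a, b) -> float:
    """Max |a - b| over two equally shaped nested sequences of numbers.

    NaNs are reported as infinity so they cannot be hidden by max().
    """
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if math.isnan(a) or math.isnan(b):
            return math.inf
        return abs(a - b)
    if len(a) != len(b):
        raise ValueError("shape mismatch")
    return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)


def check_feature(name: str, mlx_out, torch_fixture, atol: float = 1e-4) -> tuple[str, float, bool]:
    """Compare one named feature map; atol is an assumed tolerance."""
    diff = max_abs_diff(mlx_out, torch_fixture)
    return name, diff, math.isfinite(diff) and diff <= atol
```

Reporting the actual difference alongside the pass/fail flag makes drift visible before it crosses the tolerance.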
Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.