explaingit

ntu-ai4x/conceptseg-r1

27 | Python | Audience · researcher | Complexity · 5/5 | Active | License | Setup · hard

TLDR

Research code for concept-level image segmentation that pairs a multimodal LLM with a frozen SAM 3 model and trains the language head with meta reinforcement learning.

Mindmap

mindmap
  root((ConceptSeg-R1))
    Inputs
      Example images
      Concept prompts
      Benchmark datasets
    Outputs
      Segmentation masks
      Trained 7B model
      Eval metrics
    Use Cases
      Concept segmentation research
      Zero shot Cityscapes
      Reasoning segmentation
      Few shot mask prediction
    Tech Stack
      Python
      PyTorch
      SAM 3
      Conda
      HuggingFace

Things people build with this

USE CASE 1

Reproduce the ConceptSeg-R1 paper results on the released benchmark

USE CASE 2

Fine-tune the 7B model on a new concept hierarchy of your own data

USE CASE 3

Compare Meta-GRPO against plain supervised fine-tuning on a custom segmentation task

USE CASE 4

Benchmark the Shortcut Router on latency-sensitive segmentation workloads
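Use case 4 comes down to timing two code paths and weighting them by how often each one is taken. The sketch below assumes nothing about the repo's real API: `fast_path` and `slow_path` are hypothetical stand-ins for a direct SAM 3 pass and the full reasoning pass, and both helper names are invented for illustration. It also ignores the cost of the routing decision itself.

```python
import statistics
import time
from typing import Callable, Sequence


def median_latency_ms(fn: Callable[[object], object],
                      inputs: Sequence[object],
                      repeats: int = 3) -> float:
    """Median wall-clock time of fn over all inputs, in milliseconds."""
    samples = []
    for _ in range(repeats):
        for item in inputs:
            start = time.perf_counter()
            fn(item)
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def expected_latency_ms(fast_ms: float, slow_ms: float,
                        fast_fraction: float) -> float:
    """Average latency when fast_fraction of inputs take the fast path."""
    if not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("fast_fraction must be in [0, 1]")
    return fast_fraction * fast_ms + (1.0 - fast_fraction) * slow_ms


# Hypothetical stand-ins: replace with real model calls when benchmarking.
def fast_path(item: object) -> int:
    return sum(range(1_000))


def slow_path(item: object) -> int:
    return sum(range(100_000))


if __name__ == "__main__":
    images = list(range(20))  # placeholder inputs
    fast_ms = median_latency_ms(fast_path, images)
    slow_ms = median_latency_ms(slow_path, images)
    # Assumed routing rate; measure this on your own workload.
    print(expected_latency_ms(fast_ms, slow_ms, fast_fraction=0.7))
```

When timing real GPU inference with PyTorch, call `torch.cuda.synchronize()` before each clock read, because CUDA kernels launch asynchronously and the timer would otherwise stop before the work finishes.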

Tech stack

Python | PyTorch | SAM 3 | Conda | HuggingFace

Getting it running

Difficulty · hard | Time to first run · 1 day+

Setup needs a conda environment plus two GitHub release downloads, a modified SAM 3 package, and a GPU large enough for a 7B model with two-stage training.
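A rough back-of-the-envelope estimate shows why the GPU requirement is steep. The sketch below uses common rules of thumb, not figures from the repo: 2 bytes per parameter for bf16 weights, and about 16 bytes per parameter for full fine-tuning with mixed-precision Adam (weights, gradients, fp32 master weights, and two optimizer moments). Whether this project fully fine-tunes the 7B model, uses adapters, or shards across GPUs is not stated in this summary, so treat the second number as an upper-end assumption. Activations and the frozen SAM 3 model come on top.

```python
def weights_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for num_params values at bytes_per_param each, in GiB."""
    return num_params * bytes_per_param / 2**30


PARAMS_7B = 7e9

# bf16 weights only, as for inference: roughly 13 GiB.
inference_gib = weights_gib(PARAMS_7B, 2)

# Assumed full fine-tuning with mixed-precision Adam: roughly 104 GiB,
# before activations. This is a rule of thumb, not a measured number.
full_finetune_gib = weights_gib(PARAMS_7B, 16)

print(f"inference weights: {inference_gib:.1f} GiB")
print(f"full fine-tune state: {full_finetune_gib:.1f} GiB")
```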

Apache 2.0 license, so you can use, modify, and ship it commercially as long as you keep the license text and attribution notices. The license also includes a patent grant from contributors.

In plain English

ConceptSeg-R1 is research code from a team at Nanyang Technological University that goes with a paper posted on arXiv in May 2026. The project is about image segmentation, which is the task of teaching a computer to outline the parts of a picture that belong to a particular thing. A familiar version of segmentation is asking a model to find and trace every person or car in a photo.

The team is pushing this task beyond fixed object categories toward what they call concepts, which can be more open-ended ideas such as a class of medical condition or a visual property. The authors organize concepts into a three-level hierarchy they label CI, CD, and CR, and ship a benchmark for each level. Their method, named ConceptSeg-R1, learns from a few example images that show what the user means by a concept, then applies that idea to new pictures it has not seen before. The training style is called meta reinforcement learning, and the specific algorithm they call Meta-GRPO is meant to help the model pull out a general rule from the demonstrations rather than just memorizing each example.

The system pairs a multimodal language model with a frozen vision model called SAM 3. Instead of retraining SAM 3, the language model produces what the authors call latent concept tokens that are fed into SAM 3's prompt slots. There is also a Shortcut Router that decides on the fly whether a picture needs heavy reasoning or can be handled with a fast pass through SAM 3, which keeps simple cases quick.

To set up the project you create a Python conda environment, download two release files from GitHub, one being a modified SAM 3 package and the other a training metadata archive, then run a setup script. Training happens in two stages: first a supervised fine-tuning step, then a GRPO reinforcement learning step, each launched by a shell script.

Evaluation has separate scripts for the concept segmentation benchmark and for a reasoning segmentation benchmark, and the project page reports zero-shot results on Cityscapes and ReasonSeg. A 7-billion-parameter trained model and the ConceptSeg benchmark dataset are both available on Hugging Face. The code is released under Apache 2.0.
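The GRPO stage mentioned above builds on a simple idea: sample several outputs for the same input, score each one, and reward outputs relative to their own group instead of training a separate value model. The sketch below shows that group-relative advantage step for plain GRPO as publicly described. It is not the authors' Meta-GRPO, whose additions are not covered in this summary, and the reward values are made up for illustration (for example, a mask-quality score per sampled output).

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of sampled outputs.

    Each advantage is (reward - group mean) / (group std + eps).
    Population standard deviation is used here; implementations differ.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four outputs sampled from the same prompt.
rewards = [0.2, 0.5, 0.8, 0.5]
advantages = group_relative_advantages(rewards)

# Above-average outputs get positive advantages, below-average negative.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

Because the advantages are centered within each group, a group where every output scores the same contributes no learning signal, which is one reason reward design matters in this setup.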

Copy-paste prompts

Prompt 1
Walk me through the training pipeline of ConceptSeg-R1 from SFT to the GRPO stage and what each shell script does
Prompt 2
Explain how latent concept tokens from the LLM are fed into SAM 3 prompt slots in ConceptSeg-R1
Prompt 3
Write a small script that runs ConceptSeg-R1 inference on a folder of my own images using the released 7B checkpoint
Prompt 4
Show how to add a new concept category at the CR level and evaluate the existing model on it
Prompt 5
Diff the modified SAM 3 package in ConceptSeg-R1 against upstream SAM 3 and summarize what changed

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.