ntu-ai4x/conceptseg-r1

Analysis updated 2026-06-24

★ 24 | Python | Audience · researcher | Complexity · 5/5 | License · Apache 2.0 | Setup · hard

Mindmap

mindmap
  root((ConceptSeg-R1))
    Inputs
      Example images
      Concept prompts
      Benchmark datasets
    Outputs
      Segmentation masks
      Trained 7B model
      Eval metrics
    Use Cases
      Concept segmentation research
      Zero shot Cityscapes
      Reasoning segmentation
      Few shot mask prediction
    Tech Stack
      Python
      PyTorch
      SAM 3
      Conda
      HuggingFace


What do people build with it?

USE CASE 1

Reproduce the ConceptSeg-R1 paper results on the released benchmark

USE CASE 2

Fine-tune the 7B model on a new concept hierarchy of your own data

USE CASE 3

Compare Meta-GRPO against plain supervised fine-tuning on a custom segmentation task

USE CASE 4

Benchmark the Shortcut Router on latency-sensitive segmentation workloads

What is it built with?

Python, PyTorch, SAM 3, Conda, HuggingFace

How does it compare?

	ntu-ai4x/conceptseg-r1	18597990650-lab/multi-agent-game	agents365-ai/cloakfetch
Stars	24	24	24
Language	Python	Python	Python
Setup difficulty	hard	moderate	moderate
Complexity	5/5	3/5	3/5
Audience	researcher	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

Setup needs a conda environment plus two GitHub release downloads, a modified SAM 3 package, and a GPU large enough for a 7B model with two-stage training.

Released under the Apache 2.0 license, so you can use, modify, and ship it commercially as long as you retain the license and copyright notices. The license also includes a patent grant.

In plain English

ConceptSeg-R1 is research code from a team at Nanyang Technological University that goes with a paper posted on arXiv in May 2026. The project is about image segmentation, which is the task of teaching a computer to outline the parts of a picture that belong to a particular thing. A familiar version of segmentation is asking a model to find and trace every person or car in a photo. The team is pushing this task beyond fixed object categories toward what they call concepts, which can be more open-ended ideas such as a class of medical condition or a visual property. The authors organize concepts into a three-level hierarchy they label CI, CD, and CR, and ship a benchmark for each level.

Their method, named ConceptSeg-R1, learns from a few example images that show what the user means by a concept, then applies that idea to new pictures it has not seen before. The training style is called meta reinforcement learning, and the specific algorithm they call Meta-GRPO is meant to help the model pull out a general rule from the demonstrations rather than just memorizing each example.

The system pairs a multimodal language model with a frozen vision model called SAM 3. Instead of retraining SAM 3, the language model produces what the authors call latent concept tokens that are fed into SAM 3's prompt slots. There is also a Shortcut Router that decides on the fly whether a picture needs heavy reasoning or can be handled with a fast pass through SAM 3, which keeps simple cases quick.

To set up the project you create a Python conda environment, download two release files from GitHub, one being a modified SAM 3 package and the other a training metadata archive, then run a setup script. Training happens in two stages: first a supervised fine-tuning step, then a GRPO reinforcement learning step, each launched by a shell script.
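
To make the GRPO stage a little more concrete: GRPO-style training generally samples several outputs for the same input, scores each one, and normalizes every score against its own group. The sketch below illustrates only that normalization step. It is not the repository's Meta-GRPO code, and the function name, the reward values, and the use of a population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    Hypothetical helper for illustration; not taken from the ConceptSeg-R1 code.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps avoids division by zero when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled outputs on one image (e.g. a mask-quality score).
rewards = [0.2, 0.4, 0.6, 0.8]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Outputs that score above the group average get a positive advantage and are reinforced, and those below it get a negative one. When all samples score the same, every advantage is zero and the group contributes no learning signal.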
Evaluation has separate scripts for the concept segmentation benchmark and for a reasoning segmentation benchmark, and the project page reports zero-shot results on Cityscapes and ReasonSeg. A 7-billion-parameter trained model and the ConceptSeg benchmark dataset are both available on Hugging Face. The code is released under Apache 2.0.
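
The summary does not say which metrics the evaluation scripts report. Intersection over union (IoU) is a common way to score a predicted segmentation mask against a ground-truth mask, so the sketch below shows that calculation on plain nested lists. The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details of the project's evaluation code.

```python
def mask_iou(pred, target):
    """IoU between two binary masks given as equal-shape nested lists of 0/1.

    Hypothetical helper for illustration; not taken from the ConceptSeg-R1 code.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match
    return inter / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are covered by either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(mask_iou(pred, target))
```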

Copy-paste prompts

Prompt 1

Walk me through the training pipeline of ConceptSeg-R1 from SFT to the GRPO stage and what each shell script does

Prompt 2

Explain how latent concept tokens from the LLM are fed into SAM 3 prompt slots in ConceptSeg-R1

Prompt 3

Write a small script that runs ConceptSeg-R1 inference on a folder of my own images using the released 7B checkpoint

Prompt 4

Show how to add a new concept category at the CR level and evaluate the existing model on it

Prompt 5

Diff the modified SAM 3 package in ConceptSeg-R1 against upstream SAM 3 and summarize what changed

Frequently asked questions

What is conceptseg-r1?

Research code for concept-level image segmentation that pairs a multimodal LLM with a frozen SAM 3 model and trains the language head with meta reinforcement learning.

What language is conceptseg-r1 written in?

Mainly Python. The stack also includes PyTorch, SAM 3, Conda, and HuggingFace.

What license does conceptseg-r1 use?

Apache 2.0, so you can use, modify, and ship it commercially as long as you retain the license and copyright notices. The license also includes a patent grant.

How hard is conceptseg-r1 to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is conceptseg-r1 for?

Mainly researchers.


Verify against the repo before relying on details.