Reproduce the ConceptSeg-R1 paper results on the released benchmark
Fine-tune the 7B model on a new concept hierarchy of your own data
Compare Meta-GRPO against plain supervised fine-tuning on a custom segmentation task
Benchmark the Shortcut Router on latency-sensitive segmentation workloads
Setup needs a conda environment, two GitHub release downloads (a modified SAM 3 package and a training metadata archive), and a GPU large enough to train a 7B model in two stages.
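A rough way to size that GPU is the usual bytes-per-parameter arithmetic. The sketch below assumes bf16 weights, full-parameter training with the Adam optimizer, and a nominal 7e9 parameters, and it ignores activations and the frozen SAM 3 model. None of these settings come from the ConceptSeg-R1 repository, so treat the numbers as an order-of-magnitude guide only.

```python
# Back-of-envelope GPU memory estimate for a 7B-parameter model.
# Assumptions (not taken from the ConceptSeg-R1 repo): bf16 weights and gradients,
# Adam with fp32 master weights and two fp32 moment buffers, full-parameter
# training; activations, rollouts, and SAM 3 are excluded.

BYTES_PER_GB = 1e9


def inference_weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory to hold the weights alone (bf16 = 2 bytes per parameter)."""
    return num_params * bytes_per_param / BYTES_PER_GB


def full_finetune_gb(num_params: float) -> float:
    """Weights (2) + gradients (2) + fp32 master copy (4) + Adam m (4) + Adam v (4)."""
    bytes_per_param = 2 + 2 + 4 + 4 + 4  # 16 bytes per parameter
    return num_params * bytes_per_param / BYTES_PER_GB


params = 7e9
print(f"weights only:     {inference_weights_gb(params):.0f} GB")  # 14 GB
print(f"full fine-tuning: {full_finetune_gb(params):.0f} GB")      # 112 GB
```

LoRA, optimizer-state sharding such as DeepSpeed ZeRO, and gradient checkpointing all lower these figures, while GRPO rollouts add generation memory on top; check the repo's training scripts for the settings it actually uses.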
ConceptSeg-R1 is research code from a team at Nanyang Technological University, released alongside a paper posted on arXiv in May 2026. The project targets image segmentation: teaching a computer to outline the parts of a picture that belong to a particular thing. A familiar example is asking a model to find and trace every person or car in a photo. The team pushes this task beyond fixed object categories toward what they call concepts, which can be more open-ended ideas such as a class of medical condition or a visual property. The authors organize concepts into a three-level hierarchy labeled CI, CD, and CR, and ship a benchmark for each level.
Their method, ConceptSeg-R1, learns from a few example images that show what the user means by a concept, then applies that concept to new images it has not seen before. The training style is meta reinforcement learning, and their algorithm, Meta-GRPO, is meant to help the model extract a general rule from the demonstrations rather than memorize each example.
The system pairs a multimodal language model with a frozen vision model, SAM 3. Instead of retraining SAM 3, the language model produces what the authors call latent concept tokens, which are fed into SAM 3's prompt slots. A Shortcut Router decides on the fly whether an image needs heavy reasoning or can be handled by a fast pass through SAM 3, which keeps simple cases quick.
To set up the project, you create a Python conda environment, download two release files from GitHub (a modified SAM 3 package and a training metadata archive), and run a setup script. Training happens in two stages, a supervised fine-tuning step followed by a GRPO reinforcement learning step, each launched by a shell script.
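Meta-GRPO builds on GRPO, whose core step is well documented: sample several responses for the same input, score each one, and rank it against its group's average instead of training a separate value model. The sketch below shows only that standard step, using mask IoU as the reward. The IoU reward, the flattened-mask format, and the function names are illustrative assumptions; the meta-learning parts of Meta-GRPO are not reproduced here.

```python
from statistics import mean, pstdev


def mask_iou(pred: list[int], target: list[int]) -> float:
    """IoU of two flattened binary masks of equal length (1 = pixel in the concept)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# One input, four sampled masks on a toy 8-pixel image (hypothetical data).
target = [1, 1, 1, 1, 0, 0, 0, 0]
samples = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # exact match
    [1, 1, 0, 0, 0, 0, 0, 0],  # under-segments
    [0, 0, 0, 0, 1, 1, 1, 1],  # wrong region
    [1, 1, 1, 1, 1, 1, 0, 0],  # over-segments
]
rewards = [mask_iou(s, target) for s in samples]
advantages = group_relative_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.3f}  advantage={a:+.3f}")
```

Samples that beat their group's average get a positive advantage and are reinforced; those below average are pushed down.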
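How the Shortcut Router actually scores an image is not described here, so the sketch below only illustrates the dispatch pattern the description implies: score each image, send easy ones down a cheap path and hard ones down an expensive one. The difficulty score, the threshold, and both path functions are hypothetical stand-ins, not the repository's API.

```python
from typing import Callable, Sequence

Mask = list[int]  # flattened binary mask (illustrative format)


def route_and_segment(
    images: Sequence[str],
    difficulty: Callable[[str], float],     # hypothetical scorer: higher = harder
    fast_path: Callable[[str], Mask],       # hypothetical: quick pass through SAM 3
    reasoning_path: Callable[[str], Mask],  # hypothetical: full language-model reasoning
    threshold: float = 0.5,                 # hypothetical cut-off
) -> tuple[list[Mask], int]:
    """Send each image down the cheap or the expensive path; also count fast-path hits."""
    masks: list[Mask] = []
    fast_hits = 0
    for image in images:
        if difficulty(image) < threshold:
            masks.append(fast_path(image))
            fast_hits += 1
        else:
            masks.append(reasoning_path(image))
    return masks, fast_hits


# Toy usage with stub components standing in for the real models.
scores = {"street.jpg": 0.1, "xray.png": 0.9, "dog.jpg": 0.2}
masks, fast_hits = route_and_segment(
    images=list(scores),
    difficulty=scores.__getitem__,
    fast_path=lambda img: [1, 0],
    reasoning_path=lambda img: [1, 1],
)
print(f"{fast_hits} of {len(scores)} images took the fast path")
```

In this pattern the latency saving depends on how many images take the fast path, which is the quantity to watch when benchmarking the router on a latency-sensitive workload.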
Evaluation has separate scripts for the concept segmentation benchmark and for a reasoning segmentation benchmark, and the project page reports zero-shot results on Cityscapes and ReasonSeg. A trained 7-billion-parameter model and the ConceptSeg benchmark dataset are both available on Hugging Face. The code is released under Apache 2.0.
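ReasonSeg results are commonly reported as gIoU (the mean of per-image IoU) and cIoU (total intersection over total union across the dataset). Whether this repository's reasoning-segmentation script reports exactly these metrics, and how it treats empty masks, are assumptions to check against the code; the sketch below only shows how the two numbers are computed and why they can differ.

```python
def giou_ciou(preds: list[list[int]], targets: list[list[int]]) -> tuple[float, float]:
    """gIoU = mean of per-image IoU; cIoU = summed intersection / summed union."""
    per_image: list[float] = []
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        inter = sum(1 for p, t in zip(pred, target) if p and t)
        union = sum(1 for p, t in zip(pred, target) if p or t)
        # Assumption: an image where both masks are empty scores IoU 1.0.
        per_image.append(1.0 if union == 0 else inter / union)
        total_inter += inter
        total_union += union
    giou = sum(per_image) / len(per_image)
    ciou = total_inter / total_union if total_union else 1.0
    return giou, ciou


# A small object predicted poorly, a large object predicted perfectly.
preds = [[1, 1, 0, 0], [1] * 8]
targets = [[1, 0, 0, 0], [1] * 8]
giou, ciou = giou_ciou(preds, targets)
print(f"gIoU={giou:.2f}  cIoU={ciou:.2f}")  # gIoU=0.75  cIoU=0.90
```

Because cIoU pools pixels across images, large objects dominate it, while gIoU weights every image equally.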
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.