sailing-lab/sr2am

★ 11 · Python · Audience: researcher · Complexity: 5/5 · Setup: hard

Mindmap

mindmap
  root((sr2am))
    What it does
      Efficient AI reasoning
      Meta-planning
      Token reduction
    Three internal roles
      Actor steps and acts
      Simulator plans ahead
      Meta-planner decides depth
    Models released
      8B parameter model
      30B parameter model
    Requirements
      GPU hardware
      SerpAPI key
      Code execution sandbox


Things people build with this

USE CASE 1

Run the SR2AM 8B model on math and science benchmarks to compare reasoning efficiency against larger baseline models.

USE CASE 2

Test the meta-planning approach on web-navigation tasks to measure reasoning token reduction.

USE CASE 3

Reproduce the paper's results on your own GPU cluster using the JSONL question input format.
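For the token-reduction measurement in use case 2, a minimal sketch of the arithmetic might look like the following. The function name and the token counts are hypothetical and made up for illustration; they are not taken from the repo or the paper.

```python
def token_reduction_pct(baseline_tokens: int, method_tokens: int) -> float:
    """Percent fewer reasoning tokens used by a method relative to a baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - method_tokens) / baseline_tokens


# Hypothetical per-question reasoning-token counts (illustrative only).
baseline = [4000, 2500, 6000]
method = [1000, 2000, 1500]

# Reduction per question, then over the whole set (totals, not a mean of percentages).
per_question = [token_reduction_pct(b, m) for b, m in zip(baseline, method)]
overall = token_reduction_pct(sum(baseline), sum(method))

print(per_question)
print(overall)
```

Computing the overall figure from summed totals weights long questions more heavily than averaging the per-question percentages would; which convention the paper uses should be checked against the paper itself.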

Tech stack

Python · PyTorch · Qwen3

Getting it running

Difficulty · hard
Time to first run · 1 day+

Requires SerpAPI, a code execution sandbox, a separate summarization model, plus a GPU with 16GB+ VRAM for the 8B model or four GPUs for the 30B model.
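As a rough sanity check on these hardware figures, the sketch below estimates the memory needed just to hold the model weights. It assumes 16-bit weights (2 bytes per parameter), which the source does not state, and it ignores the KV cache, activations, and framework overhead, so real usage will be higher. The function name is hypothetical.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9


# Assumption: 16-bit weights; excludes KV cache and activations.
mem_8b = weight_memory_gb(8)             # weights for the 8B model
mem_30b = weight_memory_gb(30)           # weights for the 30B model
mem_30b_per_gpu = mem_30b / 4            # evenly sharded across four GPUs

print(mem_8b, mem_30b, mem_30b_per_gpu)
```

Under that assumption the 8B weights alone come to about 16 GB, and the 30B weights come to about 15 GB per GPU when split four ways, so the headroom left on each card depends on the actual precision and context length used.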

In plain English

SR2AM is a research project from an AI lab focused on making AI agents more efficient at reasoning through complex tasks. The core idea is that current AI systems often generate excessive thinking text before taking action, which is slow and costly. SR2AM tries to fix this by teaching an AI model to decide upfront how much planning a given task actually needs, rather than always reasoning at maximum depth.

The system splits the AI's process into three roles. The actor handles direct, step-by-step reasoning and action. The simulator mentally plays out what would happen if a particular action were taken, acting like an internal planner. The meta-planner decides when and how much planning is worth doing before acting. All three roles run within a single language model's chain of thought, not as three separate systems, which keeps things practical.

The researchers release two models built on top of Qwen3 (a family of open AI language models): an 8-billion-parameter version that they say is competitive with models 15 to 40 times larger, and a 30-billion-parameter version that they claim matches systems in the 685 billion to 1 trillion parameter range while using 25 to 95 percent fewer reasoning tokens. These claims are based on benchmarks covering math, science, and web-navigation tasks.

To run SR2AM on your own questions, you need several external services configured: a web search API (SerpAPI by default), a code execution sandbox, a separate language model for summarizing web pages, and a GPU setup capable of running large models. The 8B model needs roughly 16GB of GPU memory; the 30B model needs four GPUs running in parallel. Input data is a JSONL file with one question per line.

This is an academic release tied to a paper on arXiv. It is aimed at researchers and engineers working on AI agent systems who want to reproduce the results or test the method on their own benchmarks. The setup is nontrivial and assumes familiarity with running large language models on GPU hardware.
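The JSONL input mentioned above (one question per line) can be sketched as follows. The field names `id` and `question` and the file name `questions.jsonl` are assumptions for illustration; the repo defines the real schema, so check it before preparing data.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records: the actual field names are defined by the repo.
questions = [
    {"id": "q1", "question": "What is the sum of the first 100 positive integers?"},
    {"id": "q2", "question": "Which planet has the shortest orbital period?"},
]


def write_jsonl(path: Path, records: list[dict]) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with path.open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round-trip through a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    jsonl_path = Path(tmp) / "questions.jsonl"
    write_jsonl(jsonl_path, questions)
    loaded = read_jsonl(jsonl_path)

print(len(loaded))
```

Each line must be a complete JSON object on its own, which is why the writer serializes record by record instead of dumping the whole list as one array.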

Copy-paste prompts

Prompt 1

I'm running SR2AM from sailing-lab. What does each field in the JSONL input format look like, and how do I configure the SerpAPI key for web search tasks?

Prompt 2

Explain the three internal roles in SR2AM, the actor, the simulator, and the meta-planner, and how they interact within a single model's chain of thought.

Prompt 3

I have 4 GPUs with 24GB VRAM each. Can I run the SR2AM 30B model, and what parallel inference configuration do I need?

Prompt 4

How does SR2AM decide how much reasoning to do before acting? Walk me through the meta-planning decision process step by step.


Verify against the repo before relying on details.