search-swarm/searchswarm

★ 24 | Python | Audience · researcher | Complexity · 5/5 | Setup · hard

Mindmap

mindmap
  root((repo))
    Architecture
      Main agent
      Subagents
      Isolated contexts
      Citation reports
    Training
      ms-swift Megatron
      Trajectory data
      Delegation signals
    Evaluation
      BrowseComp
      GAIA
      xbench-DeepSearch
    Inference modes
      API endpoint
      Local 8 GPU
    Launch paths
      Ray cloud
      SSH torchrun
      Shared filesystem
    Model
      SearchSwarm-30B-A3B
      Open source
      30B parameters


Things people build with this

USE CASE 1

Reproduce or extend multi-agent research delegation benchmarks (BrowseComp, GAIA, xbench-DeepSearch).

USE CASE 2

Evaluate large open-source research agents against API-compatible or local 8-GPU inference endpoints.

USE CASE 3

Train custom delegation-aware agents using cleaned trajectory data and Megatron multi-node scripts.

USE CASE 4

Explore subagent orchestration architectures for long-horizon question answering tasks.

Tech stack

Python · ms-swift · Megatron · Ray · torchrun · Kubernetes · vLLM · JSON

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires 8+ GPUs for full inference mode and multi-node clusters for training. Benchmark datasets must be sourced and converted manually. Single-GPU smoke test available for environment validation.
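Since the benchmark datasets have to be converted by hand, a small script is the usual route. The sketch below assumes a source CSV with `question` and `answer` columns and produces a JSON list of records with `id`, `question`, and `answer` keys. Those column names, the output keys, and the helper names are placeholders chosen for illustration; they are not the format the repository requires, so check the repo's documentation for the real schema.

```python
import csv
import io
import json


def convert_rows(csv_text: str) -> list[dict]:
    """Turn CSV text into a list of records (hypothetical schema)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for index, row in enumerate(reader):
        records.append(
            {
                "id": index,  # assumed key, not taken from the repo
                "question": row["question"].strip(),
                "answer": row["answer"].strip(),
            }
        )
    return records


def write_json(records: list[dict], path: str) -> None:
    """Write the records as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)


# Tiny in-memory sample so the sketch runs without any dataset on disk.
sample = "question,answer\nWho wrote Hamlet?,William Shakespeare\n"
converted = convert_rows(sample)
```

Calling `write_json(converted, "bench.json")` would then produce a file to point the evaluation tool at, once the keys are adjusted to match what the tool expects.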

In plain English

SearchSwarm is a research project about teaching AI language models to handle complex, long-running research tasks more effectively. The core idea is training a main agent to break a large question into smaller pieces and hand those pieces off to helper agents called subagents. Each subagent works in its own isolated context, gathers relevant evidence, and returns a short, citation-backed report. The main agent then combines all those reports into a final answer without needing to hold the entire research process in memory at once.

The project includes a training pipeline that teaches the main agent when to delegate work, how to give subagents clear instructions, and how to verify what they return. High-quality training data was built from cleaned agent trajectories that show the delegation process step by step. The result is a 30-billion-parameter model called SearchSwarm-30B-A3B, which the authors report achieves strong results compared to other open-source research agents of similar size on benchmarks like BrowseComp, GAIA, and xbench-DeepSearch.

The repository contains two main components: an evaluation framework and training scripts. The evaluation tool reads from a configuration file and supports two inference modes: one where the model is served by an external API-compatible endpoint, and one where it runs locally across eight GPU servers. Benchmark datasets are not bundled with the code. Users need to obtain them from their official sources and convert them to a specific JSON format before pointing the tool at those files.

Training uses ms-swift's Megatron backend. The repository offers three launch paths for multi-node setups: a Ray-based option for cloud clusters, an SSH/torchrun path for traditional clusters, and a shared-filesystem path for schedulers like Kubernetes batch jobs. A single-GPU smoke test is also provided to validate the environment before running at full scale.
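The two inference modes described above could be captured in configuration roughly as follows. This is a sketch under assumptions: the `EvalConfig` class, its field names, the mode labels `"api"` and `"local"`, and the example URL and path are all invented here for illustration and do not reflect the repository's actual config schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalConfig:
    """Hypothetical evaluation settings; field names are illustrative only."""

    mode: str                       # "api" or "local" (assumed labels)
    dataset_path: str               # path to the converted benchmark JSON
    base_url: Optional[str] = None  # needed only for the API mode
    num_gpus: int = 8               # used only for the local mode

    def validate(self) -> None:
        """Reject combinations that cannot describe a runnable setup."""
        if self.mode == "api":
            if not self.base_url:
                raise ValueError("api mode needs base_url")
        elif self.mode == "local":
            if self.num_gpus < 1:
                raise ValueError("local mode needs at least one GPU")
        else:
            raise ValueError(f"unknown mode: {self.mode!r}")


# API mode: the model is served elsewhere behind a compatible endpoint.
api_cfg = EvalConfig(
    mode="api",
    dataset_path="data/bench.json",
    base_url="http://localhost:8000/v1",
)
api_cfg.validate()

# Local mode: the default of eight GPUs mirrors the setup described above.
local_cfg = EvalConfig(mode="local", dataset_path="data/bench.json")
local_cfg.validate()
```

Validating up front keeps a missing endpoint or a wrong mode name from surfacing only after a long benchmark run has started.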
This repository is primarily a research artifact for people who want to reproduce the paper's results or extend the approach. It assumes access to significant GPU resources and familiarity with large-model training infrastructure.
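For readers who want a feel for the delegation pattern before opening the repo, here is a toy sketch of it. Everything in it is an assumption made for illustration: the function names, the `Report` class, the string-splitting decomposition, and the placeholder citation URL are not the project's code, and in SearchSwarm the decisions shown here as simple functions are made by a trained model.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Short, citation-backed result returned by one subagent."""

    subtask: str
    summary: str
    citations: list[str] = field(default_factory=list)


def run_subagent(subtask: str) -> Report:
    """Stand-in for a subagent; a real one would search and read sources.

    The working notes live only in this local list (the isolated context)
    and are discarded on return; only the compact report goes back.
    """
    context = [f"searching for: {subtask}", "reading sources"]
    context.append("drafting summary")
    return Report(
        subtask=subtask,
        summary=f"Findings for '{subtask}'",
        citations=["https://example.com/source-1"],  # placeholder URL
    )


def decompose(question: str) -> list[str]:
    """Toy decomposition: split on ' and '. A trained model decides this."""
    return [part.strip(" ?") for part in question.split(" and ")]


def verify(report: Report) -> bool:
    """Minimal check: accept only reports that cite at least one source."""
    return bool(report.summary) and len(report.citations) > 0


def main_agent(question: str) -> str:
    """Delegate subtasks, keep verified reports, and combine them."""
    reports = [run_subagent(task) for task in decompose(question)]
    accepted = [r for r in reports if verify(r)]
    lines = [f"- {r.summary} [{', '.join(r.citations)}]" for r in accepted]
    return "\n".join(lines)


answer = main_agent("Who founded the lab and when was it founded?")
```

The point of the structure is that `main_agent` only ever sees the compact `Report` objects, never the subagents' intermediate notes, which is what keeps its own context small on long tasks.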

Copy-paste prompts

Prompt 1

How does SearchSwarm-30B-A3B decide when to delegate a subtask to a subagent versus answering directly?

Prompt 2

Walk me through the training data pipeline: how were agent trajectories cleaned and formatted to teach delegation?

Prompt 3

What benchmark datasets does SearchSwarm evaluate on, and how do I convert them to the required JSON format?

Prompt 4

Compare the three multi-node launch paths (Ray, SSH/torchrun, shared-filesystem): when should I use each?

Prompt 5

How does each subagent maintain an isolated context, and how does the main agent verify and combine their reports?


Verify against the repo before relying on details.