Reproduce or extend multi-agent research delegation results on benchmarks such as BrowseComp, GAIA, and xbench-DeepSearch.
Evaluate large open-source research agents against API-compatible or local 8-GPU inference endpoints.
Train custom delegation-aware agents using cleaned trajectory data and Megatron multi-node scripts.
Explore subagent orchestration architectures for long-horizon question answering tasks.
Requires 8+ GPUs for full inference mode and multi-node clusters for training. Benchmark datasets must be sourced and converted manually. Single-GPU smoke test available for environment validation.
SearchSwarm is a research project about teaching AI language models to handle complex, long-running research tasks more effectively. The core idea is to train a main agent to break a large question into smaller pieces and hand those pieces off to helper agents called subagents. Each subagent works in its own isolated context, gathers relevant evidence, and returns a short, citation-backed report. The main agent then combines those reports into a final answer without needing to hold the entire research process in memory at once.
The project includes a training pipeline that teaches the main agent when to delegate work, how to give subagents clear instructions, and how to verify what they return. The training data was built from cleaned agent trajectories that show the delegation process step by step. The result is a 30-billion-parameter model called SearchSwarm-30B-A3B, which the authors report achieves strong results compared to other open-source research agents of similar size on benchmarks like BrowseComp, GAIA, and xbench-DeepSearch.
The repository contains two main components: an evaluation framework and training scripts. The evaluation tool reads from a configuration file and supports two inference modes: one where the model is served by an external API-compatible endpoint, and one where it runs locally across eight GPU servers. Benchmark datasets are not bundled with the code. Users need to obtain them from their official sources and convert them to a specific JSON format before pointing the tool at those files.
Training uses ms-swift's Megatron backend. The repository offers three launch paths for multi-node setups: a Ray-based option for cloud clusters, an SSH/torchrun path for traditional clusters, and a shared-filesystem path for schedulers like Kubernetes batch jobs. A single-GPU smoke test is also provided to validate the environment before running at full scale.
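The delegation pattern described above (a main agent splits a question, each subagent researches one piece in its own context, and only short cited reports flow back) can be illustrated with a minimal sketch. This is not the repository's code: the names `Report`, `run_subagent`, `verify`, `main_agent`, `toy_decompose`, and `toy_search` are all hypothetical, the search backend is a stub, and the verification rule (require at least one citation) is an assumption chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Report:
    """Short, citation-backed summary returned by one subagent (hypothetical type)."""
    subquestion: str
    summary: str
    citations: List[str] = field(default_factory=list)


def run_subagent(subquestion: str,
                 search: Callable[[str], List[Dict[str, str]]]) -> Report:
    """Research one subquestion in an isolated context.

    The raw search hits live only inside this call; the main agent
    receives just the compact Report, never the full working context.
    """
    context = search(subquestion)  # local to this subagent
    top_hits = context[:2]
    summary = " ".join(hit["snippet"] for hit in top_hits)
    citations = [hit["url"] for hit in top_hits]
    return Report(subquestion, summary, citations)


def verify(report: Report) -> bool:
    """Assumed check: accept only non-empty reports that cite a source."""
    return bool(report.summary) and len(report.citations) > 0


def main_agent(question: str,
               decompose: Callable[[str], List[str]],
               search: Callable[[str], List[Dict[str, str]]]) -> str:
    """Delegate each subquestion, keep verified reports, and combine them."""
    reports = []
    for sub in decompose(question):
        report = run_subagent(sub, search)
        if verify(report):
            reports.append(report)
    lines = [f"Q: {question}"]
    for r in reports:
        lines.append(f"- {r.summary} [{', '.join(r.citations)}]")
    return "\n".join(lines)


def toy_decompose(question: str) -> List[str]:
    """Stub planner: a trained main agent would generate these itself."""
    return [f"{question} (background)", f"{question} (recent results)"]


def toy_search(query: str) -> List[Dict[str, str]]:
    """Stub search tool that finds nothing for the 'recent results' query."""
    if "recent" in query:
        return []
    return [{"snippet": f"Evidence for: {query}", "url": "https://example.org/a"}]


if __name__ == "__main__":
    print(main_agent("What is X?", toy_decompose, toy_search))
```

In this sketch the second subagent finds no evidence, so its report fails verification and is left out of the final answer. A trained agent would presumably re-delegate or rephrase the instruction instead of silently dropping the piece; that behavior is omitted here to keep the example short.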
This repository is primarily a research artifact for people who want to reproduce the paper's results or extend the approach. It assumes access to significant GPU resources and familiarity with large-model training infrastructure.
Verify against the repo before relying on details.