explaingit

yihaohu0118/seal

Analysis updated 2026-06-24

Stars 38 · Python · Audience: researcher · Complexity: 5/5 · License: Apache 2.0 · Setup: hard

TLDR

Research code for training tool-using AI agents with a closed-loop reinforcement learning method that categorizes failures and reweights GRPO rewards on the BFCL benchmark.

Mindmap

mindmap
  root((SEAL))
    Inputs
      BFCL tasks
      Tool definitions
      Failure categories
    Outputs
      Trained agent model
      Diagnostic logs
      Benchmark scores
    Use Cases
      Train tool using agents
      Reproduce paper results
      Extend reward reweighting
    Tech Stack
      Python
      Conda
      GRPO
      BFCL


What do people build with it?

USE CASE 1

Reproduce the SEAL paper results on the BFCL function-calling benchmark

USE CASE 2

Train your own tool-using agent with category-aware GRPO reward reweighting

USE CASE 3

Extend the diagnostic taxonomy to cover new failure modes in agent rollouts

USE CASE 4

Plug a custom task adapter into the SEAL training loop
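Use case 4 implies a pluggable boundary between the training loop and the benchmark. This page does not describe SEAL's actual adapter interface, so the sketch below is hypothetical: `Task`, `TaskAdapter`, `load_tasks`, `verify`, and `ExactMatchAdapter` are illustrative names, and exact-match scoring stands in for a real verifier such as BFCL's.

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class Task:
    """Hypothetical task record: what an adapter hands to the training loop."""
    task_id: str
    prompt: str
    tools: list[dict[str, Any]]           # tool/function schemas shown to the agent
    expected_calls: list[dict[str, Any]]  # reference calls used by the verifier


class TaskAdapter(Protocol):
    """Hypothetical interface a training loop could depend on."""
    def load_tasks(self, split: str) -> list[Task]: ...
    def verify(self, task: Task, calls: list[dict[str, Any]]) -> float: ...


class ExactMatchAdapter:
    """Toy adapter over in-memory tasks. Reward is 1.0 only when the agent's
    calls equal the expected calls exactly (names, arguments, and order)."""

    def __init__(self, tasks: dict[str, list[Task]]):
        self._tasks = tasks

    def load_tasks(self, split: str) -> list[Task]:
        return list(self._tasks.get(split, []))

    def verify(self, task: Task, calls: list[dict[str, Any]]) -> float:
        return 1.0 if calls == task.expected_calls else 0.0


# Usage: one made-up weather task in a "train" split.
weather = Task(
    task_id="t1",
    prompt="What's the weather in Paris?",
    tools=[{"name": "get_weather", "parameters": {"city": "string"}}],
    expected_calls=[{"name": "get_weather", "arguments": {"city": "Paris"}}],
)
adapter: TaskAdapter = ExactMatchAdapter({"train": [weather]})
task = adapter.load_tasks("train")[0]
score = adapter.verify(task, [{"name": "get_weather", "arguments": {"city": "Paris"}}])
print(score)
```

Typing the adapter as a `Protocol` lets any class with matching methods plug in without inheriting from a base class, which keeps a custom benchmark decoupled from the training code.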

What is it built with?

Python, Conda, GRPO, BFCL, PyTorch

How does it compare?

Repo | yihaohu0118/seal | power-codes/scanner-ip-cdn | stg12/phantomstars
Stars | 38 | 38 | 38
Language | Python | Python | Python
Setup difficulty | hard | easy | easy
Complexity | 5/5 | 2/5 | 3/5
Audience | researcher | ops devops | ops devops

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty: hard · Time to first run: 1 day+

Requires two separate conda environments plus a running BFCL benchmark service and GPU compute to actually train.
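Because setup takes a day or more, a quick preflight check can catch the most common gaps early. This is a minimal sketch under stated assumptions: conda is on PATH, and the presence of `nvidia-smi` is treated as a rough proxy for a usable NVIDIA GPU. The environment name `seal` comes from the README; `bfcl` is a placeholder for whatever name the BFCL setup script actually creates. The BFCL service itself is not checked, since its address isn't given here.

```python
import json
import shutil
import subprocess
from pathlib import Path

# "seal" is named in the README; "bfcl" is a hypothetical placeholder name.
REQUIRED_ENVS = ("seal", "bfcl")


def missing_envs(env_paths, required=REQUIRED_ENVS):
    """Return the required env names that don't match any env directory name."""
    present = {Path(p).name for p in env_paths}
    return [name for name in required if name not in present]


def preflight():
    """Collect human-readable problems; an empty list means these checks passed."""
    problems = []
    conda = shutil.which("conda")
    if conda is None:
        problems.append("conda not found on PATH")
    else:
        # `conda env list --json` prints {"envs": [<absolute env paths>]}.
        result = subprocess.run(
            [conda, "env", "list", "--json"],
            capture_output=True, text=True, check=True,
        )
        envs = json.loads(result.stdout)["envs"]
        problems += [f"conda env '{n}' not found" for n in missing_envs(envs)]
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found (no NVIDIA driver visible)")
    return problems
```

Call `preflight()` and print each returned problem. An empty list only means these particular checks passed; the BFCL service still has to be launched and reachable before training.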

Apache 2.0 lets you use, modify, and distribute commercially with attribution and a patent grant.

In plain English

SEAL stands for Synergistic Co-Evolution of Agents and Learning Environments. It is a research project that comes with a paper, a poster, and a project homepage, written by authors from Ant Group, Westlake University, the University of Michigan, and the University of Science and Technology of China. The license is Apache 2.0.

The project targets tool-using AI agents: agents that call functions, query APIs, or run commands to finish a task. The idea is that the agent and its training environment improve together in a closed loop. The agent runs through tasks, the system records which steps fail, and each failure is sorted into a category: invalid tool calls, wrong arguments, missed tool calls, failed recovery attempts, or responses that do not match what was expected. These labels then feed back into both the training interface and the model itself.

Training uses GRPO (Group Relative Policy Optimization), a reinforcement learning method, with the diagnostic categories used to reweight the rewards given during training. According to the README, the tool definitions, task labels, and verifier stay unchanged during evaluation, so comparisons with other methods remain fair. The training environment is built on BFCL (the Berkeley Function Calling Leaderboard), a public benchmark for function-calling agents.

To run it:
1. Clone the repo.
2. Create a Python 3.10 conda environment named seal and install the requirements.
3. Set up a second conda environment for the BFCL benchmark using its setup script.
4. Launch the BFCL service.
5. Start the training run with python launcher.py, pointing it at exp/SEAL.yaml.

The repository is organized into folders for the experiment config, the BFCL environment service, modules for diagnostic state and reward reweighting, task adapters, and the released data splits.
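The closed loop described above (label each failure, then reweight GRPO rewards by category) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SEAL's implementation: the enum values, the `category_weights` rule (a base weight plus each category's share of observed failures), and the choice to scale only failed rollouts' advantages are all hypothetical. The group-relative normalization, (reward - group mean) / group std over several rollouts of the same task, is the standard GRPO advantage.

```python
from collections import Counter
from enum import Enum
from statistics import mean, pstdev


class Failure(Enum):
    """The five failure categories named in the README (string values are illustrative)."""
    INVALID_CALL = "invalid_tool_call"
    WRONG_ARGS = "wrong_arguments"
    MISSED_CALL = "missed_tool_call"
    FAILED_RECOVERY = "failed_recovery"
    RESPONSE_MISMATCH = "response_mismatch"


def category_weights(counts: Counter, base: float = 1.0) -> dict:
    """Hypothetical rule: weight = base + the category's share of all observed
    failures, so more frequent failure modes get larger multipliers."""
    total = sum(counts.values())
    return {c: base + (counts[c] / total if total else 0.0) for c in Failure}


def reweighted_advantages(rewards, labels, weights, eps=1e-6):
    """GRPO-style advantage for one group of rollouts on the same task:
    normalize each reward by the group mean and standard deviation, then
    multiply failed rollouts (label is a Failure) by their category weight.
    Successful rollouts (label None) keep weight 1.0."""
    mu, sigma = mean(rewards), pstdev(rewards)
    advantages = []
    for reward, label in zip(rewards, labels):
        adv = (reward - mu) / (sigma + eps)
        advantages.append(adv * (weights[label] if label is not None else 1.0))
    return advantages


# Diagnostic state accumulated from earlier rollouts (made-up counts).
counts = Counter({Failure.WRONG_ARGS: 3, Failure.MISSED_CALL: 1})
weights = category_weights(counts)

# One group of four rollouts for a single task; 1.0 means the verifier passed.
rewards = [1.0, 0.0, 0.0, 1.0]
labels = [None, Failure.WRONG_ARGS, Failure.MISSED_CALL, None]
advs = reweighted_advantages(rewards, labels, weights)
print(advs)
```

With these made-up counts, the wrong-argument rollout ends up with an advantage of about -1.75 and the missed-call rollout about -1.25, so the more common failure mode is pushed down harder. If every rollout in a group gets the same reward, the standard deviation is zero and all advantages are zero, as in plain GRPO.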

Copy-paste prompts

Prompt 1
Walk me through the launcher.py training loop in SEAL and show how diagnostic categories influence the GRPO reward.
Prompt 2
Show me how to swap the BFCL environment in SEAL for a custom function-calling benchmark I have built.
Prompt 3
Help me run SEAL on a single A100 by tuning batch size and rollout settings in exp/SEAL.yaml.
Prompt 4
Explain how the failure category labels are produced and where the verifier code lives in this repo.
Prompt 5
Generate a config that fine-tunes a small open model with SEAL on only the invalid-tool-call failure subset.

Frequently asked questions

What is seal?

Research code for training tool-using AI agents with a closed-loop reinforcement learning method that categorizes failures and reweights GRPO rewards on the BFCL benchmark.

What language is seal written in?

Mainly Python. The stack also includes Conda, GRPO, BFCL, and PyTorch.

What license does seal use?

Apache 2.0 lets you use, modify, and distribute commercially with attribution and a patent grant.

How hard is seal to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is seal for?

Mainly researchers.


Verify against the repo before relying on details.