explaingit

clawgym/clawgym-agents

11 · Python · Audience · researcher · Complexity · 5/5 · Active · Setup · hard

TLDR

Research code release for ClawGym, with pointers to Hugging Face datasets, the 4B, 8B, and 30A3 agent models, and SFT and RL training folders.

Mindmap

mindmap
  root((ClawGym-Agents))
    Inputs
      ClawGym-Task dataset
      ClawGym-Trajectory dataset
    Outputs
      ClawGym-4B model
      ClawGym-8B model
      ClawGym-30A3 model
    Use Cases
      Train claw agents
      Reproduce paper results
      Fine-tune on tasks
    Tech Stack
      Python
      Hugging Face
      PyTorch
      RL

Things people build with this

USE CASE 1

Download the ClawGym-Task and ClawGym-Trajectory datasets from Hugging Face to study agent task data

USE CASE 2

Run supervised fine-tuning on the ClawGym base models using the SFT folder

USE CASE 3

Train a claw agent with reinforcement learning using the RL folder

USE CASE 4

Reproduce the ClawGym paper results with the released 4B, 8B, and 30A3 checkpoints

Tech stack

Python · PyTorch · Hugging Face

Getting it running

Difficulty · hard · Time to first run · 1 day+

The README is mostly links to Hugging Face and the paper, so training requires GPU infrastructure and reading the SFT and RL folders directly.
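The README does not say how much GPU memory is needed. The snippet below is a rough back-of-the-envelope sketch, not a measurement from the repo. It assumes the 4B and 8B names reflect nominal parameter counts of 4×10⁹ and 8×10⁹, that weights are stored in bf16 (2 bytes per parameter), and that full fine-tuning uses mixed-precision Adam at the commonly cited 16 bytes per parameter. Activations, the KV cache for long agent trajectories, and framework overhead all add more on top. ClawGym-30A3 is left out because the README does not give its size.

```python
# Back-of-the-envelope GPU memory estimate for the two dense ClawGym checkpoints.
# ASSUMPTIONS (not from the README): nominal parameter counts taken from the
# model names, bf16 weights, and mixed-precision Adam for full fine-tuning.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_gib(n_params: float, dtype: str = "bf16") -> float:
    """Memory for the model weights alone, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


def full_finetune_gib(n_params: float) -> float:
    """Rough mixed-precision Adam footprint, in GiB:
    bf16 weights (2) + bf16 grads (2) + fp32 master weights (4)
    + two fp32 Adam moments (4 + 4) = 16 bytes/param.
    Activations are NOT included."""
    return n_params * 16 / 2**30


if __name__ == "__main__":
    for name, n in [("ClawGym-4B", 4e9), ("ClawGym-8B", 8e9)]:
        print(
            f"{name}: weights ~{weight_gib(n):.1f} GiB (bf16), "
            f"full fine-tune ~{full_finetune_gib(n):.0f} GiB + activations"
        )
```

Under these assumptions, the bf16 weights of either dense model fit on a single 24 GB card for inference, while full-parameter SFT or RL of even the 4B model likely needs several GPUs or memory-saving techniques such as optimizer-state sharding or LoRA.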

In plain English

ClawGym-Agents is the public-facing piece of a research project from a group called RUC-AIBOX. The README itself is very short, and most of it is a set of links rather than a long explanation. The repository pairs with a research paper titled ClawGym: A Scalable Framework for Building Effective Claw Agents, listed as a 2026 arXiv preprint with Bai, Song, Sun, and several other authors.

The README points to two datasets that the team has published on Hugging Face. The first is called ClawGym-Task and contains around 13,500 tasks. The second is called ClawGym-Trajectory and contains around 24,500 trajectories. The word trajectory in this kind of work usually means a recorded sequence of actions an agent took while attempting a task, so the two datasets line up: one set of problems to solve, one set of recorded attempts.

The README also lists three trained models, all hosted on Hugging Face. ClawGym-4B and ClawGym-8B are named after their size, with four billion and eight billion parameters respectively. ClawGym-30A3 is a third variant whose naming the README does not explain. The repository is set up so that anyone can download the data and the models from Hugging Face by following the links.

The training code for the models is split into two folders inside this repository. One folder is named SFT, which is short for supervised fine-tuning, and the other is named RL, which is short for reinforcement learning. The README only points at these folders without describing the contents.

Beyond the dataset table, the model table, the training code pointer, and the BibTeX citation block, the README does not say anything about what a claw agent actually does, how the data was collected, what task format is used, or how the models compare. Anyone who wants more detail will need to read the linked paper or open the SFT and RL folders directly.
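The README implies that the two datasets relate as problems and recorded attempts, but it does not document either schema. The sketch below is a hypothetical illustration of that relationship only: the `Task` and `Trajectory` classes, their fields (`task_id`, `instruction`, `actions`, `success`), and the toy records are all made up for this example and are not ClawGym's actual format. It groups each recorded attempt under the task it was attempting and sets aside any attempt whose task is not in the task list.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL schema: field names are illustrative, not ClawGym's real format.


@dataclass
class Task:
    task_id: str
    instruction: str


@dataclass
class Trajectory:
    task_id: str  # which task this attempt was for
    actions: list[str] = field(default_factory=list)  # recorded agent actions
    success: bool = False


def group_by_task(tasks, trajectories):
    """Attach each trajectory to its task.

    Returns (grouped, orphans), where grouped maps task_id -> (task, [attempts])
    and orphans holds trajectories whose task_id matches no known task.
    """
    grouped = {t.task_id: (t, []) for t in tasks}
    orphans = []
    for traj in trajectories:
        if traj.task_id in grouped:
            grouped[traj.task_id][1].append(traj)
        else:
            orphans.append(traj)
    return grouped, orphans


# Toy data, invented for illustration.
tasks = [Task("t1", "Rename all .txt files"), Task("t2", "Summarize the log")]
trajs = [
    Trajectory("t1", ["ls", "mv a.txt b.txt"], success=True),
    Trajectory("t1", ["ls"], success=False),
    Trajectory("t9", ["echo hi"]),  # refers to a task that is not in `tasks`
]
grouped, orphans = group_by_task(tasks, trajs)

if __name__ == "__main__":
    for tid, (task, attempts) in grouped.items():
        n_ok = sum(a.success for a in attempts)
        print(tid, len(attempts), n_ok)
    print("orphans:", len(orphans))
```

Keeping unmatched attempts in a separate list, rather than silently dropping them, makes it easy to check how well the two datasets actually line up once their real schema is known.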

Copy-paste prompts

Prompt 1
Walk me through the SFT folder of ClawGym-Agents and show which scripts run supervised fine-tuning
Prompt 2
Pull the ClawGym-Task dataset from Hugging Face and write a Python loader that yields one task at a time
Prompt 3
Compare the ClawGym-4B and ClawGym-8B model configs and tell me which to start with for a 24GB GPU
Prompt 4
Set up a minimal RL training loop using the RL folder against the ClawGym-Trajectory dataset
Prompt 5
Read the ClawGym arXiv preprint and summarize what a claw agent is and how trajectories are recorded

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.