explaingit

tianshuwu/sugar

17 · Python · Audience · researcher · Complexity · 5/5 · Active · License · Setup · hard

TLDR

Research code for training humanoid robots on whole-body manipulation tasks from third-person human videos, built on NVIDIA IsaacLab with six example tasks and demo checkpoints.

Mindmap

mindmap
  root((SUGAR))
    Inputs
      Human videos
      Task name
      Pretrained checkpoints
    Outputs
      Trained policy
      Inference demos
      Sim trajectories
    Use Cases
      Train humanoid policy
      Replay six demo tasks
      Reproduce paper results
    Tech Stack
      Python
      PyTorch
      IsaacSim
      IsaacLab
      CUDA

Things people build with this

USE CASE 1

Reproduce the SUGAR paper results on the six demo tasks CarryBox, KickBox, PushBox, SitChair, StandBottle, and PickBottle.

USE CASE 2

Run inference.sh on a task name with the released tracker and generator checkpoints to see a policy run in IsaacSim.

USE CASE 3

Train a new policy with train.sh for a task and an experiment name from scratch.

USE CASE 4

Reuse the sugar_rl or sugar_il packages to build on the unitree_rl_lab and DexGraspVLA stacks.
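
Use cases 2 and 3 both come down to passing one of the six task names to a shell script. The sketch below validates the task name and builds the command line without running it. It assumes the task name is the first positional argument and that optional values follow it; the summary does not state the real argument format, so check the README or the scripts before relying on it. `build_command` is a hypothetical helper, not part of the repo.

```python
from __future__ import annotations

# Task names as listed in the summary above.
TASKS = ("CarryBox", "KickBox", "PushBox", "SitChair", "StandBottle", "PickBottle")


def build_command(script: str, task: str, *extra: str) -> list[str]:
    """Return an argv list such as ["bash", "inference.sh", "StandBottle"].

    ASSUMPTION: the scripts take the task name as the first positional
    argument, with any optional values after it. The real format may differ.
    """
    if script not in ("inference.sh", "train.sh"):
        raise ValueError(f"unknown script: {script!r}")
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {', '.join(TASKS)}")
    return ["bash", script, task, *extra]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    print(" ".join(build_command("inference.sh", "StandBottle")))
    print(" ".join(build_command("train.sh", "CarryBox", "my_experiment")))
```

Rejecting a misspelled task name up front is cheaper than waiting for IsaacSim to start and fail.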

Tech stack

Python · PyTorch · IsaacSim · IsaacLab · CUDA · conda

Getting it running

Difficulty · hard · Time to first run · 1 day+

Setup needs IsaacSim 5.1.0, IsaacLab 2.3.0 with a flatdict pin, a recent NVIDIA GPU with CUDA, and three large Google Drive downloads.
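
Before starting a day-long setup, a quick preflight check can catch the obvious blockers. The following is a minimal sketch, not something the repo ships: it checks for Python 3.11 (the version the README asks for), uses the presence of `nvidia-smi` on the PATH as a rough proxy for an NVIDIA driver, and compares free disk space against the three archive sizes quoted in this summary. The 2x headroom factor for unzipping is an assumption, and the estimate does not cover IsaacSim or IsaacLab themselves.

```python
import shutil
import sys

# Archive sizes in MB as quoted in this summary.
ARCHIVE_MB = {"data": 400, "descriptions": 50, "demo checkpoints": 250}


def required_mb(headroom: float = 2.0) -> int:
    """Total download size times a headroom factor (assumed, to cover unzipping)."""
    return int(sum(ARCHIVE_MB.values()) * headroom)


def preflight(path: str = ".") -> dict[str, bool]:
    """Run the three checks against the current interpreter and the given path."""
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    return {
        "python_3_11": sys.version_info[:2] == (3, 11),
        # nvidia-smi being present only suggests a driver is installed;
        # it does not prove the GPU or CUDA version is suitable.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "enough_disk_for_archives": free_mb >= required_mb(),
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{'ok  ' if ok else 'FAIL'} {check}")
```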

MIT license, so you can use, modify, and redistribute the code freely as long as you keep the copyright notice.

In plain English

SUGAR is a research code release from a team at Peking University and Beihang University that trains humanoid robots to perform whole-body manipulation tasks by learning from third-person videos of humans interacting with objects. The acronym, expanded in the README, stands for a Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework, and the project has an accompanying arXiv paper and a demo website.

The framework is built on top of IsaacLab, a manager-based simulation framework from NVIDIA that runs inside IsaacSim. Given the human-interaction videos as input, the pipeline learns autonomous control policies for a humanoid robot that the authors describe as deployable to the real world. The current code release covers six example tasks: CarryBox, KickBox, PushBox, SitChair, StandBottle, and PickBottle.

Installation is involved. A user is asked to create a conda environment with Python 3.11, install IsaacSim 5.1.0 from the NVIDIA package index, clone and check out IsaacLab at version 2.3.0 with a specific flatdict pin and then run an isaaclab.sh script to install rsl_rl, and finally install the project's own two Python packages, sugar_rl and sugar_il, in editable mode. RTX 5090 users get a separate torch 2.8.0 install line targeting the CUDA 12.8 wheels. Three data archives are downloaded from Google Drive using gdown: a 400 MB main data zip, a 50 MB descriptions zip, and a 250 MB demo checkpoints zip.

After setup, the README provides two shell scripts. inference.sh takes a task name, with optional tracker and generator checkpoint paths, and runs the demo policy. train.sh takes a task name and an optional experiment name to train from scratch.

The TODO list says inference checkpoints, the full training pipeline including refiner, tracker, and generator, and processed data for all six tasks are already released, while a data processing pipeline that converts RGB-D human videos into training data and a sim-to-sim transfer pipeline are still to come.

The code reuses two upstream codebases acknowledged in the README: unitree_rl_lab together with beyondmimic for the sugar_rl reinforcement learning component, and DexGraspVLA for the sugar_il imitation learning component. The project is released under the MIT license.
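
The three Google Drive downloads described above can also be scripted from Python, since gdown exposes a `download` function as well as a command-line tool. This is a rough sketch under stated assumptions: the file IDs are placeholders (the real ones are in the SUGAR README and are not reproduced here), the output file names are hypothetical, and it assumes each archive is a single Drive file rather than a folder.

```python
from __future__ import annotations

from pathlib import Path

# PLACEHOLDERS: replace with the Drive file IDs from the SUGAR README.
# The archive file names are hypothetical as well.
ARCHIVES = {
    "data.zip": "<DATA_FILE_ID>",
    "descriptions.zip": "<DESCRIPTIONS_FILE_ID>",
    "demo_checkpoints.zip": "<CHECKPOINTS_FILE_ID>",
}


def pending(dest: Path) -> list[str]:
    """Names of archives that are not yet present in dest."""
    return [name for name in ARCHIVES if not (dest / name).exists()]


def fetch_all(dest: Path) -> None:
    """Download every missing archive into dest, skipping ones already there."""
    import gdown  # third-party: pip install gdown

    dest.mkdir(parents=True, exist_ok=True)
    for name in pending(dest):
        file_id = ARCHIVES[name]
        if file_id.startswith("<"):
            raise ValueError(f"fill in the Drive file ID for {name} first")
        gdown.download(id=file_id, output=str(dest / name), quiet=False)


if __name__ == "__main__":
    print("still to download:", pending(Path("downloads")))
```

Skipping archives that already exist makes the script safe to re-run after a partial download. Where the unzipped contents need to go is defined by the README, so that step is left out.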

Copy-paste prompts

Prompt 1
Set up a Python 3.11 conda environment for SUGAR with IsaacSim 5.1.0, IsaacLab 2.3.0, and the pinned flatdict version.
Prompt 2
Download the SUGAR data, descriptions, and demo checkpoints from Google Drive with gdown and place them in the expected paths.
Prompt 3
Run inference.sh on the StandBottle task with the released tracker and generator checkpoints and capture the simulation video.
Prompt 4
Install the CUDA 12.8 torch 2.8.0 wheels on an RTX 5090 box so SUGAR runs on the new GPU.
Prompt 5
Extend SUGAR with a new task that loads my own RGB-D human video and trains a tracker, even though the data processing pipeline is not yet released.

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.