explaingit

qianzhong-chen/openspiral

Analysis updated 2026-05-18

18 stars | Python | Audience · researcher | Complexity · 5/5 | Setup · hard

TLDR

Research code that trains a small residual network on top of a frozen pi-zero robot policy so the robot can improve itself through autonomous practice using dense step-by-step rewards.

Mindmap

mindmap
  root((OpenSpiral))
    What it does
      Self-improving robot policy
      Residual RL on frozen VLA
      No new human demos needed
    How it works
      Frozen base policy acts
      Residual MLP corrects action
      Critic ensemble evaluates
      Dense SARM2 rewards
    Training pipeline
      BC fine-tune pi-zero
      SARM2 reward labeling
      Residual RL update
      Autonomous rollouts
    Tech Stack
      Python and JAX
      openpi pi-zero model
      LeRobot dataset format

What do people build with it?

USE CASE 1

Improve a behavior-cloned robot manipulation policy without collecting new human demonstrations, using self-supervised autonomous practice.

USE CASE 2

Train a residual correction network that patches a frozen pi-zero policy using dense per-step rewards from the SARM2 reward model.
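As a rough illustration of Use Case 2, the sketch below shows how a residual correction might be combined with the frozen base policy's action. The function name `compose_action` and the `max_correction` clamp are hypothetical and not taken from the repository. The source only states that the base action and the correction are summed.

```python
from typing import Sequence


def compose_action(
    base_action: Sequence[float],
    residual: Sequence[float],
    max_correction: float = 0.1,  # hypothetical bound, not from the repo
) -> list[float]:
    """Add a bounded residual correction to the frozen base policy's action."""
    if len(base_action) != len(residual):
        raise ValueError("base action and residual must have the same length")
    # Assumption: clamp each correction so the residual can only nudge the base action.
    clipped = [max(-max_correction, min(max_correction, r)) for r in residual]
    # The command sent to the robot is the element-wise sum.
    return [b + c for b, c in zip(base_action, clipped)]


if __name__ == "__main__":
    base = [0.5, -0.2, 0.0]        # output of the frozen base policy (made-up values)
    correction = [0.05, 0.3, -0.02]  # output of the residual network (made-up values)
    print(compose_action(base, correction))
```

Only the residual network's parameters would receive gradient updates in this setup. The base action is treated as a fixed input.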

What is it built with?

Python, JAX, PyTorch, openpi, LeRobot

How does it compare?

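| Repo | qianzhong-chen/openspiral | andyuneducated/resolve-ai | carriex6/cvpr2026_similarity_as_evidence |
| --- | --- | --- | --- |
| Stars | 18 | 18 | 18 |
| Language | Python | Python | Python |
| Setup difficulty | hard | hard | hard |
| Complexity | 5/5 | 4/5 | 4/5 |
| Audience | researcher | developer | researcher |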

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

Requires 24+ GB GPU memory for training and a real robot with SARM2-labeled demonstration data.

License terms are not stated in the README; check the repository's LICENSE file.

In plain English

OpenSpiral is a research code package that implements SPIRAL, a method for making a robot manipulation policy improve itself through autonomous practice rather than requiring new human demonstrations. It is part of a research paper on self-improving robotic manipulation and builds on top of Physical Intelligence's open-source pi-zero vision-language-action model.

The core idea is that training robot policies from human demonstrations is expensive because you need many high-quality examples. SPIRAL takes an existing behavior-cloned policy and adds a small secondary network called a residual policy, which is trained through reinforcement learning. Instead of backpropagating through the entire large model, only the small residual network updates. The robot collects data by practicing autonomously, and a companion reward model called SARM2 assigns a progress score to each step of each attempt. Those scores serve as the reward signal for the reinforcement learning update.

At each time step, the frozen base policy produces an action and the residual network produces a small correction. The sum is the actual command sent to the robot. A set of five neural networks called a critic ensemble evaluates how good each action is, combining short-term step rewards with long-term episode returns so the system can handle multi-step tasks that take several minutes to complete.

The pipeline has four stages: fine-tune the base pi-zero policy on demonstrations, use SARM2 to assign dense rewards to the data, run the residual reinforcement learning update, and collect new robot rollouts to repeat the cycle.

The training requires at least 24 gigabytes of GPU memory for the residual learning step and relies on a real robot rather than a simulator. Installation uses the uv Python package manager. The repository does not include a stated license in the README.
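The critic evaluation described above, which combines per-step rewards with whole-episode returns, could be sketched roughly as follows. This is a minimal illustration and not the repository's implementation. The function names, the minimum over the ensemble, the linear blend, and the `mc_weight` parameter are all assumptions. The source confirms only that five critics are used and that short-term rewards are combined with long-term returns.

```python
from typing import Sequence


def monte_carlo_returns(rewards: Sequence[float], gamma: float) -> list[float]:
    """Discounted return-to-go for every step of one finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def hybrid_target(
    reward: float,
    next_q_values: Sequence[float],  # one estimate per critic in the ensemble
    mc_return: float,
    gamma: float = 0.99,
    mc_weight: float = 0.5,          # hypothetical blend weight
    done: bool = False,
) -> float:
    """Blend a one-step bootstrapped target with a Monte Carlo return."""
    # Assumption: take the most pessimistic critic to limit overestimation.
    bootstrap = 0.0 if done else gamma * min(next_q_values)
    td_target = reward + bootstrap
    # Assumption: a simple convex combination of the two targets.
    return (1.0 - mc_weight) * td_target + mc_weight * mc_return


if __name__ == "__main__":
    step_rewards = [1.0, 0.0, 2.0]              # made-up dense per-step rewards
    returns = monte_carlo_returns(step_rewards, gamma=0.5)
    next_qs = [2.0, 4.0, 3.0, 5.0, 2.5]         # five critics, made-up values
    print(hybrid_target(step_rewards[0], next_qs, returns[0], gamma=0.5))
```

The bootstrapped term reacts quickly to dense step rewards, while the Monte Carlo term carries information from the end of a long episode back to early steps without relying on many rounds of bootstrapping.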

Copy-paste prompts

Prompt 1
I have a behavior-cloned pi-zero policy and a SARM2-labeled LeRobot dataset. Walk me through running the SPIRAL residual RL training loop with train_residual_rl.py.
Prompt 2
How does the hybrid critic target in SPIRAL blend TD3 bootstrapped value learning with Monte Carlo returns for long-horizon tasks?
Prompt 3
What GPU memory is required to run SPIRAL residual RL training versus full pi-zero fine-tuning, and which GPUs are supported?

Frequently asked questions

What is openspiral?

Research code that trains a small residual network on top of a frozen pi-zero robot policy so the robot can improve itself through autonomous practice using dense step-by-step rewards.

What language is openspiral written in?

Mainly Python. The stack also includes JAX, PyTorch, openpi, and LeRobot.

What license does openspiral use?

License terms are not stated in the README; check the repository's LICENSE file.

How hard is openspiral to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is openspiral for?

Mainly researchers.

Verify against the repo before relying on details.