physical-intelligence/openpi

★ 11,815 | Python | Audience · researcher | Complexity · 5/5 | Setup · hard

Mindmap

mindmap
  root((openpi))
    What it does
      Robot arm control
      Vision and language input
      Action command output
    Model Variants
      pi0 flow matching
      pi0-FAST autoregressive
      pi0.5 generalist
    Requirements
      NVIDIA GPU 8GB min
      Ubuntu 22.04
      Docker or uv install
    Use Cases
      Table-top manipulation
      Towel folding task
      Custom fine-tuning

Things people build with this

USE CASE 1

Run a pre-trained pi0 or pi0.5 model for inference on a DROID or ALOHA robot arm without additional training.

USE CASE 2

Fine-tune a base model on your own robot demonstration data using parameter-efficient LoRA to adapt it to new hardware.

USE CASE 3

Use pi0-FAST's autoregressive action tokenization for tasks where discrete action planning is preferred over flow matching.

USE CASE 4

Evaluate generalist manipulation performance of pi0.5 across environments the model was not specifically trained on.
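Use case 3 contrasts discrete action tokens with continuous flow matching. To make "discrete tokens" concrete, here is a minimal sketch that maps continuous action values to integer bins and back. This is naive uniform binning chosen only for illustration; pi0-FAST's actual tokenizer is more involved, and the function names, the [-1, 1] range, and the 256-bin vocabulary are all assumptions, not part of openpi.

```python
# Illustrative only: naive uniform binning, NOT the tokenizer used by pi0-FAST.
# The range [-1, 1] and the 256-bin vocabulary are assumptions for this sketch.

def tokenize(actions, low=-1.0, high=1.0, bins=256):
    """Map each continuous action value to the nearest of `bins` integer tokens."""
    tokens = []
    for a in actions:
        a = min(max(a, low), high)  # clip out-of-range values
        tokens.append(int((a - low) / (high - low) * (bins - 1) + 0.5))
    return tokens


def detokenize(tokens, low=-1.0, high=1.0, bins=256):
    """Map integer tokens back to the centre value each one represents."""
    return [low + t / (bins - 1) * (high - low) for t in tokens]


if __name__ == "__main__":
    joint_targets = [-0.7, 0.0, 0.33]
    tokens = tokenize(joint_targets)
    print(tokens)
    print(detokenize(tokens))
```

The round trip loses a small amount of precision (at most half a bin width), which is the basic trade-off of any discrete action representation.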

Tech stack

Python, PyTorch, CUDA, Docker, uv

Getting it running

Difficulty · hard | Time to first run · 1 day+

Inference requires an NVIDIA GPU with at least 8 GB of VRAM; full fine-tuning needs 70 GB or more. Tested only on Ubuntu 22.04.
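The memory figures quoted in this summary (8 GB for inference, 22.5 GB for LoRA fine-tuning, 70 GB for full fine-tuning) can be turned into a quick feasibility check. The sketch below is a hypothetical helper, not part of openpi; the thresholds are copied from this page and should be verified against the repo.

```python
# Hypothetical helper, not part of openpi. Thresholds (in GB) are the
# minimums quoted in this summary; confirm them against the repo.
MIN_VRAM_GB = {
    "inference": 8.0,
    "lora_fine_tuning": 22.5,
    "full_fine_tuning": 70.0,
}


def feasible_modes(vram_gb: float) -> list[str]:
    """Return the modes whose stated minimum VRAM fits in `vram_gb`."""
    return [mode for mode, need in MIN_VRAM_GB.items() if vram_gb >= need]


if __name__ == "__main__":
    for gb in (10, 24, 80):
        print(f"{gb} GB -> {feasible_modes(gb)}")
```

For example, a 10 GB card clears only the inference threshold, while a 24 GB card also clears the LoRA threshold.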

In plain English

Openpi is a Python repository from Physical Intelligence that publishes open-source AI models for controlling robots. The models are vision-language-action models: they take camera images and text instructions as input and produce movement commands as output. The goal is to give a robot arm the ability to perform physical tasks described in plain language, such as folding a towel or unpacking a container.

The repository provides three model variants. The first, pi0, uses a technique called flow matching to generate actions. The second, pi0-FAST, is an autoregressive model that instead represents actions as discrete tokens and predicts them one at a time. The third, pi0.5, is an updated version of pi0 that is better at handling environments it was not specifically trained on. All three come with base checkpoints pre-trained on more than 10,000 hours of recorded robot demonstrations.

Beyond the base models, the repository includes fine-tuned checkpoints for specific robot platforms and tasks, such as table-top manipulation on a DROID-platform robot arm or towel folding on an ALOHA robot. These fine-tuned models can be run directly for inference without further training, though the authors note that results will vary depending on how closely your robot setup matches the one used during training.

Running inference requires an NVIDIA GPU with at least 8 GB of memory. Fine-tuning on your own data requires considerably more: at least 22.5 GB for the parameter-efficient LoRA approach, or 70 GB or more for full fine-tuning. The repository has been tested on Ubuntu 22.04. Dependencies are managed with a tool called uv, and Docker instructions are provided for those who prefer a containerized setup.

The project is framed as an experiment: the models were developed for Physical Intelligence's own robots, and adapting them to other hardware may or may not produce useful results.
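To give a feel for what "flow matching" means when generating actions, here is a toy sketch: start from random noise and take a fixed number of small steps along a velocity field until the sample becomes an action vector. This is not openpi code. In a real model the velocity comes from a trained network conditioned on images and text; here `toy_velocity` is a hand-written stand-in that already knows the target, and the time convention (noise at t = 0, action at t = 1) and the step count are assumptions made for illustration.

```python
import random

# Toy illustration of flow-matching sampling; NOT openpi's implementation.
# A trained network would predict the velocity from images and text. Here a
# hand-written stand-in that already knows the target plays that role.


def toy_velocity(x, t, target):
    """Velocity that carries x to `target` along a straight line by t = 1."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_action(target, steps=10, seed=0):
    """Integrate from noise (t = 0) to an action (t = 1) with Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division above is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    desired = [0.2, -0.5, 0.9]  # hypothetical 3-dimensional action
    print(sample_action(desired))
```

Because the stand-in velocity is exact, the sample lands on the target after the last step. A learned velocity field would only approximate this, and would produce different actions for different camera images and instructions.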

Copy-paste prompts

Prompt 1

I have an NVIDIA GPU with 10 GB VRAM and want to run inference with the physical-intelligence/openpi pi0 model on a DROID robot arm. Walk me through installing dependencies with uv and running the inference script with a live camera feed.

Prompt 2

Help me fine-tune the pi0 model from physical-intelligence/openpi on my own robot teleoperation dataset using LoRA. What VRAM is required and what training command do I run?

Prompt 3

Write a Python script using the openpi library to send a text instruction and a camera image array to the pi0.5 model and print the resulting action commands.

Prompt 4

Set up a Docker container for running physical-intelligence/openpi inference on Ubuntu 22.04 with an NVIDIA GPU, following the repo's Docker instructions.

Prompt 5

Explain the difference between the pi0, pi0-FAST, and pi0.5 model variants in physical-intelligence/openpi and when I should choose each one for a robot manipulation task.

Verify against the repo before relying on details.