explaingit

existentialrobotics/serf-vla

Analysis updated 2026-05-18

Stars · 12 Language · Python Audience · researcher Complexity · 5/5 Setup · hard

TLDR

Research code for training and evaluating a robot manipulation policy that uses a 4D spatiotemporal map to guide long-horizon household tasks in the BEHAVIOR-1K simulation benchmark.

Mindmap

mindmap
  root((SERF-VLA))
    What it does
      Long-horizon robot tasks
      4D spatiotemporal map
      Policy learning
    System
      PI0.5 base model
      BEHAVIOR-1K benchmark
      OmniGibson simulator
    Training
      Fine-tune per task
      H100 GPU needed
      20k training steps
    Audience
      Robotics researchers
      AI policy learners

What do people build with it?

USE CASE 1

Reproduce the SERF-VLA paper results on the BEHAVIOR-1K benchmark to compare against your own robot learning approach.

USE CASE 2

Fine-tune the PI0.5 vision-language-action model on new household manipulation tasks using the provided training scripts.

USE CASE 3

Download the released SERF-VLA checkpoints and evaluate them on specific BEHAVIOR-1K tasks without retraining.

USE CASE 4

Extend the SERF policy learning code to incorporate a different mapping representation or model architecture.
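For the checkpoint-download use case above, a minimal sketch using `snapshot_download` from the `huggingface_hub` package might look like the following. The Hub repository id, the `checkpoints/task-<id>` folder layout, and both helper names are assumptions made for illustration; the repo's README is the authority on the real names and the directory structure the evaluation scripts expect.

```python
from pathlib import Path


def checkpoint_dir(root: str, task_id: str) -> Path:
    """Return a local folder for one task's checkpoint (hypothetical layout)."""
    return Path(root) / "checkpoints" / f"task-{task_id}"


def download_checkpoint(repo_id: str, root: str, task_id: str) -> Path:
    """Fetch a model snapshot from the Hugging Face Hub into checkpoint_dir()."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = checkpoint_dir(root, task_id)
    target.mkdir(parents=True, exist_ok=True)
    # repo_id is supplied by the caller; this sketch does not know the real one.
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `download_checkpoint(repo_id, "./serf", "0021")` with the repository id taken from the README would place the files under `./serf/checkpoints/task-0021`.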

What is it built with?

Python, PyTorch, Hugging Face, BEHAVIOR-1K, OmniGibson

How does it compare?

Repo               existentialrobotics/serf-vla   aim-uofa/reasonmatch   arpecop/kokobook
Stars              12                             12                     12
Language           Python                         Python                 Python
Setup difficulty   hard                           hard                   hard
Complexity         5/5                            5/5                    3/5
Audience           researcher                     researcher             general

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard
Time to first run · 1 day+

Requires the BEHAVIOR-1K / OmniGibson simulator, a separate mapping repo, dataset assets, and a high-end GPU. A single evaluation task can take several days.
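Because several independent pieces have to be in place before anything runs, a small pre-flight check can save a failed multi-hour job. The sketch below is not part of the repo; the directory names are hypothetical placeholders, and finding `nvidia-smi` on the PATH is only a rough proxy for a usable NVIDIA GPU.

```python
import shutil
from pathlib import Path


def preflight(required_dirs: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for label, path in required_dirs.items():
        if not Path(path).is_dir():
            problems.append(f"missing {label}: {path}")
    # nvidia-smi on PATH suggests an NVIDIA driver is installed, nothing more.
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH (no NVIDIA driver visible)")
    return problems


if __name__ == "__main__":
    # Hypothetical locations; substitute the paths from the repo's README.
    issues = preflight({
        "BEHAVIOR-1K checkout": "./BEHAVIOR-1K",
        "mapping repo assets": "./map_assets",
        "dataset": "./data",
    })
    print("\n".join(issues) or "all checks passed")
```

The function returns a list of problems instead of raising on the first one, so a single run reports everything that is missing.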

In plain English

SERF-VLA is the code from an academic robotics research paper about teaching a robot to complete long, multi-step household tasks in a simulated environment. The project comes from the Existential Robotics Lab and introduces a system that builds a 4D feature map of the robot's surroundings, combining where things are in space with how they change over time, and uses that map to guide the robot's decisions.

The benchmark used to test the system is called BEHAVIOR-1K, a simulation environment developed at Stanford for evaluating household robots. Tasks in this benchmark include things like collecting children's toys scattered around a room. The robot must navigate a home environment, find objects, and complete multi-step manipulation tasks without resets or shortcuts.

This repository contains the code for the learning part of SERF. A separate companion repository handles the mapping component. The AI model at the core is called PI0.5, a pre-trained vision-language-action model (a type of AI that takes visual input and outputs robot actions) that the authors fine-tune for specific household tasks. Pre-trained checkpoints are released and can be downloaded from Hugging Face.

Setting this up is involved. It requires the BEHAVIOR-1K simulator environment, a specific dataset layout, downloaded map assets from the companion repository, and a powerful GPU (the paper used an NVIDIA H100). A single evaluation episode can take several hours, and a full 20-episode task evaluation may take days of compute time.

This is academic research code aimed at robotics researchers who want to reproduce results from the paper or build on the approach in their own work.
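The evaluation cost described above is simple arithmetic, and a rough budget is easy to compute before reserving GPU time. In this sketch the 20-episode count comes from the text, while the per-episode durations are assumed values standing in for "several hours"; episodes are assumed to run one after another on a single GPU.

```python
def evaluation_days(episodes: int, hours_per_episode: float) -> float:
    """Wall-clock days for sequential episodes on a single GPU."""
    return episodes * hours_per_episode / 24.0


# 20 episodes matches the text; the per-episode hours are assumed values.
for hours in (2.0, 3.0, 6.0):
    print(f"{hours:.0f} h/episode -> {evaluation_days(20, hours):.1f} days")
```

Under these assumptions, 3 hours per episode works out to 2.5 days for one task, and 6 hours per episode to 5 days.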

Copy-paste prompts

Prompt 1
Using ExistentialRobotics/SERF-VLA, show me the bash commands to train a 4D environment and robot feature map policy for BEHAVIOR-1K task 0021.
Prompt 2
How do I download the SERF-VLA pretrained checkpoints from Hugging Face and set up the directory structure for evaluation?
Prompt 3
Walk me through running the SERF-VLA evaluation script for task 0026 using the 4D env-robot feature map checkpoint.
Prompt 4
What are the GPU and compute requirements for SERF-VLA training on a single BEHAVIOR-1K task at the paper's reported batch size?
Prompt 5
How do I apply the task 21 BDDL goal patch required for SERF evaluation using the provided setup script?

Frequently asked questions

What is serf-vla?

Research code for training and evaluating a robot manipulation policy that uses a 4D spatiotemporal map to guide long-horizon household tasks in the BEHAVIOR-1K simulation benchmark.

What language is serf-vla written in?

Mainly Python. The stack also includes PyTorch and Hugging Face.

How hard is serf-vla to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is serf-vla for?

Mainly robotics researchers.


Verify against the repo before relying on details.