Analysis updated 2026-05-18
Reproduce the SERF-VLA paper results on the BEHAVIOR-1K benchmark to compare against your own robot learning approach.
Fine-tune the PI0.5 vision-language-action model on new household manipulation tasks using the provided training scripts.
Download the released SERF-VLA checkpoints and evaluate them on specific BEHAVIOR-1K tasks without retraining.
Extend the SERF policy learning code to incorporate a different mapping representation or model architecture.
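For the checkpoint-download use case above, a minimal sketch might look like the following. It assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function. The repository ID is a placeholder, not the real checkpoint location, and the helper names `checkpoint_dir` and `download_checkpoint` are hypothetical; take the actual ID from the SERF-VLA README.

```python
from pathlib import Path


def checkpoint_dir(repo_id: str, root: str = "checkpoints") -> Path:
    """Map a Hub repo ID such as 'org/name' to a flat local folder under root."""
    return Path(root) / repo_id.replace("/", "__")


def download_checkpoint(repo_id: str, root: str = "checkpoints") -> Path:
    """Download every file of a Hugging Face Hub repo into a local folder.

    Hypothetical helper: the import is done lazily so this module can be
    loaded even where huggingface_hub is not installed.
    """
    from huggingface_hub import snapshot_download

    target = checkpoint_dir(repo_id, root)
    # snapshot_download returns the path of the folder holding the files.
    local_path = snapshot_download(repo_id=repo_id, local_dir=str(target))
    return Path(local_path)


# Example call (placeholder ID, replace with the one given in the repo):
# download_checkpoint("<org>/<serf-vla-checkpoint>")
```

Keeping the path logic in its own function makes it easy to check where files will land before starting a large download.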
| | existentialrobotics/serf-vla | aim-uofa/reasonmatch | arpecop/kokobook |
|---|---|---|---|
| Stars | 12 | 12 | 12 |
| Language | Python | Python | Python |
| Setup difficulty | hard | hard | hard |
| Complexity | 5/5 | 5/5 | 3/5 |
| Audience | researcher | researcher | general |
Figures from each repo's GitHub metadata at analysis time.
Requires the BEHAVIOR-1K / OmniGibson simulator, a separate mapping repo, dataset assets, and a high-end GPU. A single evaluation task can take several days.
SERF-VLA is the code from an academic robotics research paper about teaching a robot to complete long, multi-step household tasks in a simulated environment. The project comes from the Existential Robotics Lab and introduces a system that builds a 4D feature map of the robot's surroundings, combining where things are in space with how they change over time, and uses that map to guide the robot's decisions.

The benchmark used to test the system is called BEHAVIOR-1K, a simulation environment developed at Stanford for evaluating household robots. Tasks in this benchmark include things like collecting children's toys scattered around a room. The robot must navigate a home environment, find objects, and complete multi-step manipulation tasks without resets or shortcuts.

This repository contains the code for the learning part of SERF. A separate companion repository handles the mapping component. The AI model at the core is called PI0.5, a pre-trained vision-language-action model (a type of AI that takes visual input and outputs robot actions) that the authors fine-tune for specific household tasks. Pre-trained checkpoints are released and can be downloaded from Hugging Face.

Setting this up is involved. It requires the BEHAVIOR-1K simulator environment, a specific dataset layout, downloaded map assets from the companion repository, and a powerful GPU (the paper used an NVIDIA H100). A single evaluation episode can take several hours, and a full 20-episode task evaluation may take days of compute time.

This is academic research code aimed at robotics researchers who want to reproduce results from the paper or build on the approach in their own work.
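To get a feel for the evaluation cost mentioned above, the arithmetic can be sketched as below. The 20-episode count comes from the description; the 3 hours per episode is an assumed stand-in for "several hours", and the function name and the `parallel_gpus` parameter are hypothetical. It also assumes all episodes take the same time and that episodes can be spread evenly across GPUs, which the repo may not support.

```python
import math


def evaluation_hours(episodes: int, hours_per_episode: float,
                     parallel_gpus: int = 1) -> float:
    """Estimate wall-clock hours for one task evaluation.

    Episodes are assumed to be indivisible and equally long, so the run
    proceeds in ceil(episodes / parallel_gpus) sequential rounds.
    """
    if episodes < 0 or parallel_gpus < 1:
        raise ValueError("episodes must be >= 0 and parallel_gpus >= 1")
    rounds = math.ceil(episodes / parallel_gpus)
    return rounds * hours_per_episode


# 20 episodes at an assumed 3 hours each on a single GPU.
hours = evaluation_hours(episodes=20, hours_per_episode=3.0)
print(f"{hours:.0f} hours, about {hours / 24:.1f} days")
```

Under these assumptions a single task lands at roughly two and a half days on one GPU, which is consistent with the "days of compute time" figure in the description.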
Research code for training and evaluating a robot manipulation policy that uses a 4D spatiotemporal map to guide long-horizon household tasks in the BEHAVIOR-1K simulation benchmark.
Mainly Python. The stack also includes PyTorch and Hugging Face.
Setup difficulty is rated hard, with roughly a day or more to a first successful run.
Mainly researchers.
Verify against the repo before relying on details.