Analysis updated 2026-05-18
Train a neural point map from robot camera recordings so a mobile robot can locate objects by name across a large space.
Update an existing environment map using a replayed expert demonstration to teach the robot new object locations.
Visualize a spatiotemporal feature map with a 3D robot overlay to inspect what the robot has learned about its surroundings.
| | existentialrobotics/serf-mapping | 16nic/comfyui-agnes-ai | 6c696e68/gpt_signup_hybrid |
|---|---|---|---|
| Stars | 19 | 19 | 19 |
| Language | Python | Python | Python |
| Setup difficulty | hard | moderate | hard |
| Complexity | 5/5 | 2/5 | 4/5 |
| Audience | researcher | vibe coder | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires conda, large DINOv3 model weights, and multi-gigabyte HDF5 datasets from Hugging Face.
SERF-mapping is the official code release for a research paper titled "SERF: Spatiotemporal Environment and Robot Feature Map for Long-Horizon Mobile Manipulation." It provides the mapping component of a system designed to help mobile robots perform long sequences of manipulation tasks, such as picking up and placing objects across a large environment over extended periods.
The core idea is to build two types of maps from camera footage: one capturing the surrounding environment (furniture, objects, spatial layout) and one capturing the robot itself. These are called "neural point" maps, meaning they store learned visual features at 3D positions rather than raw color or depth values. The features come from a vision model called DINOv3, which can connect visual appearance to language, allowing the robot to find objects by their name or description. By updating these maps with expert demonstrations, a robot can learn where things are and how to find them again during a long task.
The repository covers three workflows: building a SERF map from a dataset of recorded episodes, updating an existing map using a replayed expert demonstration with a tracking algorithm called CoTracker, and visualizing the resulting feature map with or without a 3D robot overlay. Datasets and pre-trained models are hosted on Hugging Face and can be downloaded with a few commands. The code also includes a Python script to extract DINOv3 visual embeddings from HDF5 data files before training.
This is academic robotics research code. It depends on conda or mamba for environment setup, requires downloading large model weights and datasets from Hugging Face, and is part of a larger system. The companion repository SERF-VLA contains the planning and execution component that uses these maps at runtime. The license is specified in the LICENSE file in the repository.
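To make the "neural point" idea above concrete, here is a minimal sketch of a map that stores a feature vector at each 3D position and ranks points against a query embedding by cosine similarity. It is not the repository's implementation: the class `NeuralPointMap`, its methods `integrate` and `query`, the `merge_radius` parameter, and the running-average merge rule are all hypothetical names and choices made for illustration. The toy three-dimensional vectors stand in for real DINOv3 features, and the query vector is assumed to already live in the same embedding space. Plain Python is used for readability; real code would more likely use PyTorch tensors.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class NeuralPointMap:
    """Hypothetical toy map: one feature vector per stored 3D point."""
    merge_radius: float = 0.05  # assumed merge distance, in the map's length units
    positions: list = field(default_factory=list)
    features: list = field(default_factory=list)
    counts: list = field(default_factory=list)

    def integrate(self, position, feature):
        """Add an observation; average it into an existing point if one is close enough.

        Returns the index of the point that received the observation.
        """
        for i, stored in enumerate(self.positions):
            if math.dist(stored, position) <= self.merge_radius:
                n = self.counts[i]
                # Running mean of all features observed at this point.
                self.features[i] = [
                    (old * n + new) / (n + 1)
                    for old, new in zip(self.features[i], feature)
                ]
                self.counts[i] = n + 1
                return i
        self.positions.append(tuple(position))
        self.features.append(list(feature))
        self.counts.append(1)
        return len(self.positions) - 1

    def query(self, query_embedding, top_k=1):
        """Return the top_k (similarity, position) pairs, best match first."""
        scored = [
            (cosine(feat, query_embedding), pos)
            for feat, pos in zip(self.features, self.positions)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


# Toy usage: two distinct points, then a repeat observation near the first.
env_map = NeuralPointMap(merge_radius=0.05)
env_map.integrate((1.0, 0.0, 0.8), [0.9, 0.1, 0.0])
env_map.integrate((4.0, 2.0, 0.5), [0.0, 0.2, 0.9])
env_map.integrate((1.01, 0.0, 0.8), [1.0, 0.0, 0.0])  # merges into the first point
best_score, best_position = env_map.query([1.0, 0.0, 0.0])[0]
```

The linear scan in `integrate` keeps the sketch short; a map covering a large space would need a spatial index such as a voxel grid or k-d tree to find nearby points efficiently.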
Research code that builds spatial feature maps for mobile robots so they can find and manipulate objects across long tasks, using learned visual features from DINOv3.
Mainly Python. The stack also includes PyTorch and DINOv3.
The README does not name a standard license; the terms are defined in the LICENSE file in the repository.
Setup difficulty is rated hard, with roughly a day or more needed to reach a first successful run.
Mainly researchers.
Verify against the repo before relying on details.