explaingit

manycore-research/spatiallm

4,551 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

TLDR

A NeurIPS 2025 research model that reads 3D point clouds from phones, depth cameras, or LiDAR and outputs a structured room map with labeled walls, doors, and furniture positions and sizes.

Mindmap

mindmap
  root((repo))
    What it does
      3D scene understanding
      Room layout detection
      Furniture labeling
    Input sources
      Phone video
      Depth cameras
      LiDAR sensors
    Tech stack
      Python
      PyTorch
      CUDA
    Use cases
      Robotics navigation
      Indoor mapping
      Fine-tuning research

Things people build with this

USE CASE 1

Convert a phone video of a room into a labeled 3D map showing furniture positions and room boundaries.

USE CASE 2

Build a robotics navigation system that understands indoor spaces from depth camera or LiDAR point cloud data.

USE CASE 3

Run the model on your own indoor scans to get structured room layouts for architecture or design tools.

USE CASE 4

Fine-tune the provided pretrained models on a custom 3D scene dataset for specialized object detection.

Tech stack

Python, PyTorch, CUDA, Poetry, Llama, Qwen

Getting it running

Difficulty · hard | Time to first run · 1h+

Requires Python 3.11, a CUDA-capable GPU, and the Poetry package manager. There is no CPU fallback for training.
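Before starting the install, it can help to confirm that the three prerequisites above are actually present. The following is a minimal preflight sketch, not part of the repository: the function name `preflight` and the check names are hypothetical, and it only uses the standard library plus `torch.cuda.is_available()` when PyTorch is already importable.

```python
import shutil
import sys


def preflight() -> dict[str, bool]:
    """Report whether the stated prerequisites appear to be met.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    checks = {
        # The page states Python 3.11 is required.
        "python_3_11": sys.version_info[:2] == (3, 11),
        # Poetry must be callable from the shell.
        "poetry_on_path": shutil.which("poetry") is not None,
    }
    try:
        import torch  # only present once PyTorch has been installed

        checks["cuda_available"] = bool(torch.cuda.is_available())
    except (ImportError, OSError):
        # PyTorch missing or its native libraries failed to load.
        checks["cuda_available"] = False
    return checks


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A `missing` result for `cuda_available` before PyTorch is installed is expected; rerun the check after the dependencies are in place.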

In plain English

SpatialLM is a research project, accepted at NeurIPS 2025, that trains large language models to understand 3D indoor spaces. You give it a point cloud, which is a collection of 3D points captured by a camera or sensor that represents a room or building. It produces a structured description of that space: walls, doors, windows, and the positions and sizes of furniture items, each labeled with what it is.

What makes this approach notable is that it works with point clouds from multiple sources. You can feed it data from a regular video taken on a phone (converted to a point cloud using other software), from depth cameras, or from professional LiDAR sensors. Earlier approaches to this kind of 3D scene understanding typically required specialized scanning equipment. SpatialLM lowers that barrier by accepting more common data sources.

The project provides four pretrained models: versions 1.0 and 1.1 of each of two base language models (Llama 1B and Qwen 0.5B). The version 1.1 models double the point cloud resolution and add a more capable point cloud encoder. They also support detection of specific object categories, so you can ask the model to identify only certain types of furniture rather than predicting all 59 supported categories.

The repository includes scripts for running inference on a point cloud file, for visualizing results using a tool called Rerun, and for evaluating model performance on a provided test set. A fine-tuning guide is also included for researchers who want to train the model on their own data. Installation requires Python 3.11, PyTorch, and CUDA, and uses the Poetry package manager. A training dataset is available separately on Hugging Face.

The intended applications described in the README include robotics, autonomous navigation, and general 3D scene analysis.
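To make the input and output described above concrete, here is a small sketch of the two data shapes involved: a point cloud as a list of (x, y, z) samples, and a structured layout of walls and labeled boxes. Every type and field name here (`Wall`, `Box`, `Layout`, `only`) is hypothetical and chosen for illustration, the units are assumed to be metres, and the model's real output format may differ. The `only` method is a plain filter applied after the fact; it is not the version 1.1 category-specific detection feature.

```python
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass
class Wall:
    start: Point   # one end of the wall's base line
    end: Point     # the other end
    height: float  # assumed metres


@dataclass
class Box:
    label: str     # object category, e.g. "sofa"
    center: Point  # position of the box centre
    size: Point    # extent along x, y, z


@dataclass
class Layout:
    walls: list[Wall]
    objects: list[Box]

    def only(self, categories: set[str]) -> list[Box]:
        """Keep objects whose label is in `categories` (post-hoc filter)."""
        return [box for box in self.objects if box.label in categories]


def bounds(points: list[Point]) -> tuple[Point, Point]:
    """Axis-aligned bounding box of a point cloud as (min corner, max corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# A toy point cloud: real scans contain many thousands of points.
points = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 3.0, 0.0), (1.0, 1.0, 0.4)]

# A hand-written layout standing in for what a model might return.
layout = Layout(
    walls=[Wall(start=(0.0, 0.0, 0.0), end=(4.0, 0.0, 0.0), height=2.6)],
    objects=[
        Box(label="sofa", center=(1.0, 1.0, 0.4), size=(2.0, 0.9, 0.8)),
        Box(label="chair", center=(3.0, 2.0, 0.45), size=(0.5, 0.5, 0.9)),
    ],
)

chairs = layout.only({"chair"})
```

Keeping positions and sizes as plain tuples makes the layout easy to serialize or hand to a downstream tool such as a navigation planner or a design application.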

Copy-paste prompts

Prompt 1
I have a point cloud file from my iPhone LiDAR scanner. Help me run SpatialLM inference on it to get a list of detected furniture with positions and sizes.
Prompt 2
I want SpatialLM to detect only chairs and tables in a room scan. Show me how to use the category-specific detection feature in version 1.1.
Prompt 3
Help me set up the SpatialLM environment on Linux with CUDA: installing Python 3.11, PyTorch, and the Poetry dependencies step by step.
Prompt 4
I want to visualize SpatialLM output using Rerun. Show me how to run the visualization script and interpret the result.
Prompt 5
I have my own indoor 3D dataset. Walk me through the SpatialLM fine-tuning guide to train a custom version of the model.

Verify against the repo before relying on details.