explaingit

irenegracekp/molmoact2-so101

16 | Python | Audience · researcher | Complexity · 5/5 | Active | Setup · hard

TLDR

Working example of controlling an SO-101 robot arm with natural language using AI2's MolmoAct2 vision-language-action model, no demo data required.

Mindmap

mindmap
  root((molmoact2-so101))
    Inputs
      Text prompt
      RealSense color image
      Wrist webcam frame
      Joint positions
    Outputs
      Joint target sequence
      Dry-run predictions
    Use Cases
      Pick and place
      Zero-shot manipulation
      Robotics demo
    Tech Stack
      Python
      PyTorch
      CUDA
      LeRobot
      HuggingFace

Things people build with this

USE CASE 1

Run zero-shot pick-and-place on an SO-101 arm from a natural-language instruction.

USE CASE 2

Sanity-check predicted joint targets with the dry-run mode before powering the motors.

USE CASE 3

Reproduce a robot-arm demo of MolmoAct2 on a single RTX 3080 laptop.

USE CASE 4

Calibrate Feetech servo motors and store the per-joint zero positions in the repo configs.
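The copy step in use case 4 can be sketched in a few lines of Python. This is only an illustration: the function name `copy_calibration` and both paths are hypothetical, since the summary does not say where `lerobot-calibrate` writes its output or how the repo names files in its configs folder. Pass in whatever paths apply to your setup.

```python
import shutil
from pathlib import Path


def copy_calibration(calibration_file: Path, configs_dir: Path) -> Path:
    """Copy a calibration file into a configs folder and return the new path.

    Both arguments are supplied by the caller; no default location is assumed.
    """
    if not calibration_file.is_file():
        raise FileNotFoundError(f"No calibration file at {calibration_file}")
    # Create the target folder if it does not exist yet.
    configs_dir.mkdir(parents=True, exist_ok=True)
    destination = configs_dir / calibration_file.name
    # copy2 preserves timestamps, which helps tell a fresh calibration from a stale one.
    shutil.copy2(calibration_file, destination)
    return destination
```

Refusing to continue when the source file is missing is deliberate: running the arm with a stale or absent calibration is the kind of mistake this step is meant to prevent.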

Tech stack

Python · PyTorch · CUDA · LeRobot · HuggingFace

Getting it running

Difficulty · hard | Time to first run · 1 day+

Needs an SO-101 arm, a RealSense D455 on USB 3, a wrist webcam, and a recent NVIDIA GPU. LeRobot must be pinned to exactly 0.5.1, or the arm can slam into the table on startup.
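Because a version mismatch is the failure with physical consequences, it may be worth checking the pin in code before anything talks to the motors. The sketch below is not part of the repo; the helper names are hypothetical, and the only assumption about LeRobot is that it is installed under the distribution name `lerobot`. The lookup uses the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

REQUIRED_LEROBOT = "0.5.1"  # the pin stated above


def lerobot_pin_ok(installed: Optional[str], required: str = REQUIRED_LEROBOT) -> bool:
    """Return True only when the installed version string matches the pin exactly."""
    return installed is not None and installed.strip() == required


def installed_lerobot_version() -> Optional[str]:
    """Return the installed lerobot version, or None if the package is missing."""
    try:
        return version("lerobot")
    except PackageNotFoundError:
        return None


def require_lerobot_pin() -> str:
    """Raise before any motor command if the environment does not match the pin."""
    found = installed_lerobot_version()
    if not lerobot_pin_ok(found):
        raise RuntimeError(
            f"Expected lerobot=={REQUIRED_LEROBOT}, found {found!r}; do not power the arm."
        )
    return found
```

Calling `require_lerobot_pin()` at the top of a launch script turns a silent convention mismatch into an immediate error.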

In plain English

This repository is a working example of controlling a small robot arm, the SO-101, with natural-language instructions and without any training or demonstration data collected by the user. It runs MolmoAct2, a vision-language-action model recently released by AI2 (the Allen Institute for AI). You type something like "pick up the lemon and drop it in the red bowl", and the model takes camera images plus the current arm position and outputs the next sequence of joint movements at 30 Hz.

The hardware list is short. You need an SO-101 follower arm, which is an open-source robot arm design; an Intel RealSense D455 depth camera mounted to view the workspace from the side (only its color image is used); and a regular USB webcam attached to the wrist of the arm. The README notes that the RealSense camera needs a USB 3 data cable and port to run at full speed.

Setup uses Python 3.12 in a conda environment, with a single pip install -r requirements.txt that pulls in PyTorch with CUDA, version 0.5.1 of the LeRobot library, the Feetech servo driver for the arm motors, and the HuggingFace tools. The MolmoAct2 model weights, around 15 GB, are downloaded automatically the first time you run inference. The README is very firm that the LeRobot version must be exactly 0.5.1, because the joint-angle conventions changed between versions and a mismatch can cause the arm to slam into the table on startup.

Before running the model on the arm, you calibrate each motor's zero position with lerobot-calibrate and copy the result into the repo's configs folder. The main script is inference.py, which takes a follower port, a wrist camera ID, and a prompt. A dry-run mode prints predicted joint targets without moving the arm so you can sanity-check the setup first. A GPU is required, and a laptop with an RTX 3080 or better is recommended.
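To make the dry-run idea concrete, here is a minimal sketch of stepping through one predicted sequence of joint targets at a fixed rate. It is not the repo's `inference.py`: the function `run_chunk`, its parameters, and the `send` callback are all hypothetical, and it assumes the 30 Hz figure refers to how often a target is issued to the arm. In dry-run mode the targets are only printed and nothing is sent.

```python
import time
from typing import Callable, Sequence

CONTROL_HZ = 30  # rate quoted in the summary; assumed here to be the control rate

JointTarget = Sequence[float]


def run_chunk(
    targets: Sequence[JointTarget],
    send: Callable[[JointTarget], None],
    dry_run: bool = True,
    hz: float = CONTROL_HZ,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Step through one predicted chunk at a fixed rate.

    Returns the number of targets actually passed to `send` (0 in dry-run mode).
    """
    period = 1.0 / hz
    sent = 0
    for step, target in enumerate(targets):
        started = time.monotonic()
        if dry_run:
            # Print instead of moving, so the values can be inspected first.
            print(f"step {step}: " + ", ".join(f"{v:.2f}" for v in target))
        else:
            send(target)
            sent += 1
        # Sleep for whatever is left of this control period.
        remaining = period - (time.monotonic() - started)
        if remaining > 0:
            sleep(remaining)
    return sent
```

Defaulting `dry_run` to `True` means the arm moves only when the caller opts in explicitly, which suits hardware that can hit the table.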

Copy-paste prompts

Prompt 1
Set up irenegracekp/molmoact2-so101 from scratch on Ubuntu with an RTX 3080. List the conda env, the exact pip command, and where the 15 GB MolmoAct2 weights end up.
Prompt 2
Wire an Intel RealSense D455 and a USB wrist webcam to the SO-101 arm for molmoact2-so101. Show how to pick the right ports and confirm both feeds are active.
Prompt 3
Run inference.py from molmoact2-so101 in dry-run mode with the prompt 'pick up the lemon and drop it in the red bowl' and explain the printed joint targets.
Prompt 4
Calibrate every SO-101 motor with lerobot-calibrate and copy the result into the configs folder of molmoact2-so101. Walk through one motor end to end.
Prompt 5
Adapt molmoact2-so101 to a different LeRobot follower arm. List what must change in the config and why version 0.5.1 of LeRobot is pinned.

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.