Dexora is the source code release for a research robot system that the authors call a Vision-Language-Action model, or VLA. The system is designed to drive a robot with two arms and two hands that together have 36 degrees of freedom, meaning 36 independent joint angles the policy has to control. The repository accompanies a paper accepted at ICRA 2026, a major robotics conference, and includes the full training, inference, data processing, and teleoperation code.

The project is built around four main pieces. The first is a way of collecting human demonstrations: an operator wears an exoskeleton backpack that captures broad arm motion, while an Apple Vision Pro headset tracks finger movement without markers. These signals drive both the real robot and a copy of it simulated in MuJoCo, a physics engine.

The second piece is a dataset of those demonstrations, hosted on Hugging Face. The README lists 12.2 thousand real-world episodes covering about 40 hours of teleoperation, plus 100 thousand simulated trajectories using the same 36-joint body layout.

The third piece is the training pipeline. It runs in three stages: pretrain a Diffusion Transformer policy, train a separate discriminator that scores how good each demonstration clip is, then fine-tune the policy with a weighted loss that pays less attention to low-quality demonstrations. Shell scripts launch each stage and read paths from environment variables. The setup relies on two large pretrained encoders, SigLIP for vision and T5-v1.1-XXL for language, which together take roughly 48 GB of disk.

The fourth piece is the inference stack that runs on the actual robot. It splits work across three Python processes that talk over ZMQ, because the GPU policy, the AIRBOT arm SDK, and the XHAND hand SDK each need conflicting Python environments and cannot share one process. There is also a CPU-only test suite of 57 tests for quick checks.
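The third training stage described above, fine-tuning with a loss that down-weights low-quality demonstrations, can be illustrated with a small sketch. This is not the repository's implementation: the function names, the assumption that discriminator scores lie in [0, 1], the `floor` parameter, and the choice to normalize weights so they average to 1 are all hypothetical, chosen only to show the idea in plain Python.

```python
from typing import List, Sequence


def quality_weights(scores: Sequence[float], floor: float = 0.05) -> List[float]:
    """Turn per-clip quality scores into loss weights that average to 1.

    Assumption: scores are in [0, 1], higher meaning a better demonstration.
    The floor keeps every clip from being ignored entirely.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    clipped = [min(1.0, max(floor, s)) for s in scores]
    mean = sum(clipped) / len(clipped)
    return [c / mean for c in clipped]


def weighted_loss(
    per_sample_losses: Sequence[float],
    scores: Sequence[float],
    floor: float = 0.05,
) -> float:
    """Average the per-sample losses, weighting each by its clip quality."""
    if len(per_sample_losses) != len(scores):
        raise ValueError("losses and scores must have the same length")
    weights = quality_weights(scores, floor)
    total = sum(w * loss for w, loss in zip(weights, per_sample_losses))
    return total / len(weights)


if __name__ == "__main__":
    # Two good clips with low loss, one poor clip with high loss.
    losses = [1.0, 1.0, 4.0]
    scores = [1.0, 1.0, 0.0]
    print("plain mean:   ", sum(losses) / len(losses))
    print("weighted mean:", weighted_loss(losses, scores))
```

Because the weights average to 1, a batch in which every clip has the same score yields the ordinary mean loss, so the weighting only changes the objective when quality actually varies across the batch.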
The code is released under an MIT license. The README pins specific versions of PyTorch, transformers, diffusers, accelerate, and LeRobot, because newer versions break the interfaces the training stack expects.
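Since the pins matter, it can help to check an environment against them before launching training. The sketch below is a generic checker, not something shipped in the repository: the function name `find_pin_mismatches` is hypothetical, and the pinned versions themselves are deliberately not filled in here, so they must be copied from the README. It uses only the standard library's `importlib.metadata`.

```python
from importlib import metadata
from typing import Callable, Dict, List


def find_pin_mismatches(
    pins: Dict[str, str],
    installed_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return one message per package whose installed version differs from its pin.

    `pins` maps distribution names to the exact versions taken from the README.
    A local build suffix such as "+cu121" is ignored when comparing.
    """
    problems = []
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        # Drop any local version label before comparing.
        if found.split("+", 1)[0] != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems
```

A typical use would be to build the `pins` dictionary from the versions listed in the README, call `find_pin_mismatches(pins)`, and stop with an error if the returned list is not empty.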
Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.