explaingit

hiangx-robotics/metafine

Analysis updated 2026-06-24

Stars · 70
Language · Python
Audience · researcher
Complexity · 5/5
License · MIT
Setup · hard

TLDR

Diagnostic benchmark framework for robot manipulation policies that reports separate understanding, perception, and behaviour scores for a policy instead of a single success rate.

Mindmap

mindmap
  root((MetaFine))
    Inputs
      Task graph YAML
      VLA policy
      Articulated assets
      Domain randomization config
    Outputs
      Understanding score
      Perception AUC
      Behaviour smoothness metrics
    Use Cases
      Diagnose policy failures
      Compare VLA backbones
      Real-to-sim scanning
      Skill and task authoring
    Tech Stack
      Python
      SAPIEN
      ManiSkill
      LeRobot
      URDF
      YAML

What do people build with it?

USE CASE 1

Diagnose where a manipulation policy fails along understanding, perception, and behaviour axes

USE CASE 2

Compare VLA backbones like pi-0.5 and OpenVLA on the same task graph

USE CASE 3

Author a new long-horizon manipulation task as a YAML graph rather than a Python environment

USE CASE 4

Scan a real object with a phone and reproduce it in simulation via the PPI workflow

What is it built with?

Python, SAPIEN, ManiSkill, LeRobot, URDF

How does it compare?

                  hiangx-robotics/metafine   nanovisionx/raev2   wanshuiyin/aris-in-ai-offer
Stars             70                         70                  71
Language          Python                     Python              Python
Setup difficulty  hard                       hard                easy
Complexity        5/5                        5/5                 2/5
Audience          researcher                 researcher          researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard
Time to first run · 1 day+

Per-policy backbones install as separate stacks. Only the LeRobot and StarVLA training paths and pi-0.5 closed-loop inference are verified.

MIT license: you may use, modify, and redistribute the code as long as the copyright and license notice is kept, and it comes with no warranty.

In plain English

MetaFine is a software framework for testing robot manipulation policies, the programs that decide how a robot arm should move to grasp, slide, insert, or otherwise handle objects. The README frames it as a diagnostic tool rather than a leaderboard. Most existing benchmarks just check whether the task ultimately succeeded, but MetaFine splits a policy's behaviour into three separate scores: understanding, perception, and behaviour. The idea is that two policies with the same overall success rate may fail in very different ways, and the three-score breakdown makes those differences visible.

Understanding is measured by breaking a task into stages and reporting success on each stage, so you can see exactly where the chain breaks: engagement, manipulation, or release.

Perception is measured by running domain-randomisation sweeps over lighting, camera pose, and camera rotation, and reporting the area under the success curve as a single 0-to-1 score per axis.

Behaviour is measured by looking at how smooth the action trajectory was, using metrics like jerk RMS, velocity variance, and path length, which can expose jerky or hesitant policies that still happen to succeed.

The platform is built from small reusable pieces. There are 21 atomic skills such as grasp, rotate, slide, and insert, each declared with a @register_skill decorator and matched to objects through a closed set of 11 affordance types. The asset library currently has more than 40 part-annotated articulated objects, each shipping a URDF file and a generated capabilities.json that declares what the object can do. Tasks are described as compositional task graphs in YAML, and the README says that adding a long-horizon task is roughly 30 lines of YAML rather than a new environment class.

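To make the decorator-plus-affordance idea concrete, here is a minimal stand-alone sketch of how a skill registry keyed on a closed affordance set could work. It is not MetaFine's actual API: this `register_skill` signature, the `requires` argument, the four affordance names, the `applicable_skills` helper, and the shape of the capabilities dictionary are all assumptions made for illustration. Only the decorator name and the existence of a closed affordance set and a capabilities.json come from the README.

```python
# Illustrative stand-in, NOT MetaFine's real API: all names and fields are assumed.
# The real closed set has 11 affordance types; these four are made up.
AFFORDANCES = frozenset({"graspable", "rotatable", "slidable", "insertable"})
SKILLS = {}


def register_skill(name, requires):
    """Register a skill under `name`, rejecting affordances outside the closed set."""
    unknown = set(requires) - AFFORDANCES
    if unknown:
        raise ValueError(f"unknown affordance(s): {sorted(unknown)}")

    def decorator(fn):
        SKILLS[name] = {"fn": fn, "requires": frozenset(requires)}
        return fn

    return decorator


@register_skill("grasp", requires={"graspable"})
def grasp(obj_name):
    return f"grasp({obj_name})"


@register_skill("insert", requires={"graspable", "insertable"})
def insert(obj_name):
    return f"insert({obj_name})"


def applicable_skills(capabilities):
    """Return the skills whose required affordances the object declares."""
    offered = set(capabilities["affordances"])
    return sorted(n for n, s in SKILLS.items() if s["requires"] <= offered)


# Hypothetical stand-in for the contents of one object's capabilities.json.
drawer = {"name": "drawer_handle", "affordances": ["graspable", "slidable"]}
print(applicable_skills(drawer))  # ['grasp']
```

Validating affordance names at registration time, as this sketch does, is one way a closed set can catch typos early. Whether MetaFine validates at that point is not stated in the summary.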
MetaFine runs on the SAPIEN simulator and the ManiSkill robotics environment, and supports a real-sim hybrid mode the authors call PPI: scan an object with a phone, process it, import it, and reproduce it in simulation under the same diagnostic protocol.

The repository vendors seven vision-language-action policy backbones (ACT, DP3, OpenVLA, OpenVLA-OFT, pi-0, pi-0.5, and StarVLA), although the README notes that training has only been verified through the LeRobot and StarVLA paths, and closed-loop inference has only been verified for pi-0.5.

The project also ships two Claude Code slash commands. /metafine_help answers questions about the platform by routing to the right section of the user guide, and is strictly read-only. /metafine_add walks the user through designing either a new atomic skill or a new task graph YAML, validates affordances and predicates, and writes the file only after confirmation.

Installation uses pip install -e . with per-policy stacks installed separately. The license is MIT and the project is marked alpha.
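As a worked illustration of the three scores described at the start of this section, the sketch below computes per-stage success rates, a normalised area under a success-versus-perturbation curve, and three smoothness metrics from a sampled trajectory. The function names, the episode and trajectory formats, the trapezoidal integration, and the choice to take variance over speed rather than over velocity vectors are all assumptions. MetaFine's own definitions may differ, so treat this as a reading aid rather than the project's implementation.

```python
import math


def stage_success_rates(episodes, stages=("engagement", "manipulation", "release")):
    """Understanding axis: fraction of episodes that completed each stage."""
    return {s: sum(1 for ep in episodes if ep[s]) / len(episodes) for s in stages}


def normalized_auc(levels, success):
    """Perception axis: trapezoidal area under the success curve, scaled to [0, 1]."""
    area = 0.0
    for i in range(1, len(levels)):
        area += 0.5 * (success[i] + success[i - 1]) * (levels[i] - levels[i - 1])
    return area / (levels[-1] - levels[0])


def finite_diff(seq, dt):
    """First-order finite difference of a list of equal-length vectors."""
    return [[(b - a) / dt for a, b in zip(p, q)] for p, q in zip(seq, seq[1:])]


def smoothness(positions, dt):
    """Behaviour axis: jerk RMS, speed variance, and path length (needs >= 4 samples)."""
    vel = finite_diff(positions, dt)
    acc = finite_diff(vel, dt)
    jerk = finite_diff(acc, dt)
    speed = [math.sqrt(sum(c * c for c in v)) for v in vel]
    mean_speed = sum(speed) / len(speed)
    return {
        "jerk_rms": math.sqrt(sum(sum(c * c for c in j) for j in jerk) / len(jerk)),
        "velocity_variance": sum((s - mean_speed) ** 2 for s in speed) / len(speed),
        "path_length": sum(math.dist(p, q) for p, q in zip(positions, positions[1:])),
    }


# Made-up data: two episodes, one lighting sweep, one straight-line trajectory.
episodes = [
    {"engagement": True, "manipulation": True, "release": False},
    {"engagement": True, "manipulation": False, "release": False},
]
print(stage_success_rates(episodes))
print(normalized_auc([0.0, 0.5, 1.0], [1.0, 0.5, 0.0]))
print(smoothness([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0]], dt=1.0))
```

On the constant-velocity straight line in the demo, jerk RMS and speed variance are both zero and the path length equals the distance travelled, which is the baseline a jerky or hesitant policy would deviate from.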

Copy-paste prompts

Prompt 1
Set up MetaFine with SAPIEN and ManiSkill, then run the perception sweep on the pi-0.5 backbone for a grasp task
Prompt 2
Use the /metafine_add slash command to design a new pour-water task graph and validate its affordances
Prompt 3
Read MetaFine's behaviour metrics code and explain how jerk RMS and velocity variance are computed from a trajectory
Prompt 4
Wire a new atomic skill called push into MetaFine using the @register_skill decorator and one of the 11 affordance types
Prompt 5
Run the MetaFine PPI real-sim hybrid flow on a scanned mug and report the three-score breakdown

Frequently asked questions

What is metafine?

Diagnostic benchmark framework for robot manipulation policies that reports separate understanding, perception, and behaviour scores for a policy instead of a single success rate.

What language is metafine written in?

Mainly Python. The stack also includes SAPIEN and ManiSkill.

What license does metafine use?

MIT license: you may use, modify, and redistribute the code as long as the copyright and license notice is kept, and it comes with no warranty.

How hard is metafine to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is metafine for?

Mainly researchers.

Verify against the repo before relying on details.