hiangx-robotics/metafine

Analysis updated 2026-06-24

★ 70 · Python · Audience · researcher · Complexity · 5/5 · License · MIT · Setup · hard

Mindmap

mindmap
  root((MetaFine))
    Inputs
      Task graph YAML
      VLA policy
      Articulated assets
      Domain randomization config
    Outputs
      Understanding score
      Perception AUC
      Behaviour smoothness metrics
    Use Cases
      Diagnose policy failures
      Compare VLA backbones
      Real-to-sim scanning
      Skill and task authoring
    Tech Stack
      Python
      SAPIEN
      ManiSkill
      LeRobot
      URDF
      YAML

What do people build with it?

USE CASE 1

Diagnose where a manipulation policy fails along understanding, perception, and behaviour axes

USE CASE 2

Compare VLA backbones like pi-0.5 and OpenVLA on the same task graph

USE CASE 3

Author a new long-horizon manipulation task as a YAML graph rather than a Python environment

USE CASE 4

Scan a real object with a phone and reproduce it in simulation via the PPI workflow

What is it built with?

Python · SAPIEN · ManiSkill · LeRobot · URDF

How does it compare?

	hiangx-robotics/metafine	nanovisionx/raev2	wanshuiyin/aris-in-ai-offer
Stars	70	70	71
Language	Python	Python	Python
Setup difficulty	hard	hard	easy
Complexity	5/5	5/5	2/5
Audience	researcher	researcher	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard · Time to first run · 1 day+

Per-policy backbones install as separate stacks. Only the LeRobot and StarVLA training paths and pi-0.5 closed-loop inference are verified.

MIT license: do anything with attribution, with no warranty.

In plain English

MetaFine is a software framework for testing robot manipulation policies, the programs that decide how a robot arm should move to grasp, slide, insert, or otherwise handle objects. The README frames it as a diagnostic tool rather than a leaderboard. Most existing benchmarks just check whether the task ultimately succeeded, but MetaFine splits a policy's behaviour into three separate scores: understanding, perception, and behaviour. The idea is that two policies with the same overall success rate may fail in very different ways, and the three-score breakdown makes those differences visible.

Understanding is measured by breaking a task into stages and reporting success on each stage, so you can see exactly where the chain breaks: engagement, manipulation, or release. Perception is measured by running domain-randomisation sweeps over lighting, camera pose, and camera rotation, and reporting the area under the success curve as a single 0-to-1 score per axis. Behaviour is measured by looking at how smooth the action trajectory was, using metrics like jerk RMS, velocity variance, and path length, which can expose jerky or hesitant policies that still happen to succeed.

The platform is built from small reusable pieces. There are 21 atomic skills such as grasp, rotate, slide, and insert, each declared with a @register_skill decorator and matched to objects through a closed set of 11 affordance types. The asset library currently has more than 40 part-annotated articulated objects, each shipping a URDF file and a generated capabilities.json that declares what the object can do. Tasks are described as compositional task graphs in YAML, and the README says that adding a long-horizon task is roughly 30 lines of YAML rather than a new environment class.
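To make the perception score described above more concrete, here is a minimal sketch of how an area-under-the-success-curve value could be reduced to a single 0-to-1 number for one randomisation axis. The function name normalized_auc, the trapezoidal rule, and the division by the swept range are all assumptions made for illustration; the summary does not say how MetaFine itself computes or normalises the score.

```python
from typing import Sequence


def normalized_auc(levels: Sequence[float], success_rates: Sequence[float]) -> float:
    """Trapezoidal area under a success-vs-perturbation curve, scaled to [0, 1].

    Hypothetical helper, not MetaFine's API. `levels` are perturbation
    magnitudes along one axis (e.g. lighting); `success_rates` are the
    success rates in [0, 1] measured at each level.
    """
    if len(levels) != len(success_rates) or len(levels) < 2:
        raise ValueError("need at least two (level, success) pairs of equal length")

    # Sort by perturbation level so the sweep can be given in any order.
    pairs = sorted(zip(levels, success_rates))
    span = pairs[-1][0] - pairs[0][0]
    if span <= 0:
        raise ValueError("perturbation levels must span a non-zero range")

    # Trapezoidal rule over consecutive sweep points.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)

    # Dividing by the swept range maps a policy that always succeeds to 1.0.
    return area / span


if __name__ == "__main__":
    # Made-up sweep: success drops as the perturbation grows.
    print(round(normalized_auc([0.0, 0.5, 1.0], [1.0, 0.8, 0.2]), 3))  # prints 0.7
```

Under this assumed definition, a policy that succeeds at every perturbation level scores 1.0, and one that degrades quickly scores lower even if its unperturbed success rate is the same.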
MetaFine runs on the SAPIEN simulator and the ManiSkill robotics environment, and supports a real-sim hybrid mode the authors call PPI: scan an object with a phone, process it, import it, and reproduce it in simulation under the same diagnostic protocol. The repository vendors seven vision-language-action policy backbones (ACT, DP3, OpenVLA, OpenVLA-OFT, pi-0, pi-0.5, and StarVLA), although the README notes that training has only been verified through the LeRobot and StarVLA paths, and closed-loop inference has only been verified for pi-0.5.

The project also ships two Claude Code slash commands. /metafine_help answers questions about the platform by routing to the right section of the user guide, and is strictly read-only. /metafine_add walks the user through designing either a new atomic skill or a new task graph YAML, validates affordances and predicates, and writes the file only after confirmation.

Installation is pip install -e ., with per-policy stacks installed separately. The license is MIT and the project is marked alpha.
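The behaviour score described earlier rests on trajectory smoothness, so a rough sketch of how jerk RMS, velocity variance, and path length might be derived from sampled positions may help. Everything here is an assumption for illustration: the function name smoothness_metrics, the use of simple finite differences, a fixed timestep dt, and the choice to take the variance of speed magnitudes. MetaFine's own definitions may differ.

```python
import math
from typing import Dict, List, Sequence


def _finite_diff(samples: List[List[float]], dt: float) -> List[List[float]]:
    """Forward finite difference between consecutive samples."""
    return [[(b - a) / dt for a, b in zip(p, q)] for p, q in zip(samples, samples[1:])]


def smoothness_metrics(positions: Sequence[Sequence[float]], dt: float) -> Dict[str, float]:
    """Hypothetical smoothness summary for a trajectory sampled every `dt` seconds."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    if len(positions) < 4:
        raise ValueError("need at least four samples to estimate jerk")

    points = [list(p) for p in positions]
    velocity = _finite_diff(points, dt)        # first derivative of position
    acceleration = _finite_diff(velocity, dt)  # second derivative
    jerk = _finite_diff(acceleration, dt)      # third derivative

    speeds = [math.sqrt(sum(c * c for c in v)) for v in velocity]
    mean_speed = sum(speeds) / len(speeds)

    return {
        # Root mean square of the jerk vector magnitude.
        "jerk_rms": math.sqrt(sum(sum(c * c for c in j) for j in jerk) / len(jerk)),
        # Population variance of speed (an assumed reading of "velocity variance").
        "velocity_variance": sum((s - mean_speed) ** 2 for s in speeds) / len(speeds),
        # Sum of straight-line segment lengths between samples.
        "path_length": sum(s * dt for s in speeds),
    }


if __name__ == "__main__":
    # Made-up trajectory: constant-velocity straight line along x.
    line = [(float(i), 0.0, 0.0) for i in range(5)]
    print(smoothness_metrics(line, dt=1.0))
```

In this sketch a constant-velocity straight line yields zero jerk and zero velocity variance, while a path of the same length with abrupt turns yields a positive jerk RMS, which is the kind of difference a success-only benchmark would not show.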

Copy-paste prompts

Prompt 1

Set up MetaFine with SAPIEN and ManiSkill, then run the perception sweep on the pi-0.5 backbone for a grasp task

Prompt 2

Use the /metafine_add slash command to design a new pour-water task graph and validate its affordances

Prompt 3

Read MetaFine's behaviour metrics code and explain how jerk RMS and velocity variance are computed from a trajectory

Prompt 4

Wire a new atomic skill called push into MetaFine using the @register_skill decorator and one of the 11 affordance types

Prompt 5

Run the MetaFine PPI real-sim hybrid flow on a scanned mug and report the three-score breakdown

Frequently asked questions

What is metafine?

Diagnostic benchmark framework for robot manipulation policies that splits a task into separate understanding, perception, and behaviour scores instead of a single success rate.

What language is metafine written in?

Mainly Python. The stack also includes SAPIEN, ManiSkill, and LeRobot.

What license does metafine use?

MIT license: do anything with attribution, with no warranty.

How hard is metafine to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is metafine for?

Mainly researchers.

Verify against the repo before relying on details.