showlab/dream.exe

★ 21 | Audience · researcher | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((dream.exe))
    What it does
      Converts video to robot plan
      Runs plan in simulator
      Checks task completion
    Benchmark
      101 tasks from RoboCasa
      Three difficulty levels
      Eight models tested
    Findings
      Visual quality is poor predictor
      General video models show physics understanding
    Status
      Code coming soon
      Research paper available

Things people build with this

USE CASE 1

Evaluate whether a video generation model produces physically plausible robot motions for a given manipulation task.

USE CASE 2

Benchmark multiple video generation models on a standardized set of 101 robot tasks across three difficulty levels.
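A benchmark run like this boils down to tallying task outcomes per model and per difficulty level. The sketch below is only an illustration of that bookkeeping: the record format, the function name `success_rates`, the model names, and the level labels are all assumptions, since the repository has not released any code or data, and the records are made up rather than taken from the paper.

```python
from collections import defaultdict


def success_rates(records):
    """Map (model, level) -> fraction of tasks completed.

    `records` is an iterable of (model, level, success) tuples.
    This record format is hypothetical, not dream.exe's actual output.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for model, level, success in records:
        totals[(model, level)] += 1
        wins[(model, level)] += int(success)
    return {key: wins[key] / totals[key] for key in totals}


# Made-up records for illustration only; these are NOT results from the paper.
records = [
    ("model_a", "single_object", True),
    ("model_a", "single_object", False),
    ("model_a", "multi_stage", False),
    ("model_b", "single_object", True),
]
rates = success_rates(records)
for (model, level), rate in sorted(rates.items()):
    print(f"{model:8s} {level:14s} {rate:.2f}")
```

Keeping the rates keyed by both model and level makes it straightforward to compare models within one difficulty tier instead of averaging across all 101 tasks.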

Getting it running

Difficulty · hard | Time to first run · 1 day+

The repository is currently a placeholder: the code, benchmark data, and evaluation tools are all listed as coming soon.

In plain English

This is a research project from the Show Lab at the National University of Singapore, in collaboration with Oxford and Tencent. The central question it investigates is whether AI models that generate videos can produce videos of robots doing tasks that would actually work if a real robot tried to follow them.

The typical way to judge a video generation model is by asking whether the video looks realistic. Dream.exe takes a different approach: it converts the motion shown in a generated video into an actual robot movement plan, runs that plan in a physics simulator, and checks whether the task gets completed. A video can look convincing but still fail this test if the robot movement it depicts is physically impossible or poorly timed.

The project includes a benchmark of 101 tasks drawn from a robotics dataset called RoboCasa. The tasks are organized into three difficulty levels: simple single-object manipulation (pick something up, put it down), multi-object interactions where the robot needs to reason about how objects relate to each other, and multi-stage tasks that require the robot to complete several steps in the right order. Eight different video generation models were tested under this benchmark, including both open-source and closed-source systems.

The findings from the paper suggest that AI models trained on general internet video already carry some understanding of physical cause and effect, since several models achieved measurable success at completing tasks despite no robot-specific training. The research also found that how visually polished a video looks is a poor indicator of whether the robot actions it depicts would actually work.

At the time of this writing, the repository is a placeholder. The code, benchmark data, and evaluation tools are listed as coming soon. Only the research overview and citation information are currently present.
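The three-step evaluation described above (video to plan, plan to simulation, simulation to success check) can be sketched as a small function with pluggable stages. This is a minimal sketch under stated assumptions, not the project's implementation: the repository contains no code yet, so every name here (`evaluate_video`, `EvalResult`, the three callables, and the task id `pick_and_place`) is hypothetical, and the toy stand-ins exist only so the example runs end to end.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# All names are hypothetical; the dream.exe code has not been released.


@dataclass
class EvalResult:
    task_id: str
    success: bool


def evaluate_video(
    task_id: str,
    video: Sequence,
    video_to_plan: Callable[[Sequence], list],
    run_in_simulator: Callable[[str, list], dict],
    task_completed: Callable[[str, dict], bool],
) -> EvalResult:
    """Judge a generated video by executing its motion, not by its looks."""
    plan = video_to_plan(video)                     # 1. video -> movement plan
    final_state = run_in_simulator(task_id, plan)   # 2. run the plan under physics
    success = task_completed(task_id, final_state)  # 3. check the task outcome
    return EvalResult(task_id, success)


# Toy stand-ins (assumptions) so the sketch is runnable without a simulator.
def toy_video_to_plan(video):
    return [("move", frame) for frame in video]


def toy_simulator(task_id, plan):
    return {"steps_executed": len(plan)}


def toy_success(task_id, state):
    # Pretend the task needs at least three executed steps to succeed.
    return state["steps_executed"] >= 3


result = evaluate_video(
    "pick_and_place", [0, 1, 2], toy_video_to_plan, toy_simulator, toy_success
)
short = evaluate_video(
    "pick_and_place", [0], toy_video_to_plan, toy_simulator, toy_success
)
print(result, short)
```

Passing the stages in as callables keeps the success criterion independent of how the video was produced, which mirrors the idea that a visually convincing video can still fail once its motion is executed.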

Copy-paste prompts

Prompt 1

Once dream.exe code is released, how would I run one of the 101 RoboCasa tasks to check whether a generated video leads to task success in the physics simulator?

Prompt 2

The dream.exe paper found that visual quality is a poor predictor of robot task success. What metrics does it use instead to evaluate whether a generated video is physically valid?

Prompt 3

I want to cite the dream.exe research in my robotics paper. What is the correct citation based on the repository information?

Verify against the repo before relying on details.