explaingit

dagroup-pku/physisforcing

Analysis updated 2026-05-18

Stars · 74 | Audience · researcher | Complexity · 5/5 | License · MIT | Setup · hard

TLDR

A training-time framework from Peking University and NVIDIA that makes AI-generated robot manipulation videos physically realistic by adding trajectory and relational losses to existing video generation backbones.

Mindmap

mindmap
  root((PhysisForcing))
    What It Does
      Physics-plausible video
      Robot manipulation
      Training-time plug-in
    Two Loss Functions
      Pixel trajectory loss
      Semantic relational loss
    Supported Backbones
      Wan video model
      Cosmos3-Nano
    Benchmarks
      R-Bench first place
      PAI-Bench first place
      EZS-Bench first place
    Status
      arXiv 2606.28128
      Code coming soon
      MIT license


What do people build with it?

USE CASE 1

Fine-tune a video generation backbone with PhysisForcing to produce physically plausible robot manipulation videos for robotics research

USE CASE 2

Evaluate physics-plausible video generation on R-Bench, PAI-Bench, or EZS-Bench using PF-Wan or PF-Cosmos

USE CASE 3

Use a PhysisForcing-enhanced model as a world model for training robot action planners in closed-loop simulation

USE CASE 4

Study the trajectory and relational loss design to apply similar physics-reinforced training to other video generation architectures

What is it built with?

Python, PyTorch, Conda, Wan, Cosmos3, CoTracker3

How does it compare?

| | dagroup-pku/physisforcing | duggasco/bc250-40cu-unlock | antfu/vite-dev-rpc |
|---|---|---|---|
| Stars | 74 | 74 | 75 |
| Setup difficulty | hard | hard | moderate |
| Complexity | 5/5 | 5/5 | 2/5 |
| Audience | researcher | ops devops | developer |

Language: Shell, TypeScript
Last pushed: 2026-05-01
Maintenance: Maintained

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

Code and model weights had not been released when the README was written. GPU hardware will be required for training and inference once the code becomes available.

MIT: use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

PhysisForcing is a research framework from Peking University and NVIDIA that improves how AI video models generate robot manipulation videos. The problem it addresses is that existing video generation models often produce clips where robot arms clip through objects, fail to make proper contact, or move in ways that do not match how physical interactions actually work. PhysisForcing adds training-time supervision focused on the parts of a video where the robot is actually touching or interacting with objects, teaching the model to be more physically accurate in those regions.

The technical approach adds two loss functions on top of existing video generation models rather than replacing them: one that checks where objects and robot parts move at the pixel level (a trajectory loss), and one that checks whether the spatial relationships between objects make physical sense at a higher level (a relational loss). These are applied to intermediate features inside the video model during training. Once trained, the modified model runs at inference time with no additional computational cost.

The framework was tested on two existing video generation backbones called Wan and Cosmos, producing variants named PF-Wan and PF-Cosmos. On three academic benchmarks for physically accurate robotic video generation (R-Bench, PAI-Bench, and EZS-Bench), the PhysisForcing variants ranked first among all systems tested, including Veo 3.1, Sora v2 Pro, and base Cosmos models. When used as a world model to plan robot actions in a closed-loop evaluation called WorldArena, the success rate improved from 16% to 24%.

At the time the README was published, code and model weights had not yet been released. The repository includes environment setup instructions and placeholders for inference and training code, with release expected within a week of the paper's June 2026 arXiv posting. The project is released under the MIT license.
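The two-loss idea described above can be illustrated with a small, dependency-free sketch. Since the PhysisForcing code is not released, everything here is an assumption made for illustration: the function names, the per-point interaction weights, the pairwise-distance form of the relational term, and the `lambda_traj` / `lambda_rel` weights are all hypothetical and are not taken from the paper or the repository. The sketch only shows the general pattern of adding two auxiliary terms to a backbone's existing training loss.

```python
from math import dist

# Hypothetical sketch: none of these names or formulas come from the
# PhysisForcing paper or repo. They only illustrate "base loss + two
# auxiliary training-time losses".

def trajectory_loss(pred_tracks, ref_tracks, weights):
    """Weighted mean squared distance between predicted and reference tracks.

    pred_tracks / ref_tracks: list over frames of lists of (x, y) points.
    weights: one weight per tracked point (assumed larger for points in
    robot-object interaction regions).
    """
    total, norm = 0.0, 0.0
    for pred_frame, ref_frame in zip(pred_tracks, ref_tracks):
        for p, r, w in zip(pred_frame, ref_frame, weights):
            total += w * dist(p, r) ** 2
            norm += w
    return total / norm if norm else 0.0


def pairwise_distances(points):
    """Matrix of Euclidean distances between every pair of points."""
    return [[dist(a, b) for b in points] for a in points]


def relational_loss(pred_feats, ref_feats):
    """Mean squared difference between two pairwise-distance matrices.

    Compares how entities relate to each other, not where each one
    sits in absolute terms (an assumed stand-in for a relational term).
    """
    n = len(pred_feats)
    if n == 0:
        return 0.0
    dp = pairwise_distances(pred_feats)
    dr = pairwise_distances(ref_feats)
    return sum((dp[i][j] - dr[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


def total_loss(base_loss, traj, rel, lambda_traj=1.0, lambda_rel=1.0):
    """Backbone loss plus the two weighted auxiliary terms."""
    return base_loss + lambda_traj * traj + lambda_rel * rel


if __name__ == "__main__":
    pred = [[(0.0, 0.0), (1.0, 1.0)], [(1.0, 0.0), (2.0, 1.0)]]
    ref = [[(0.0, 0.0), (1.0, 1.0)], [(1.0, 1.0), (2.0, 1.0)]]
    traj = trajectory_loss(pred, ref, weights=[2.0, 1.0])
    rel = relational_loss([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (6.0, 8.0)])
    print(total_loss(1.0, traj, rel, lambda_traj=0.1, lambda_rel=0.5))
```

Because both terms in this sketch enter only through the training objective, dropping them after training leaves the model's forward pass unchanged, which is consistent with the claim of no extra inference cost.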

Copy-paste prompts

Prompt 1
Explain the two training-time losses PhysisForcing adds. How do the pixel-level trajectory loss and the semantic relational loss differ in what they supervise?
Prompt 2
How does PhysisForcing achieve zero extra inference cost even though it modifies training with additional physics-focused losses?
Prompt 3
I want to apply PhysisForcing to a different video backbone. What components of the framework would need to be adapted based on the paper?
Prompt 4
Set up the PhysisForcing conda environment and install all dependencies. I'm running Python 3.10 with PyTorch 2.5.1 on a CUDA-enabled machine.
Prompt 5
Explain the WorldArena Action Planner benchmark that PhysisForcing was evaluated on. Why is a jump from 16% to 24% success rate considered significant?

Frequently asked questions

What is physisforcing?

A training-time framework from Peking University and NVIDIA that makes AI-generated robot manipulation videos physically realistic by adding trajectory and relational losses to existing video generation backbones.

What license does physisforcing use?

MIT: use freely for any purpose, including commercial use, as long as you keep the copyright notice.

How hard is physisforcing to set up?

Setup difficulty is rated hard, with roughly one day or more to a first successful run.

Who is physisforcing for?

Mainly researchers.


Verify against the repo before relying on details.