Analysis updated 2026-05-18
Fine-tune a video generation backbone with PhysisForcing to produce physically plausible robot manipulation videos for robotics research
Evaluate physics-plausible video generation on R-Bench, PAI-Bench, or EZS-Bench using PF-Wan or PF-Cosmos
Use a PhysisForcing-enhanced model as a world model for training robot action planners in closed-loop simulation
Study the trajectory and relational loss design to apply similar physics-reinforced training to other video generation architectures
| | dagroup-pku/physisforcing | duggasco/bc250-40cu-unlock | antfu/vite-dev-rpc |
|---|---|---|---|
| Stars | 74 | 74 | 75 |
| Language | — | Shell | TypeScript |
| Last pushed | — | — | 2026-05-01 |
| Maintenance | — | — | Maintained |
| Setup difficulty | hard | hard | moderate |
| Complexity | 5/5 | 5/5 | 2/5 |
| Audience | researcher | ops devops | developer |
Figures from each repo's GitHub metadata at analysis time.
Code and model weights had not been released when the README was written. Once they are available, training and inference will require GPU hardware.
PhysisForcing is a research framework from Peking University and NVIDIA that improves how AI video models generate robot manipulation videos. The problem it addresses is that existing video generation models often produce clips where robot arms clip through objects, fail to make proper contact, or move in ways that do not match how physical interactions actually work. PhysisForcing adds training-time supervision focused on the parts of a video where the robot is actually touching or interacting with objects, teaching the model to be more physically accurate in those regions.

The technical approach adds two loss functions on top of existing video generation models rather than replacing them. The first is a trajectory loss, which checks where objects and robot parts move at the pixel level. The second is a relational loss, which checks at a higher level whether the spatial relationships between objects make physical sense. Both losses are applied to intermediate features inside the video model during training. Once trained, the modified model runs at inference time with no additional computational cost.

The framework was tested on two existing video generation backbones called Wan and Cosmos, producing variants named PF-Wan and PF-Cosmos. On three academic benchmarks for physically accurate robotic video generation (R-Bench, PAI-Bench, and EZS-Bench), the PhysisForcing variants ranked first among all systems tested, including Veo 3.1, Sora v2 Pro, and base Cosmos models. When used as a world model to plan robot actions in a closed-loop evaluation called WorldArena, the success rate improved from 16% to 24%.

At the time the README was published, code and model weights had not yet been released. The repository includes environment setup instructions and placeholders for inference and training code, with release expected within a week of the paper's June 2026 arXiv posting. The project is released under the MIT license.
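Because the official code is unreleased, the following is only a minimal sketch of how a "base loss plus two auxiliary losses" objective of this kind could be combined. It does not reproduce the authors' method. The specific loss forms are assumptions chosen for illustration: a contact-weighted squared pixel error on point tracks stands in for the trajectory loss, and matching of pairwise cosine-similarity matrices between per-entity features stands in for the relational loss. The function names, the `lambda_traj`/`lambda_rel` weights, and the input shapes are all hypothetical. The sketch is written in plain Python so it runs without dependencies.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel position

def trajectory_loss(pred_tracks: List[List[Point]],
                    target_tracks: List[List[Point]],
                    contact_mask: List[List[float]]) -> float:
    """Hypothetical trajectory loss: masked mean squared pixel error.

    Each argument is indexed [frame][point]. contact_mask holds weights,
    e.g. 1.0 for points inside robot-object interaction regions and 0.0
    elsewhere, so supervision concentrates on contact areas.
    """
    total, weight = 0.0, 0.0
    for pred_t, tgt_t, mask_t in zip(pred_tracks, target_tracks, contact_mask):
        for (px, py), (tx, ty), m in zip(pred_t, tgt_t, mask_t):
            total += m * ((px - tx) ** 2 + (py - ty) ** 2)
            weight += m
    return total / weight if weight > 0 else 0.0

def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def relational_loss(pred_feats: List[List[float]],
                    target_feats: List[List[float]]) -> float:
    """Hypothetical relational loss over K entities (gripper, object, ...).

    Compares every pair of entities: the squared difference between the
    predicted and reference pairwise cosine similarities, averaged over pairs.
    """
    k = len(pred_feats)
    if k < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            diff = (_cosine(pred_feats[i], pred_feats[j])
                    - _cosine(target_feats[i], target_feats[j]))
            total += diff ** 2
            pairs += 1
    return total / pairs

def total_loss(base_loss: float, l_traj: float, l_rel: float,
               lambda_traj: float = 0.1, lambda_rel: float = 0.1) -> float:
    """Backbone's own training loss plus weighted auxiliary terms (weights assumed)."""
    return base_loss + lambda_traj * l_traj + lambda_rel * l_rel

if __name__ == "__main__":
    # One frame, two tracked points; only the first lies in a contact region.
    pred = [[(10.0, 10.0), (20.0, 20.0)]]
    tgt = [[(13.0, 14.0), (0.0, 0.0)]]
    mask = [[1.0, 0.0]]
    print(trajectory_loss(pred, tgt, mask))  # 25.0 (3^2 + 4^2 on the masked point)
    print(relational_loss([[1.0, 0.0], [0.0, 1.0]],
                          [[1.0, 0.0], [1.0, 0.0]]))  # 1.0
```

Because these terms only enter the training objective, a model trained this way would need nothing extra at inference time. That is consistent with the zero-inference-cost claim above, though how PhysisForcing extracts tracks and entity features from intermediate activations is not described here.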
A training-time framework from Peking University and NVIDIA that makes AI-generated robot manipulation videos physically realistic by adding trajectory and relational losses to existing video generation backbones.
MIT: use freely for any purpose, including commercial use, as long as you keep the copyright notice.
Setup difficulty is rated hard, with roughly a day or more needed to reach a first successful run.
The main audience is researchers.
Verify against the repo before relying on details.