Analysis updated 2026-05-18
Reproduce the PQSG paper results from cached data without any API keys.
Score your own AI-generated videos for physical plausibility using Gemini or GPT-5.
Compare multiple text-to-video models on physical realism using the FinePhyEval benchmark.
| | atinpothiraj/pqsg | 0-bingwu-0/live-interpreter | 0xkaz/llm-governance-dashboard |
|---|---|---|---|
| Stars | 2 | 2 | 2 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | moderate | hard |
| Complexity | 3/5 | 2/5 | 4/5 |
| Audience | researcher | general | ops devops |
Figures from each repo's GitHub metadata at analysis time.
Reproducing the paper tables needs no API keys; scoring new videos requires a Google Gemini API key (an OpenAI key is optional).
PQSG is a research tool for measuring how physically realistic AI-generated videos are. Given a text prompt and a video file, it builds a structured set of yes/no questions about that video, organized into three layers: whether the expected objects are present, whether the expected actions happen, and whether the physics looks correct. A vision-language model then answers each question by watching the video, and PQSG computes a score based on those answers.

The question structure is a directed graph, meaning later questions depend on earlier ones. If a question about object presence is answered no, all downstream questions about what that object does or how it moves are automatically marked no as well. This cascading logic avoids giving partial credit to videos that get the physics right but never show the right objects to begin with.

You can use it in two ways. The first is to reproduce the paper results using pre-cached data, which requires no API keys and runs with a single command. The second is to score your own videos, which requires a Google Gemini API key for generating and answering questions. An OpenAI key is optional and enables GPT-5 as the question-answering backend instead.

The scores correlate with human judgments at a Pearson r of about 0.47 using Gemini 2.5 Pro and about 0.48 using GPT-5.5, based on 195 videos in the FinePhyEval dataset. The tool also includes scripts for reproducing specific tables and figures from the ECCV 2026 paper.

This is academic research code. The README is thorough, with clear instructions for quick reproduction and for running the pipeline on new videos. The license is not mentioned in the provided README.
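The cascading rule described above can be illustrated with a minimal sketch. This is not PQSG's actual code: the question names, the `cascade_answers` and `fraction_yes` helpers, and the use of a simple fraction of "yes" answers as the score are all assumptions made for illustration. The summary only states that a "no" propagates to every downstream question.

```python
from graphlib import TopologicalSorter

def cascade_answers(parents, raw_answers):
    """Force a question to False if any question it depends on ended up False.

    parents: maps each question to the questions it depends on (hypothetical layout).
    raw_answers: maps each question to the model's own yes/no answer.
    """
    final = {}
    # static_order() yields every question after all of its parents.
    for question in TopologicalSorter(parents).static_order():
        parents_ok = all(final[p] for p in parents.get(question, ()))
        final[question] = raw_answers[question] and parents_ok
    return final

def fraction_yes(final):
    """Assumed scoring rule: share of questions whose final answer is yes."""
    return sum(final.values()) / len(final)

# Hypothetical three-layer graph: objects -> actions -> physics.
parents = {
    "ball_present": [],
    "table_present": [],
    "ball_falls": ["ball_present"],
    "ball_accelerates_downward": ["ball_falls"],
    "ball_bounces_on_table": ["ball_falls", "table_present"],
}

# The model says the ball never appears but answers yes to everything else.
raw_answers = {
    "ball_present": False,
    "table_present": True,
    "ball_falls": True,
    "ball_accelerates_downward": True,
    "ball_bounces_on_table": True,
}

final = cascade_answers(parents, raw_answers)
print(final["ball_accelerates_downward"])  # False: blocked by ball_present
print(fraction_yes(final))                 # 0.2: only table_present survives
```

Without the cascade, the same raw answers would earn 4 of 5 under this assumed scoring rule, which is the partial credit the graph structure is meant to prevent.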
PQSG is a Python tool that scores how physically realistic an AI-generated video is, by building a graph of yes/no questions and having an AI model answer them by watching the video.
Mainly Python. The stack also includes the Gemini API and the OpenAI API.
License not stated in the README.
Setup difficulty is rated moderate, with roughly 5 minutes to a first successful run.
Mainly researchers.
Verify against the repo before relying on details.