atinpothiraj/pqsg

Analysis updated 2026-05-18

★ 2 Python Audience · researcher Complexity · 3/5 Setup · moderate

Mindmap

mindmap
  root((PQSG))
    What it does
      Score AI video physics
      Question graph pipeline
      Human correlation metric
    How it works
      Question generation
      VLM question answering
      Tree-based scoring
    Tech stack
      Python
      Gemini API
      OpenAI API
    Use cases
      Benchmark video models
      Reproduce paper results
      Score custom videos


What do people build with it?

USE CASE 1

Reproduce the PQSG paper results from cached data without any API keys.

USE CASE 2

Score your own AI-generated videos for physical plausibility using Gemini or GPT-5.

USE CASE 3

Compare multiple text-to-video models on physical realism using the FinePhyEval benchmark.

What is it built with?

Python, Gemini API, OpenAI API, JSON

How does it compare?

	atinpothiraj/pqsg	0-bingwu-0/live-interpreter	0xkaz/llm-governance-dashboard
Stars	2	2	2
Language	Python	Python	Python
Setup difficulty	moderate	moderate	hard
Complexity	3/5	2/5	4/5
Audience	researcher	general	ops devops

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate Time to first run · 5min

Reproducing the paper tables needs no API keys. Scoring new videos requires a Google Gemini API key; an OpenAI key is optional.

License not stated in the README.

In plain English

PQSG is a research tool for measuring how physically realistic AI-generated videos are. When you give it a text prompt and a video file, it builds a structured set of yes/no questions about that video, organized into three layers: whether the expected objects are present, whether the expected actions happen, and whether the physics looks correct. A vision-language model then answers each question by watching the video, and PQSG computes a score based on those answers.

The question structure is a directed graph, meaning later questions depend on earlier ones. If a question about object presence is answered no, all downstream questions about what that object does or how it moves are automatically also marked no. This cascading logic avoids giving partial credit for videos that get the physics right but never show the right objects to begin with.

You can use it in two ways. The first is to reproduce the paper results using pre-cached data, which requires no API keys and runs with a single command. The second is to score your own videos, which requires a Google Gemini API key for generating and answering questions. An OpenAI key is optional for using GPT-5 as the question-answering backend instead.

The scoring correlated with human judgments at a Pearson r of about 0.47 using Gemini 2.5 Pro and around 0.48 using GPT-5.5, based on 195 videos in the FinePhyEval dataset. The tool also includes scripts for reproducing specific tables and figures from the ECCV 2026 paper.

This is academic research code. The README is thorough, with clear instructions for quick reproduction and for running the pipeline on new videos. The license is not mentioned in the provided README.
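The cascading logic described above can be illustrated with a minimal Python sketch. It is not taken from the PQSG codebase: the question names, the `parents` mapping, the functions `cascade_answers` and `fraction_yes`, and the choice to score a video as the fraction of questions that remain yes are all assumptions made for illustration. The repo's actual data format and scoring formula may differ.

```python
# Illustrative sketch only; names and scoring rule are hypothetical,
# not PQSG's actual implementation.

def cascade_answers(parents, raw_answers):
    """Return final answers where a question is forced to False
    whenever any of its ancestors was answered False.

    parents:     dict mapping question id -> list of parent question ids
    raw_answers: dict mapping question id -> bool (the VLM's raw answer)
    Assumes the dependency graph is acyclic.
    """
    final = {}

    def resolve(q):
        if q not in final:
            # A question keeps its own answer only if every parent holds.
            final[q] = raw_answers[q] and all(
                resolve(p) for p in parents.get(q, [])
            )
        return final[q]

    for q in raw_answers:
        resolve(q)
    return final


def fraction_yes(final):
    """Assumed scoring rule: share of questions still answered yes."""
    return sum(final.values()) / len(final)


# Hypothetical three-layer example: object -> action -> physics.
parents = {
    "ball_present": [],
    "ball_falls": ["ball_present"],
    "ball_accelerates_downward": ["ball_falls"],
    "table_present": [],
}
raw_answers = {
    "ball_present": False,  # the object never appears
    "ball_falls": True,
    "ball_accelerates_downward": True,
    "table_present": True,
}

final = cascade_answers(parents, raw_answers)
score = fraction_yes(final)
print(final)
print(score)
```

In this example the raw answers alone would give three yes answers out of four, but because the ball is never present, the action and physics questions that depend on it are also marked no, and only the unrelated `table_present` question keeps its yes.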

Copy-paste prompts

Prompt 1

I have a folder of AI-generated video files and a list of the prompts used to create them. Show me how to format them as example_input.json for PQSG and run the evaluation.

Prompt 2

Explain how PQSG's tree-based scoring works. If a video fails the object-existence check, how does that affect the physics score?

Prompt 3

I want to use GPT-5.5 as the question-answering backend in PQSG instead of Gemini. What flags do I pass to scripts/run.py?

Prompt 4

Walk me through what the output JSON from PQSG contains for a single video, and explain what each score field means.

Frequently asked questions

What is pqsg?

PQSG is a Python tool that scores how physically realistic an AI-generated video is, by building a graph of yes/no questions and having an AI model answer them by watching the video.

What language is pqsg written in?

Mainly Python. The stack also includes the Gemini API and the OpenAI API.

What license does pqsg use?

License not stated in the README.

How hard is pqsg to set up?

Setup difficulty is rated moderate, with roughly 5 minutes to a first successful run.

Who is pqsg for?

Mainly researchers.


Verify against the repo before relying on details.