spatial-westlakenlp/actionreward

Analysis updated 2026-06-24

★ 12 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

Mindmap

mindmap
  root((ActionReward))
    Inputs
      Text prompts
      Generated 3D motion
      Human ratings
    Outputs
      Alignment scores
      Benchmark dataset
      Research papers
    Use Cases
      Text-to-motion eval
      Reward modeling
      Meta-evaluation
    Tech Stack
      Python
      Video LM
      PyTorch


What do people build with it?

USE CASE 1

Score how well a generated 3D motion matches its text prompt using a video language model

USE CASE 2

Benchmark a new text-to-motion metric against human ratings on the VeMo dataset

USE CASE 3

Render 3D motion to video and run a zero-shot evaluation pipeline

USE CASE 4

Follow a research series tracking reward models for motion generation

What is it built with?

Python, PyTorch, Video-LM

How does it compare?

	spatial-westlakenlp/actionreward	aim-uofa/reasonmatch	arpecop/kokobook
Stars	12	12	12
Language	Python	Python	Python
Setup difficulty	hard	hard	hard
Complexity	4/5	5/5	3/5
Audience	researcher	researcher	general

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

The top-level README has no install steps, no license, and no usage examples. Reproducing VeMo needs the subfolder code, a video language model, and likely GPU resources.

In plain English

ActionReward is an umbrella repository from a research group at Westlake University that publishes open-source releases related to action and motion reward modeling. In plain terms, reward modeling here means building automatic graders for outputs from AI systems that produce human motion or actions, for example a model that takes a text description like 'a person waves' and generates the matching 3D body motion. Judging those outputs is hard because there is no single right answer, so the group is working on evaluation methods that try to line up better with how humans would rate the same motion.

The README is short and frames the repo as a series rather than a single project. It lists four planned works, each with its own contributors. VeMo and VeMo++ are released or in progress, while VeMoRL and VeAct are marked as TODO. The contributor list is rendered with linked GitHub avatars, and the news section notes that the first paper in the series, VeMo, was accepted to the ICML 2026 conference on May 19, 2026.

The one work with documentation linked at the top of the README is VeMo, full title 'Zero-Shot Text-to-Motion Evaluation using Video Language Models'. It lives in a VeMo subfolder of the repo and has its own README plus a paper PDF in an assets folder. The README's takeaways describe VeMo as a way to evaluate whether generated text-to-motion outputs actually match their prompts. It does that by rendering the produced 3D motions into ordinary videos, then asking a pretrained video-language model to score how well each video aligns with the original text prompt.

Along with the method, the VeMo release includes human-annotated benchmark resources. These are meant for meta-evaluation, which means measuring how good a text-to-motion metric is by comparing its scores against human ratings. That allows other researchers to test new automatic metrics against the same human-labeled set.

The top-level README does not include install instructions, dependency lists, license information, or usage examples. Anything concrete about how to run the code, what data formats are expected, or how to reproduce the paper results would need to be read from the VeMo subfolder. The other three planned works in the series, VeMo++, VeMoRL, and VeAct, have only contributor names attached and no released code or docs yet at the time the README was written.
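
The meta-evaluation idea described above (comparing a metric's scores against human ratings) can be illustrated with a small sketch. This is not VeMo's code or protocol: the summary does not say which agreement statistic VeMo reports, so Spearman rank correlation is used here as an assumed, commonly used choice. The function names and the toy numbers are hypothetical, and only the Python standard library is used.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores, human_ratings):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(metric_scores) != len(human_ratings):
        raise ValueError("need one metric score per human rating")
    return pearson(average_ranks(metric_scores), average_ranks(human_ratings))


# Made-up toy data: one entry per (prompt, generated motion) pair.
human_ratings = [4.5, 2.0, 3.5, 1.0, 5.0]      # hypothetical human scores
metric_scores = [0.81, 0.40, 0.35, 0.10, 0.93]  # hypothetical metric outputs

print(round(spearman(metric_scores, human_ratings), 3))
```

A value near 1 would mean the metric orders the motions almost the same way the human raters do. The actual data format, rating scale, and statistics used by the benchmark should be taken from the VeMo subfolder and paper.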

Copy-paste prompts

Prompt 1

Walk me through setting up the VeMo subfolder of ActionReward and running its zero-shot text-to-motion evaluation on a sample motion file.

Prompt 2

Show me how to render a generated 3D motion sequence into a video that VeMo can score with a video language model.

Prompt 3

Explain the meta-evaluation protocol in VeMo and how to compare my new motion metric against the human-annotated set.

Prompt 4

Help me read the VeMo paper PDF and map its scoring pipeline to the code in the VeMo folder.

Prompt 5

Track upcoming VeMo++, VeMoRL, and VeAct releases in ActionReward and outline what each will likely add to the series.

Frequently asked questions

What is actionreward?

An umbrella repo from a Westlake University lab for action and motion reward modeling. Its first release, VeMo, scores text-to-motion outputs with a video language model.

What language is actionreward written in?

Mainly Python. The stack also includes PyTorch and a video language model.

How hard is actionreward to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is actionreward for?

Mainly researchers.


Verify against the repo before relying on details.