Score how well a generated 3D motion matches its text prompt using a video language model
Benchmark a new text-to-motion metric against human ratings on the VeMo dataset
Render 3D motion to video and run a zero-shot evaluation pipeline
Follow a research series tracking reward models for motion generation
The top-level README has no install steps, license, or usage examples; reproducing VeMo requires the code in the VeMo subfolder, a video language model, and likely GPU resources.
ActionReward is an umbrella repository from a research group at Westlake University that hosts open-source releases on action and motion reward modeling. Reward modeling here means building automatic graders for the outputs of AI systems that generate human motion or actions. An example is a model that takes a text description such as 'a person waves' and produces the matching 3D body motion. Judging such outputs is hard because there is no single correct answer, so the group is developing evaluation methods that agree more closely with how humans would rate the same motion.
The README is short and presents the repo as a series rather than a single project. It lists four planned works, each with its own contributors: VeMo and VeMo++ are released or in progress, while VeMoRL and VeAct are marked TODO. Contributors are shown as linked GitHub avatars. The news section notes that the first paper in the series, VeMo, was accepted to ICML 2026 on May 19, 2026.
VeMo, full title 'Zero-Shot Text-to-Motion Evaluation using Video Language Models', is the only work whose documentation is linked at the top of the README. It lives in a VeMo subfolder with its own README and a paper PDF in an assets folder. According to that README's takeaways, VeMo evaluates whether generated text-to-motion outputs actually match their prompts. It first renders each generated 3D motion as an ordinary video. It then asks a pretrained video-language model to score how well that video aligns with the original text prompt. The release also includes human-annotated benchmark resources for meta-evaluation, which means measuring how good a text-to-motion metric is by comparing its scores against human ratings. Other researchers can use them to test new automatic metrics against the same human-labeled set.
The top-level README does not include install instructions, dependency lists, license information, or usage examples. Anything concrete about how to run the code, what data formats are expected, or how to reproduce the paper results would need to be read from the VeMo subfolder. The other three planned works in the series, VeMo++, VeMoRL, and VeAct, have only contributor names attached and no released code or docs yet at the time the README was written.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.