SCAIL is an AI research model for animating characters in video. Given a reference character image and a motion sequence (such as a dancing or fighting motion from another video), it generates a video of that character performing the motion. This is a challenging problem because AI models often produce incorrect body rotations or fail to maintain a consistent appearance across frames, especially for complex motions like turns and flips.

The paper behind SCAIL, accepted at CVPR 2026, proposes two key ideas. The first is a new way to represent body pose in 3D space that is "identity agnostic": it describes how the body is positioned without being tied to any specific character's appearance, which helps the model generalize to new characters. The second is a new way to feed that pose information into the model. Instead of giving only a hint about which pose to follow, the approach shows the model the full motion context through a technique called in-context learning, so the model can see how the motion unfolds (similar to teaching someone to dance by showing them the full choreography rather than describing each step).

Users have applied it to animate 2D hand-drawn art, anime characters, cartoon figures, and even four-legged animals, despite the model having no specific training data for animals. The model is a 14-billion-parameter video diffusion model (a type of AI model that generates video by progressively refining noise into images). It is written in Python, and weights are available on Hugging Face. This summary is based on an excerpt; the full README is longer.
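The difference between "giving a hint" and "showing full context" can be made concrete with a toy sketch. It assumes, purely for illustration, that the hint corresponds to attaching one pose token to each video token along the feature axis, and that full context corresponds to appending the whole pose sequence along the token axis so that attention can read any part of it. The function names, token counts, and feature sizes are hypothetical and are not taken from the SCAIL code or paper.

```python
# Toy illustration only: names and shapes are hypothetical, not SCAIL's actual code.

def channel_concat(video_tokens, pose_tokens):
    """Per-position hint: widen each video token with the pose token at the same index."""
    if len(video_tokens) != len(pose_tokens):
        raise ValueError("channel concatenation needs one pose token per video token")
    return [v + p for v, p in zip(video_tokens, pose_tokens)]


def in_context_concat(video_tokens, pose_tokens):
    """Full context: append the whole pose sequence as extra tokens."""
    return video_tokens + pose_tokens


def shape(tokens):
    """Return (number of tokens, features per token)."""
    return (len(tokens), len(tokens[0]))


# 4 video tokens and 4 pose tokens, each a 3-dimensional feature vector.
video = [[0.0] * 3 for _ in range(4)]
pose = [[1.0] * 3 for _ in range(4)]

print(shape(channel_concat(video, pose)))     # (4, 6): same length, wider tokens
print(shape(in_context_concat(video, pose)))  # (8, 3): longer sequence, same width
```

In the first layout each video token only ever sees the pose at its own position. In the second, every video token could attend to every pose token, which is one plausible reading of "showing the full choreography".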
Verify against the repo before relying on details.