Analysis updated 2026-05-18
Generate a video of a photo subject performing an action in a completely different visual style, such as animation.
Test how well a stylized character maintains its appearance across different scene backgrounds in generated video.
Use the subject-driven video pipeline as a baseline for research into cross-domain visual consistency.
Run inference on your own reference image by swapping the JSON config and running the provided shell script.
| | hkust-c4g/domainshuttle | helpmeeadice/bandori-pet-rev | orchestration-agent/agentorchestration |
|---|---|---|---|
| Stars | 156 | 156 | 155 |
| Language | Python | Python | Python |
| Setup difficulty | hard | moderate | hard |
| Complexity | 4/5 | 3/5 | 4/5 |
| Audience | researcher | general | ops/devops |
Figures from each repo's GitHub metadata at analysis time.
Requires a CUDA-enabled GPU with sufficient VRAM for a 14B model. Both the Wan2.2 base and DomainShuttle checkpoints must be downloaded from HuggingFace before inference.
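As a rough sense of what "sufficient VRAM" means, the sketch below estimates the memory needed just to hold 14 billion parameters at a few common precisions. The precisions listed are assumptions for illustration, not something the repository specifies. The figures cover weights only and exclude activations, auxiliary models such as a text encoder or VAE, and any additional checkpoint weights, so real usage will be higher.

```python
# Back-of-the-envelope estimate of weight memory for a 14B-parameter model.
# Assumption: the precisions below are illustrative, not taken from the repo.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gib(14e9, precision):.1f} GiB")
```

At 2 bytes per parameter this comes to roughly 26 GiB for the weights alone, which is why a consumer GPU may not be enough without offloading or quantization.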
DomainShuttle is an AI research project from Hong Kong University of Science and Technology. It generates videos from text descriptions while keeping a specific subject (a person, an object, or a stylized character) visually consistent throughout, even when that subject is placed in a different setting or art style.

Existing text-to-video systems struggle when a subject from one visual domain (say, a cartoon character) must appear in another (say, a photorealistic landscape), or when the subject of a real photo must appear in an animated-style video. DomainShuttle addresses this by separating the subject's appearance from the domain, meaning the visual style and environment of the video. It learns each independently and combines them during generation.

The system is built on top of Wan2.2, a 14-billion-parameter video generation model. You give it a reference image of your subject and a text prompt describing the scene or action, and it generates a short video in which that subject appears in the described setting.

Setup requires a CUDA-enabled GPU, the conda environment manager, and two large model downloads from HuggingFace (the Wan2.2 base model and the DomainShuttle checkpoint). After installing dependencies with a setup script, you run a single shell script to generate videos. Sample test cases are included in the repository, and you can swap in your own reference image by editing a JSON config file.

The model weights and code are licensed under Apache 2.0. A technical report describing the method in full is available on arXiv.
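Swapping in your own reference image by editing the JSON config could be scripted along these lines. This is a minimal sketch, not the repository's own tooling: the key names `ref_image` and `prompt`, the output file name `my_case.json`, and the assumption that each config is a single JSON object are all hypothetical, so check the sample test cases in the repo for the real schema before using it.

```python
import json
from pathlib import Path


def swap_reference_image(config_path, image_path, prompt=None, out_path=None):
    """Copy a JSON test case, replacing the reference image and, optionally, the prompt.

    The original config file is left untouched; the edited copy is written
    next to it unless out_path is given.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))

    # Hypothetical keys: replace with the names used in the repo's sample configs.
    config["ref_image"] = str(image_path)
    if prompt is not None:
        config["prompt"] = prompt

    out = Path(out_path) if out_path else Path(config_path).with_name("my_case.json")
    out.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return out
```

Writing to a copy rather than editing in place keeps the bundled sample cases available as a known-good baseline when a run with your own image fails.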
DomainShuttle generates videos from text that keep a specific subject visually consistent across different art styles, using a subject-driven approach on top of the 14B-parameter Wan2.2 model.
Mainly Python. The stack also includes PyTorch and HuggingFace.
Use freely for any purpose, including commercial, as long as you comply with the Apache 2.0 license conditions.
Setup difficulty is rated hard, with roughly 1h+ to a first successful run.
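Part of that first hour can be saved by checking the basic prerequisites before starting. The sketch below only looks for executables on `PATH`; using `nvidia-smi` as a proxy for a working CUDA GPU is an assumption, and it does not verify VRAM, driver versions, or that the model checkpoints have been downloaded.

```python
import shutil

# Assumption: these executable names are proxies for the stated requirements,
# not checks the repository itself performs.
REQUIRED_TOOLS = {
    "conda": "conda environment manager",
    "nvidia-smi": "NVIDIA driver (proxy for a CUDA-capable GPU)",
}


def missing_tools(tools=REQUIRED_TOOLS):
    """Return descriptions of the required tools that are not found on PATH."""
    return [desc for name, desc in tools.items() if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing before setup:", ", ".join(missing))
    else:
        print("Basic prerequisites found on PATH.")
```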
Mainly researchers.
Verify against the repo before relying on details.