explaingit

wan-video/wan2.2

15,713 | Python | Audience · researcher | Complexity · 4/5 | Setup · hard

TLDR

Wan2.2 is an open-source AI model that generates short video clips from a text description or a starting image. Its 5B version is designed to run on consumer GPUs, and specialized variants support audio-driven video and character animation.

Mindmap

mindmap
  root((Wan2.2))
    What it does
      Text to video
      Image to video
      Audio-driven video
      Character animation
    Architecture
      Mixture of Experts
      5B parameter model
      720P at 24 fps
    Integrations
      ComfyUI
      Diffusers
      Consumer GPU
    Use cases
      Creative video projects
      AI research
      Video tool building

Things people build with this

USE CASE 1

Generate short video clips from text prompts for creative projects without relying on a paid commercial video AI service.

USE CASE 2

Animate a still image into a short video clip using the image-to-video model on a local consumer GPU.

USE CASE 3

Create audio-driven cinematic video from a speech recording using the specialized audio-driven model.

USE CASE 4

Build a custom AI video generation pipeline by integrating Wan2.2 with ComfyUI or the Diffusers library.

Tech stack

Python, PyTorch, ComfyUI, Diffusers, CUDA

Getting it running

Difficulty · hard | Time to first run · 1h+

Requires a CUDA-capable GPU. Setup involves downloading large model weight files before any generation can run.
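
To get a feel for why the weight download is large, here is a rough back-of-the-envelope sketch. It assumes 16-bit weights (2 bytes per parameter) and covers only the 5B parameters named in this summary. Other components and the memory used during generation are not included, so treat the result as a lower bound, not a hardware requirement. The function name is hypothetical.

```python
def weight_size_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate size of the raw weights in GiB.

    bytes_per_param=2 assumes 16-bit weights (an assumption, not taken
    from the repo); use 4 for 32-bit weights.
    """
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    five_b = 5_000_000_000  # the "5B" model mentioned in this summary
    print(f"16-bit: {weight_size_gib(five_b):.1f} GiB")
    print(f"32-bit: {weight_size_gib(five_b, 4):.1f} GiB")
```

At 16 bits the 5B weights alone come to roughly 9.3 GiB, which is why both disk space and GPU memory matter before the first run.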

In plain English

Wan2.2 is an open-source AI system that generates videos from text descriptions or still images. You type what you want to see, or provide a starting image, and the model produces a short video clip. It is written in Python and released by Wan-AI.

The 2.2 version introduces several improvements over earlier releases. It uses a "Mixture-of-Experts" (MoE) architecture, a design in which different specialist sub-models handle different parts of the video generation process, increasing capability without proportionally increasing computing cost. The model was trained on a substantially larger dataset than its predecessor, with about 65% more images and 83% more videos, which improves the realism and variety of motion. It can generate video at 720P resolution at 24 frames per second, and the 5B (five-billion-parameter) version of the model is designed to run on consumer graphics cards.

Beyond basic text-to-video and image-to-video, the project includes two specialized models: one for audio-driven video (generating cinematic video from a speech recording) and one for character animation (replicating a person's movement and expressions from reference footage).

You would use this if you want to generate video content from text prompts or images without relying on a commercial service, whether for creative projects, research, or building AI-powered video tools. The model integrates with popular AI toolkits, including ComfyUI and Diffusers.
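
The Mixture-of-Experts idea can be illustrated with a toy router. This is a minimal sketch, not the repo's implementation: it assumes experts are chosen by how noisy the video still is at each generation step, and the expert names, the `boundary` value, and the linear noise schedule are all hypothetical. The point it shows is that only one expert runs per step, so the per-step cost stays that of a single expert even though two sets of weights exist.

```python
from typing import List


def pick_expert(noise_level: float, boundary: float = 0.5) -> str:
    """Route a step to one expert based on the current noise level.

    The boundary of 0.5 is an illustrative assumption.
    """
    return "high_noise" if noise_level >= boundary else "low_noise"


def expert_schedule(steps: int, boundary: float = 0.5) -> List[str]:
    """List which expert would handle each step of a toy schedule."""
    schedule = []
    for i in range(steps):
        # Toy linear schedule: noise falls from 1.0 toward 0.0.
        noise_level = 1.0 - i / steps
        schedule.append(pick_expert(noise_level, boundary))
    return schedule


if __name__ == "__main__":
    for step, name in enumerate(expert_schedule(8)):
        print(step, name)
```

Early, noisy steps go to one specialist and later, cleaner steps go to the other. How Wan2.2 actually splits the work should be checked against the repo.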

Copy-paste prompts

Prompt 1
Using Wan2.2 with the Diffusers library, show me how to generate a 3-second 720P video clip from the text prompt 'a cat walking in a garden at sunset'.
Prompt 2
How do I install and run the Wan2.2 image-to-video model on a consumer GPU to animate a local JPEG image?
Prompt 3
Show me how to use the Wan2.2 audio-driven video model to generate a talking-head video from an MP3 speech file and a portrait image.
Prompt 4
How do I set up a Wan2.2 text-to-video node inside a ComfyUI workflow?
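
Prompt 1 asks for a 3-second clip, and generation tools usually want a frame count, not a duration. The sketch below does that conversion at the 24 fps mentioned above. The second helper rounds up to a count of the form 4k + 1, which is an assumption about how the model groups frames in time. Both function names are hypothetical, and the exact constraint should be confirmed in the repo or toolkit docs.

```python
def frame_count(seconds: float, fps: int = 24) -> int:
    """Number of frames needed for a clip of the given duration."""
    return round(seconds * fps)


def snap_to_valid(frames: int, stride: int = 4) -> int:
    """Round up to the next count of the form stride * k + 1.

    The 4k + 1 rule is an assumption for illustration only.
    """
    k = (frames - 1 + stride - 1) // stride  # ceiling division
    return stride * k + 1


if __name__ == "__main__":
    raw = frame_count(3, fps=24)
    print(raw, snap_to_valid(raw))
```

For a 3-second clip this gives 72 raw frames, or 73 after rounding up under the assumed rule.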

Verify against the repo before relying on details.