explaingit

opentalker/sadtalker

13,802 Python Audience · researcher Complexity · 3/5 License · Apache 2.0 Setup · hard

TLDR

Takes a still photo of a face and an audio clip and generates a realistic video of that person speaking in sync with the sound, using AI trained on 3D facial motion.

Mindmap

mindmap
  root((SadTalker))
    What it does
      Photo to video
      Audio lip sync
      3D face motion
    Modes
      Still mode
      Full body
      Reference style
    Tech Stack
      Python
      Gradio UI
      PyTorch
    Setup
      Anaconda install
      Pretrained weights
      Colab notebook
    Audience
      AI researchers
      Video creators

Things people build with this

USE CASE 1

Generate a talking-head video from any portrait photo and a voiceover audio file.

USE CASE 2

Create lip-synced video content for social media posts using a static headshot.

USE CASE 3

Animate a full-body image to produce a speaking character video.

USE CASE 4

Try video generation experiments in Google Colab without a local GPU.

Tech stack

Python, Anaconda, Gradio, PyTorch

Getting it running

Difficulty · hard Time to first run · 1h+

Requires downloading several GB of pre-trained model weights and setting up an Anaconda environment before any video can be generated.
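Because a missing weight file is a common reason a first run fails, a small pre-flight check can help. The sketch below is a generic helper, not part of SadTalker: the function name `missing_weights`, the directory name `checkpoints`, and the file names in the demo are all placeholders chosen for illustration. Take the real list of required files from the repo's own download instructions.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_weights(checkpoint_dir: str, expected_files: Iterable[str]) -> List[str]:
    """Return the expected file names that are not present in checkpoint_dir.

    A directory that does not exist is treated as having no files at all,
    so every expected name is reported as missing.
    """
    root = Path(checkpoint_dir)
    return [name for name in expected_files if not (root / name).is_file()]


if __name__ == "__main__":
    # Demo with placeholder names; these are NOT SadTalker's real file names.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "model_a.bin").write_bytes(b"\x00")
        print(missing_weights(tmp, ["model_a.bin", "model_b.bin"]))
```

Running a check like this before the first generation makes it obvious whether the download step finished, instead of finding out from a stack trace partway through inference.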

Use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

SadTalker is a research tool that takes a single still photograph of a face and an audio recording and generates a short, realistic video of that face speaking in sync with the audio. The technique was presented at CVPR 2023, a major academic computer vision conference, and was developed by researchers from Xi'an Jiaotong University, Tencent AI Lab, and Ant Group. It works by learning 3D motion coefficients from the audio and using them to animate the face so that it follows the speech rhythm, head movements, and facial expressions implied by the sound.

To use it locally, you install the project using Anaconda (a Python environment tool), download a set of pre-trained model files, and run it from the command line or through an optional browser-based interface built with Gradio. The pre-trained weights can be downloaded from Google Drive, Baidu, or via a provided download script. Installation guides exist for Linux, Windows, and macOS.

If you would rather not install anything, there is a Colab notebook that runs the tool in the cloud. A Discord server also lets you send files and receive generated videos directly, which is the simplest no-setup option.

Several modes are available: one for full-body image animation rather than just face crops, a still mode that limits head movement, and a reference mode that uses a separate video to guide expression style.

The project is licensed under Apache 2.0, removing the earlier non-commercial restriction.
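The local command-line workflow and the modes described above can be sketched as a small helper that assembles the command without running it. The helper name `build_sadtalker_command` and its parameters are hypothetical. The script name `inference.py` and the flags `--source_image`, `--driven_audio`, `--result_dir`, `--still`, `--preprocess full`, `--ref_eyeblink`, and `--ref_pose` are written as I recall them from the upstream README, so treat them as assumptions and confirm them against the repo before use.

```python
import shlex
from typing import List, Optional


def build_sadtalker_command(
    source_image: str,
    driven_audio: str,
    result_dir: str = "results",
    still: bool = False,
    full_image: bool = False,
    ref_eyeblink: Optional[str] = None,
    ref_pose: Optional[str] = None,
) -> List[str]:
    """Build an argv list for SadTalker's inference script (nothing is executed).

    Flag names are assumptions based on the upstream README; verify them there.
    """
    cmd = [
        "python", "inference.py",
        "--source_image", source_image,   # the portrait photo
        "--driven_audio", driven_audio,   # the speech clip to lip-sync to
        "--result_dir", result_dir,       # where the output video is written
    ]
    if still:
        # Still mode: limit head movement so mostly the lips move.
        cmd.append("--still")
    if full_image:
        # Animate the whole image instead of only a face crop.
        cmd += ["--preprocess", "full"]
    if ref_eyeblink is not None:
        # Reference video that guides eye blinking.
        cmd += ["--ref_eyeblink", ref_eyeblink]
    if ref_pose is not None:
        # Reference video that guides head pose.
        cmd += ["--ref_pose", ref_pose]
    return cmd


if __name__ == "__main__":
    cmd = build_sadtalker_command("photo.jpg", "speech.wav", still=True)
    print(shlex.join(cmd))
    # python inference.py --source_image photo.jpg --driven_audio speech.wav --result_dir results --still
```

Returning a list rather than a single string keeps file names with spaces intact if the list is later passed to `subprocess.run` from inside the activated Anaconda environment.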

Copy-paste prompts

Prompt 1
I have a portrait photo (photo.jpg) and an audio file (speech.wav). Give me the SadTalker command to generate a talking-head video on Windows.
Prompt 2
How do I run SadTalker in still mode so the head barely moves and only the lips sync to the audio?
Prompt 3
Walk me through setting up SadTalker on Google Colab step by step, including downloading the model weights.
Prompt 4
I want to use SadTalker reference video mode to copy expression style from another video. What arguments do I pass?
Prompt 5
I installed SadTalker but the model weights are missing. Which files do I need to download and where do I put them?

Verify against the repo before relying on details.