Analysis updated 2026-05-18
Generate a lip-synced talking-head video of any portrait photo speaking a given audio track.
Create a two-speaker dialogue video where each avatar reacts and listens as the other speaks.
Run a live interactive avatar streaming session for customer support or virtual presence apps.
Deploy a self-hosted avatar API on your own GPU server instead of using a cloud service.
| | avaturn-live/avtr-1 | evilsocket/audit | evolink-ai/awesome-blender-seedance-workflow-usecases |
|---|---|---|---|
| Stars | 362 | 397 | 295 |
| Language | Python | Python | Python |
| Setup difficulty | hard | moderate | moderate |
| Complexity | 4/5 | 4/5 | 3/5 |
| Audience | developer | developer | designer |
Figures from each repo's GitHub metadata at analysis time.
Requires a Linux machine with an NVIDIA Ampere GPU or newer, CUDA 12.x, and TensorRT 10.x, plus a one-time TensorRT engine compilation step.
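The GPU requirement above can be checked before starting the install. The following is a minimal sketch, not part of the AVTR-1 repo: it assumes that "Ampere or newer" corresponds to CUDA compute capability 8.0 or higher, and that the installed driver's `nvidia-smi` supports the `compute_cap` query field (older drivers may not). The function names are hypothetical.

```python
import subprocess
from typing import List

# Assumption: Ampere GPUs report CUDA compute capability 8.x, newer ones higher.
AMPERE_MAJOR = 8


def is_ampere_or_newer(compute_cap: str) -> bool:
    """Return True if a 'major.minor' compute capability string is 8.0 or higher."""
    major = int(compute_cap.strip().split(".")[0])
    return major >= AMPERE_MAJOR


def query_compute_caps() -> List[str]:
    """Ask nvidia-smi for the compute capability of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        caps = query_compute_caps()
    except (FileNotFoundError, subprocess.CalledProcessError):
        caps = []  # nvidia-smi missing or the query field is unsupported
    if not caps:
        print("No NVIDIA GPU detected (or nvidia-smi could not be queried).")
    for cap in caps:
        status = "OK" if is_ampere_or_newer(cap) else "too old"
        print(f"compute capability {cap}: {status}")
```

This only checks the GPU generation. CUDA and TensorRT versions still need to be verified against the repo's own install instructions.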
AVTR-1 is a project that turns a single portrait photo into a talking, reacting avatar that can hold live conversations. You give it a picture of a person and an audio clip, and it produces a video in which that person appears to speak and listen in real time, with lip movements that match the audio.
The system can handle two-speaker dialogues. Feed it audio from both sides of a conversation and it generates video of each speaker reacting appropriately, with the avatar looking engaged while the other person is talking. It is built specifically for live use and runs fast enough to keep up with a real-time audio stream on modern graphics hardware.
Setup requires a Linux computer with an NVIDIA graphics card from the Ampere generation or newer (RTX 3070 and above are listed as supported). You also need CUDA and TensorRT, which are NVIDIA software frameworks for running AI models on graphics cards. The installation process downloads pre-trained model weights from HuggingFace, a public AI model hosting site, then compiles them into fast inference engines specific to your hardware. This compilation step happens once and can take a while.
Once installed, you can run the demo interactively or generate video files offline. The offline mode supports single-speaker lip-sync, two-speaker dialogue with both sides rendered, or idle motion without any audio. All output is standard MP4 video, and you can stitch both sides of a dialogue into a single side-by-side video with ffmpeg, a standard video tool.
The project also offers a managed cloud API at avaturn.live if you want to skip the GPU setup entirely. Model weights and inference code are publicly available. A technical report and a production-ready backend are listed as coming soon.
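The side-by-side stitching step mentioned above can be sketched with ffmpeg's `hstack` filter. This is an illustrative example rather than the command the repo documents: the file names are hypothetical, both inputs are assumed to have the same height (which `hstack` requires), and only the video streams are combined, so audio would need to be mapped or mixed separately.

```python
import subprocess
from typing import List


def build_side_by_side_cmd(left: str, right: str, output: str) -> List[str]:
    """Build an ffmpeg command that places two videos next to each other.

    Only the video streams are stacked; no audio is written to the output.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", left,                # first input, shown on the left
        "-i", right,               # second input, shown on the right
        "-filter_complex", "[0:v][1:v]hstack=inputs=2[v]",
        "-map", "[v]",             # write only the stacked video stream
        output,
    ]


def stitch(left: str, right: str, output: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_side_by_side_cmd(left, right, output), check=True)


if __name__ == "__main__":
    # Hypothetical file names for the two rendered sides of a dialogue.
    cmd = build_side_by_side_cmd("speaker_a.mp4", "speaker_b.mp4", "dialogue.mp4")
    print(" ".join(cmd))
```

Building the argument list separately from running it makes the command easy to inspect or log before ffmpeg is invoked.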
AVTR-1 turns a portrait photo and audio into a real-time lip-synced talking avatar video, with support for two-speaker dialogues and live deployment on a single NVIDIA GPU.
Mainly Python. The stack also includes TensorRT and CUDA.
Setup difficulty is rated hard, with roughly a day or more to a first successful run.
Aimed mainly at developers.
Verify against the repo before relying on details.