explaingit

liuzhao1225/youdub-webui

4,586PythonAudience · generalComplexity · 4/5Setup · hard

TLDR

A self-hosted tool that downloads a YouTube or Bilibili video, separates the voice from background audio, translates the speech with AI, generates new audio in the target language, and exports a dubbed video file with subtitles.

Mindmap

mindmap
  root((YouDub))
    Pipeline steps
      Download video
      Separate voice Demucs
      Transcribe Whisper
      Translate via API
      Generate TTS VoxCPM2
    Tech stack
      Python FastAPI
      Next.js frontend
      FFmpeg
      CUDA GPU
    Use cases
      Dub YouTube to Chinese
      Dub Bilibili to English
      Self-hosted translation
    Requirements
      Python 3.12
      Node.js
      NVIDIA GPU
Click or tap to explore — scroll the page freely

Code map

Detail Auto

An interactive map of this repo's files and how they connect — its source is parsed live in your browser. Click Visualize to build it.

filefunction / class

Things people build with this

USE CASE 1

Automatically dub English YouTube videos into Chinese with synchronized AI-generated voice and preserved background music.

USE CASE 2

Run a self-hosted pipeline that transcribes, translates, and re-voices video content without sending your videos to a third-party cloud service.

USE CASE 3

Resume a failed mid-way dubbing job from the exact stage it stopped, rather than starting the entire download and transcription over.

USE CASE 4

Process Bilibili videos from Chinese into English using the same pipeline in reverse.

Tech stack

PythonFastAPINext.jsFFmpegWhisperDemucsCUDANode.js

Getting it running

Difficulty · hard Time to first run · 1day+

Requires a CUDA-capable NVIDIA GPU, CPU-only machines will be far too slow for practical dubbing use.

No license information is stated in the explanation.

In plain English

YouDub WebUI is an open-source tool for dubbing videos from one language into another. You give it a YouTube or Bilibili URL, and it runs a multi-step pipeline that ends with a new video file where the original speech has been replaced by AI-generated audio in the target language, while the background music and sound effects are preserved. The README is primarily in Chinese, reflecting the tool's primary audience, though an English version is linked. The pipeline works in sequence. First the tool downloads the video. Then it separates the human voice from background audio using a model called Demucs. An AI speech recognition model (Whisper) transcribes what was said and records the exact timing of each word. Those transcripts are sent to a translation API using the same interface as OpenAI's chat models. Finally, a text-to-speech model called VoxCPM2 generates new audio in the target language, that audio is mixed with the original background track and timed to match the original speech, and the result is rendered as an mp4 with subtitles burned in. The main tested scenario is English YouTube content dubbed into Chinese, with Chinese Bilibili content dubbed into English also supported. The author runs a Bilibili channel with over 800,000 followers where every video is dubbed using this exact tool, which the README presents as evidence that it works in real production rather than just as a demonstration. The interface is a web application. A FastAPI backend runs the pipeline jobs, and a Next.js frontend lets you submit URLs, configure settings like your OpenAI API key and translation concurrency, and monitor job progress in real time. If a job fails partway through, it can resume from the failed stage rather than starting over. Everything, including downloaded videos, intermediate audio files, and final output, is stored locally on your machine. Setup requires Python 3.12, Node.js, FFmpeg, and a CUDA-capable GPU for acceptable processing speed. A proxy is needed to download YouTube videos in regions where access is restricted.

Copy-paste prompts

Prompt 1
I have YouDub WebUI running locally. Give me step-by-step instructions to dub this YouTube video into Chinese and control translation concurrency.
Prompt 2
My YouDub job failed at the TTS stage. How do I resume it without re-downloading the video or re-running Whisper transcription?
Prompt 3
I want to configure YouDub WebUI to use a custom OpenAI-compatible translation API instead of the default endpoint. Show me where to set the API base URL and key.
Prompt 4
Help me set up YouDub WebUI on Ubuntu 22.04 with a CUDA GPU, walk me through installing Python 3.12, Node.js, FFmpeg, and the CUDA dependencies in order.
Open on GitHub → Explain another repo

← liuzhao1225 on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.