Automatically dub English YouTube videos into Chinese with synchronized AI-generated voice and preserved background music.
Run a self-hosted pipeline that transcribes, translates, and re-voices video content without sending your videos to a third-party cloud service.
Resume a failed mid-way dubbing job from the exact stage it stopped, rather than starting the entire download and transcription over.
Dub Chinese Bilibili videos into English using the same pipeline with the language pair swapped.
Requires a CUDA-capable NVIDIA GPU; CPU-only machines will be far too slow for practical dubbing use.
YouDub WebUI is an open-source tool for dubbing videos from one language into another. You give it a YouTube or Bilibili URL, and it runs a multi-step pipeline that ends with a new video file in which the original speech has been replaced by AI-generated audio in the target language, while the background music and sound effects are preserved. The README is mainly in Chinese, reflecting the tool's primary audience, though an English version is linked.

The pipeline works in sequence. First, the tool downloads the video. It then separates the human voice from the background audio using a model called Demucs. An AI speech recognition model (Whisper) transcribes what was said and records the exact timing of each word. Those transcripts are sent to a translation API that uses the same interface as OpenAI's chat models. Finally, a text-to-speech model called VoxCPM2 generates new audio in the target language. That audio is mixed with the original background track and timed to match the original speech, and the result is rendered as an mp4 with subtitles burned in.

The main tested scenario is English YouTube content dubbed into Chinese; Chinese Bilibili content dubbed into English is also supported. The author runs a Bilibili channel with over 800,000 followers where every video is dubbed using this exact tool, which the README presents as evidence that it works in real production rather than only as a demonstration.

The interface is a web application. A FastAPI backend runs the pipeline jobs, and a Next.js frontend lets you submit URLs, configure settings such as your OpenAI API key and translation concurrency, and monitor job progress in real time. If a job fails partway through, it can resume from the failed stage rather than starting over. Everything, including downloaded videos, intermediate audio files, and final output, is stored locally on your machine.

Setup requires Python 3.12, Node.js, FFmpeg, and a CUDA-capable GPU for acceptable processing speed.
A proxy is needed to download YouTube videos in regions where access is restricted.
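
The resume behaviour described above can be pictured as a staged job runner that writes a checkpoint after each stage. The sketch below is illustrative only and is not taken from the YouDub WebUI source: the stage names, the `run_job` function, and the JSON state file are all hypothetical, and the real tool's storage format and stage boundaries may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical stage names, in the order the pipeline description gives them.
STAGES = ["download", "separate", "transcribe", "translate", "synthesize", "render"]


def run_job(state_file: Path, handlers: dict[str, Callable[[], None]]) -> list[str]:
    """Run stages in order, skipping any already recorded as done.

    Returns the list of stages that completed during this call.
    """
    done: list[str] = []
    if state_file.exists():
        done = json.loads(state_file.read_text())["done"]

    executed: list[str] = []
    for stage in STAGES:
        if stage in done:
            continue  # finished in an earlier run, so do not repeat it
        handlers[stage]()  # raises on failure; the checkpoint is not advanced
        done.append(stage)
        state_file.write_text(json.dumps({"done": done}))  # checkpoint per stage
        executed.append(stage)
    return executed


def demo() -> tuple[list[str], list[str]]:
    """Simulate a failure at 'translate', then a resume of the same job."""
    calls: list[str] = []
    fail_once = {"translate"}

    def make(stage: str) -> Callable[[], None]:
        def handler() -> None:
            calls.append(stage)
            if stage in fail_once:
                fail_once.discard(stage)
                raise RuntimeError(f"{stage} failed")
        return handler

    handlers = {stage: make(stage) for stage in STAGES}
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "job.json"
        try:
            run_job(state, handlers)
        except RuntimeError:
            pass  # the first attempt stops at the failing stage
        first_calls = list(calls)
        resumed = run_job(state, handlers)
    return first_calls, resumed
```

In this sketch, calling `demo()` runs the first attempt up to the failing translation stage, and the second call to `run_job` picks up at that stage, so the download, separation, and transcription work is not repeated.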
Verify against the repo before relying on details.