Analysis updated 2026-06-24
Translate a Chinese tutorial video into English with new dubbed audio
Generate translated subtitles for an existing video for accessibility
Dub a clip using a cloned voice via F5-TTS or GPT-SoVITS
Batch translate a folder of videos on a GPU server from the command line
| | jianchang512/pyvideotrans | dortania/opencore-legacy-patcher | allenai/olmocr |
|---|---|---|---|
| Stars | 17,387 | 17,387 | 17,320 |
| Language | Python | Python | Python |
| Setup difficulty | hard | moderate | hard |
| Complexity | 4/5 | 3/5 | 4/5 |
| Audience | general | general | researcher |
Figures from each repo's GitHub metadata at analysis time.
Local models are large and benefit from a CUDA GPU; cloud translation needs API keys for DeepSeek, OpenAI, or a similar provider.
pyVideoTrans is an open-source tool that translates videos from one language to another in a single workflow: it replaces the original speech with dubbed audio in the new language and generates translated subtitles. The process has four steps. First, it converts the video's speech to text (ASR, or Automatic Speech Recognition). Next, it translates that text into the target language using an AI language model. Then it generates new spoken audio from the translated text (TTS, or Text-to-Speech). Finally, it combines everything back into a finished video. You can pause after any step and manually correct its output before moving on.
The tool supports a wide range of speech recognition engines, including local offline models (Faster-Whisper) and cloud services. For translation, it connects to AI models like DeepSeek, ChatGPT, Claude, and Gemini, or to Ollama for fully local, offline translation. For voice generation, it supports options including Microsoft's Edge-TTS (free) and voice cloning models like F5-TTS, CosyVoice, and GPT-SoVITS, which can clone a specific person's voice style.
Additional features include speaker diarization (identifying who is speaking when), multi-role dubbing (different AI voices for different speakers), vocal separation, and a command-line interface for batch processing on servers. Windows users can download a ready-to-run executable with no setup. Developers on any platform can run it from source using Python. GPU acceleration via CUDA is optional but speeds up local AI models significantly.
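The four-step workflow described above, with a pause for manual correction after each step, can be sketched roughly as follows. This is a minimal illustration, not pyVideoTrans's actual code: `Segment`, `transcribe`, `translate`, `synthesize`, `merge`, and `run_pipeline` are hypothetical names, and each stage is a stub that returns canned data where a real tool would call an ASR engine, an LLM, a TTS engine, and a video muxer.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    start: float           # segment start time, in seconds
    end: float             # segment end time, in seconds
    text: str              # recognized source-language text
    translated: str = ""   # filled in by the translation stage
    audio_path: str = ""   # filled in by the TTS stage


# A review hook receives the stage name and its output and returns
# the (possibly hand-corrected) segments.
Review = Callable[[str, List[Segment]], List[Segment]]


def transcribe(video_path: str) -> List[Segment]:
    """Stub ASR stage (assumption): returns fixed segments."""
    return [Segment(0.0, 2.5, "你好，世界"), Segment(2.5, 5.0, "欢迎观看")]


def translate(segments: List[Segment], target_lang: str) -> List[Segment]:
    """Stub translation stage (assumption): a lookup table stands in for an LLM."""
    lookup = {"你好，世界": "Hello, world", "欢迎观看": "Welcome"}
    return [replace(s, translated=lookup.get(s.text, s.text)) for s in segments]


def synthesize(segments: List[Segment]) -> List[Segment]:
    """Stub TTS stage (assumption): only assigns an output file name per segment."""
    return [replace(s, audio_path=f"seg_{i:04d}.wav") for i, s in enumerate(segments)]


def merge(video_path: str, segments: List[Segment]) -> str:
    """Stub merge stage (assumption): only computes the output file name."""
    stem = video_path.rsplit(".", 1)[0]
    return f"{stem}.dubbed.mp4"


def run_pipeline(
    video_path: str, target_lang: str, review: Optional[Review] = None
) -> Tuple[str, List[Segment]]:
    """Run ASR -> translate -> TTS -> merge, pausing for review after each stage."""

    def checkpoint(stage: str, segs: List[Segment]) -> List[Segment]:
        return review(stage, segs) if review else segs

    segs = checkpoint("asr", transcribe(video_path))
    segs = checkpoint("translate", translate(segs, target_lang))
    segs = checkpoint("tts", synthesize(segs))
    return merge(video_path, segs), segs
```

Passing a `review` callback is where a manual correction would plug in, for example a function that rewrites the translated text when the stage name is `"translate"` and passes the other stages through unchanged.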
Desktop tool that translates videos end to end by transcribing speech, translating the text with an LLM, dubbing new audio, and merging subtitles.
Mainly Python. The stack also includes Faster-Whisper and Edge-TTS.
Setup difficulty is rated hard, with roughly 1h+ to a first successful run.
Audience: mainly general users.
Verify against the repo before relying on details.