explaingit

jianchang512/pyvideotrans

Analysis updated 2026-06-24

Stars · 17,387 | Language · Python | Audience · general | Complexity · 4/5 | Setup · hard

TLDR

Desktop tool that translates videos end to end by transcribing speech, translating the text with an LLM, dubbing new audio, and merging subtitles.

Mindmap

mindmap
  root((pyvideotrans))
    Inputs
      Source video file
      Target language
      ASR engine choice
      TTS engine choice
    Outputs
      Dubbed video
      Translated subtitles
      Cloned voice tracks
    Use Cases
      Translate a tutorial into English
      Dub a YouTube clip with cloned voice
      Batch translate videos on a server
    Tech Stack
      Python
      Faster-Whisper
      Edge-TTS
      CUDA

What do people build with it?

USE CASE 1

Translate a Chinese tutorial video into English with new dubbed audio

USE CASE 2

Generate translated subtitles for an existing video for accessibility

USE CASE 3

Dub a clip using a cloned voice via F5-TTS or GPT-SoVITS

USE CASE 4

Batch translate a folder of videos on a GPU server from the command line
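The subtitle use case above can be illustrated with a minimal sketch. It assumes the translated subtitles are written in the common SubRip (SRT) format; the `Cue` class and the `srt_timestamp` and `to_srt` helpers are hypothetical names for illustration, not part of pyVideoTrans.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle entry (hypothetical structure, for illustration only)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # translated subtitle text


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}\n")
    return "\n".join(blocks)
```

Calling `to_srt` on a list of cues returns text that can be saved as a `.srt` file next to the video. Whether pyVideoTrans uses this exact layout internally should be checked against the repo.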

What is it built with?

Python, Faster-Whisper, Edge-TTS, CUDA

How does it compare?

Metric            jianchang512/pyvideotrans  dortania/opencore-legacy-patcher  allenai/olmocr
Stars             17,387                     17,387                            17,320
Language          Python                     Python                            Python
Setup difficulty  hard                       moderate                          hard
Complexity        4/5                        3/5                               4/5
Audience          general                    general                           researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard Time to first run · 1h+

Local models are large and benefit from a CUDA GPU; cloud translation needs API keys for DeepSeek, OpenAI, or a similar provider.
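A quick preflight check for these two prerequisites might look like the sketch below. It rests on assumptions: the environment variable names `OPENAI_API_KEY` and `DEEPSEEK_API_KEY` are illustrative (pyVideoTrans may store keys in its own settings instead), and finding `nvidia-smi` on the PATH only suggests that an NVIDIA driver is installed. The `preflight` function is a hypothetical helper, not part of the tool.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Report which optional prerequisites appear to be present."""
    env = os.environ if env is None else env
    return {
        # nvidia-smi ships with the NVIDIA driver; finding it on PATH suggests
        # (but does not prove) that a CUDA-capable GPU is usable.
        "nvidia_driver": which("nvidia-smi") is not None,
        # Assumed variable names; use whatever your provider or config expects.
        "openai_key": bool(env.get("OPENAI_API_KEY")),
        "deepseek_key": bool(env.get("DEEPSEEK_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.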

In plain English

pyVideoTrans is an open-source tool that automatically translates videos from one language to another, replacing the original speech with dubbed audio in a new language and generating translated subtitles, all in one workflow.

The process works in four steps. First, it listens to the video's speech and converts it to text (ASR, or Automatic Speech Recognition). Next, it translates that text into the target language using an AI language model. Then it generates new spoken audio from the translated text (TTS, or Text-to-Speech). Finally, it combines everything back into a finished video. You can pause and manually correct the output of any step before moving on.

The tool supports a wide range of speech recognition engines, including local offline models (Faster-Whisper) and cloud services. For translation, it connects to AI models like DeepSeek, ChatGPT, Claude, and Gemini, or to Ollama for fully local, offline translation. For voice generation, it supports options including Microsoft's Edge-TTS (free) and voice cloning models like F5-TTS, CosyVoice, and GPT-SoVITS, which can clone a specific person's voice style.

Additional features include speaker diarization (identifying who is speaking when), multi-role dubbing (different AI voices for different speakers), vocal separation, and a command-line interface for batch processing on servers.

Windows users can download a ready-to-run executable with no setup. Developers on any platform can run it from source using Python. GPU acceleration via CUDA is optional but speeds up local AI models significantly.
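The four-step workflow described above can be sketched as a small pipeline with a review checkpoint after each stage. This is a minimal illustration under stated assumptions, not pyVideoTrans's actual code: `Segment`, `run_pipeline`, and the four stage callables (`transcribe`, `translate`, `synthesize`, `merge`) are hypothetical names, and real engines such as Faster-Whisper or Edge-TTS would be plugged in behind them.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of speech as it moves through the stages (hypothetical)."""
    start: float               # seconds
    end: float                 # seconds
    source_text: str           # ASR output
    translated_text: str = ""  # filled in by the translation stage
    audio_path: str = ""       # filled in by the TTS stage


def run_pipeline(video_path, transcribe, translate, synthesize, merge, review=None):
    """Run ASR -> translate -> TTS -> merge, with an optional review hook.

    `review(stage, segments)` stands in for the manual pause-and-correct
    step: it receives the stage name and returns the (possibly edited) segments.
    """
    def checkpoint(stage, segments):
        return review(stage, segments) if review else segments

    # Step 1: speech to text.
    segments = checkpoint("asr", transcribe(video_path))

    # Step 2: translate each segment's text.
    for seg in segments:
        seg.translated_text = translate(seg.source_text)
    segments = checkpoint("translate", segments)

    # Step 3: synthesize speech from the (possibly corrected) translation.
    for seg in segments:
        seg.audio_path = synthesize(seg.translated_text)
    segments = checkpoint("tts", segments)

    # Step 4: combine video, new audio, and subtitles.
    return merge(video_path, segments)
```

Passing the stages in as plain callables keeps the sketch independent of any particular engine, which mirrors the idea that ASR, translation, and TTS backends are interchangeable choices.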

Copy-paste prompts

Prompt 1
Walk me through running pyVideoTrans on Windows to dub a Chinese MP4 into English with Edge-TTS
Prompt 2
Show me how to point pyVideoTrans at a local Ollama model for the translation step
Prompt 3
Set up pyVideoTrans with Faster-Whisper on a CUDA GPU and explain how to verify it is using the card
Prompt 4
Use the pyVideoTrans CLI to batch process a folder of videos overnight with speaker diarization on
Prompt 5
Show me how to clone a speaker voice in pyVideoTrans with GPT-SoVITS and reuse it across multiple videos

Frequently asked questions

What is pyvideotrans?

Desktop tool that translates videos end to end by transcribing speech, translating the text with an LLM, dubbing new audio, and merging subtitles.

What language is pyvideotrans written in?

Mainly Python. The stack also includes Faster-Whisper, Edge-TTS, and CUDA.

How hard is pyvideotrans to set up?

Setup difficulty is rated hard, with an hour or more to a first successful run.

Who is pyvideotrans for?

Mainly a general audience.


Verify against the repo before relying on details.