huanshere/videolingo

Analysis updated 2026-06-24

★ 17,026 | Python | Audience · vibe coder | Complexity · 3/5 | Setup · hard

Mindmap

mindmap
  root((VideoLingo))
    Inputs
      Video file
      YouTube URL
    Outputs
      Translated subtitles
      Dubbed audio
    Use Cases
      Localize YouTube videos
      Translate lectures
      Add foreign dubbing
    Tech Stack
      Python
      WhisperX
      Streamlit
      yt-dlp

What do people build with it?

USE CASE 1

Translate and dub a YouTube tutorial into Spanish without hiring a translator.

USE CASE 2

Generate Netflix-style single-line subtitles for lecture recordings.

USE CASE 3

Clone a speaker's voice and produce a dubbed version of their video.

USE CASE 4

Batch-localize a backlog of recorded webinars into multiple languages.

What is it built with?

Python, WhisperX, Streamlit, yt-dlp, FFmpeg, CUDA

How does it compare?

	huanshere/videolingo	hkuds/ai-trader	pydantic/pydantic-ai
Stars	17,026	17,031	17,050
Language	Python	Python	Python
Setup difficulty	hard	hard	moderate
Complexity	3/5	4/5	3/5
Audience	vibe coder	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1h+

Needs FFmpeg plus an NVIDIA GPU with CUDA 12.6 and CUDNN for WhisperX, and an API key for the chosen translation/TTS provider.
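Before starting the install, a quick preflight check can confirm that the basic prerequisites are visible on the machine. The sketch below is not part of VideoLingo. It only checks that `ffmpeg` and `nvidia-smi` are on the PATH and that an API key is present in an environment variable. The variable name `TRANSLATION_API_KEY` is a hypothetical placeholder, since VideoLingo may store keys elsewhere, and finding `nvidia-smi` only suggests that an NVIDIA driver is installed. It does not verify CUDA 12.6 or CUDNN.

```python
import os
import shutil


def check_prerequisites(api_key_env="TRANSLATION_API_KEY"):
    """Return a dict mapping each prerequisite name to True or False.

    api_key_env is a hypothetical environment variable name used only
    for this sketch; adjust it to wherever your key actually lives.
    """
    return {
        # FFmpeg must be callable from the command line.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # nvidia-smi ships with the NVIDIA driver; its presence is a hint,
        # not proof, that a usable GPU stack is installed.
        "nvidia_driver": shutil.which("nvidia-smi") is not None,
        # A non-empty environment variable counts as "key present".
        "api_key": bool(os.environ.get(api_key_env)),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any item reported as MISSING is worth fixing before attempting the full setup, since a first run is estimated at over an hour.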

In plain English

VideoLingo is an automated video translation and dubbing tool written in Python. It takes a video, either a file or a YouTube URL, and produces translated subtitles and optionally a dubbed audio track in another language, aiming for the quality standard used by Netflix subtitle teams.

The workflow runs end-to-end without manual intervention. It downloads the video (via yt-dlp for YouTube), transcribes the speech into word-level subtitles using WhisperX (a speech recognition model), segments those subtitles using NLP (natural language processing, software that understands sentence structure), translates the result using an AI language model, and optionally generates a dubbed audio track using AI voice synthesis. Translation uses a three-step process (translate, reflect, adapt) to handle natural phrasing and cultural context rather than literal word-for-word substitution.

A key design choice is that subtitles are always single-line. The README notes this as a deliberate difference from many similar tools, and it matches professional broadcast standards. Dubbing supports multiple text-to-speech providers, as well as voice cloning so the dubbed audio can match the original speaker's voice.

The tool runs as a browser-based interface built with Streamlit, so you interact with it through a web page rather than a command line. It supports multiple input languages and can translate into any language, though dubbing options depend on which voice synthesis provider is used. You would use VideoLingo to localize YouTube videos, lectures, or other video content for an audience in a different language without hiring a translation team.
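The translate, reflect, adapt process described above can be sketched as three chained model calls per subtitle line. This is an illustrative sketch only, not VideoLingo's actual code. The function names, the prompt wording, and the `llm` callable (any function that takes a prompt string and returns text) are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical interface: a function that takes a prompt and returns text.
LLM = Callable[[str], str]


def translate_reflect_adapt(line: str, target_lang: str, llm: LLM) -> str:
    """Translate one subtitle line in three passes (illustrative prompts)."""
    # Pass 1: produce a direct draft translation.
    draft = llm(f"Translate into {target_lang}: {line}")
    # Pass 2: ask the model to critique the draft for phrasing and context.
    critique = llm(
        f"Critique this {target_lang} translation.\n"
        f"Source: {line}\nDraft: {draft}"
    )
    # Pass 3: rewrite the draft, taking the critique into account.
    return llm(
        f"Rewrite the translation using the critique.\n"
        f"Source: {line}\nDraft: {draft}\nCritique: {critique}"
    )


def translate_subtitles(lines: List[str], target_lang: str, llm: LLM) -> List[str]:
    """Apply the three-pass translation to every subtitle line in order."""
    return [translate_reflect_adapt(line, target_lang, llm) for line in lines]
```

Because `llm` is just a callable, the same sketch works with any provider, or with a stub function for testing. The cost of the approach is three model calls per line instead of one.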

Copy-paste prompts

Prompt 1

Walk me through installing VideoLingo on Windows with an NVIDIA GPU, including CUDA 12.6 and CUDNN setup.

Prompt 2

Help me run VideoLingo on a YouTube URL and output Chinese subtitles plus a dubbed track using Azure TTS.

Prompt 3

Show me how to plug a custom OpenAI-compatible API endpoint into VideoLingo for the translation step.

Prompt 4

Explain how the translate-reflect-adapt step works in VideoLingo and where I can tune the prompts.

Prompt 5

Help me resume a VideoLingo job that crashed halfway through subtitle segmentation.

Frequently asked questions

What is videolingo?

Automated tool that downloads a video, transcribes it with WhisperX, translates subtitles with an LLM, and optionally dubs the audio in a target language.

What language is videolingo written in?

Mainly Python. The stack also includes WhisperX, Streamlit, and yt-dlp.

How hard is videolingo to set up?

Setup difficulty is rated hard, with roughly 1h+ to a first successful run.

Who is videolingo for?

Mainly vibe coders.

Verify against the repo before relying on details.