weifeng2333/videocaptioner

Analysis updated 2026-06-24

★ 14,530PythonAudience · generalComplexity · 2/5LicenseSetup · easy

Mindmap

mindmap
  root((VideoCaptioner))
    Inputs
      Video files
      YouTube URLs
      Bilibili URLs
    Outputs
      SRT subtitles
      Translated subtitles
      Video with burned subs
    Use Cases
      Auto-subtitle creators
      Translate foreign videos
      Polish raw transcripts
    Tech Stack
      Python
      LLM API
      VAD
      PyQt

mindmap root((VideoCaptioner)) Inputs Video files YouTube URLs Bilibili URLs Outputs SRT subtitles Translated subtitles Video with burned subs Use Cases Auto-subtitle creators Translate foreign videos Polish raw transcripts Tech Stack Python LLM API VAD PyQt

Click or tap to explore — scroll the page freely

What do people build with it?

USE CASE 1

Auto-generate SRT subtitles for a YouTube video without paying for any API.

USE CASE 2

Translate Chinese podcast videos into English subtitles using an LLM for natural phrasing.

USE CASE 3

Batch-process a folder of recordings into hard-burned subtitled MP4s with one CLI command.

USE CASE 4

Polish raw Whisper transcripts so subtitle lines break on meaning instead of pauses.

What is it built with?

PythonOpenAI APIWhisperPyQt

How does it compare?

	weifeng2333/videocaptioner	blinkdl/rwkv-lm	swivid/f5-tts
Stars	14,530	14,524	14,508
Language	Python	Python	Python
Last pushed	—	2026-05-08	—
Maintenance	—	Maintained	—
Setup difficulty	easy	hard	hard
Complexity	2/5	5/5	4/5
Audience	general	researcher	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy Time to first run · 30min

Free features work with no API key, but the LLM polish and high-quality translation steps need an OpenAI-compatible endpoint configured.

GPL-3.0, you can use, modify, and share it freely, but any project that includes its code must also be open-sourced under GPL.

In plain English

VideoCaptioner, also called 卡卡字幕助手, is a tool that takes a video and turns the spoken audio into subtitles, with the option to clean those subtitles up and translate them. It is written in Python and is licensed under GPL-3.0. The whole pipeline runs in one program: it transcribes the speech, splits the text into readable subtitle lines, can run those lines through a large language model to polish them, can translate them into another language, and can finally burn or attach the subtitles back into the video. It comes in two main shapes. There is a command line interface (CLI) installed with "pip install videocaptioner", and a desktop GUI installed with "pip install videocaptioner[gui]". The CLI exposes commands like transcribe, subtitle (for optimisation and translation), synthesize (which embeds subtitles into the video as soft or hard subs), process (the full pipeline), download (which can pull videos from YouTube, Bilibili, and similar sites), and config (for managing settings). There is also a Windows installer on the GitHub releases page and a one-line shell script for macOS. The README is clear that the free features work out of the box, with no API key required. Free transcription uses an engine called bijian (Bijian, also offered by 必剪) or jianying, and free translation uses Bing or Google. If a user wants the LLM polish step or higher-quality machine translation, they configure an OpenAI-compatible API. The README lists several providers that work with it, including VideoCaptioner's own paid relay, SiliconCloud, and DeepSeek, with the API base URL, key, and model name set through config commands or environment variables. Under the hood, the README says the pipeline uses word-level timestamps and voice activity detection to get accurate transcription, then asks an LLM to split the lines based on meaning rather than just pauses, and supports context-aware translation with a reflection step. The project also ships a Claude Code Skill, a small markdown file that lets AI coding assistants call VideoCaptioner directly from a chat command.

Copy-paste prompts

Prompt 1

Install VideoCaptioner with pip and write the one-line CLI command to transcribe interview.mp4 into Chinese SRT subtitles using the free bijian engine.

Prompt 2

Configure VideoCaptioner to use DeepSeek as the LLM for the polish step. Show me the exact config commands and the env vars.

Prompt 3

Write a bash script that downloads a YouTube playlist with VideoCaptioner's download command, then runs the full process pipeline to burn English subtitles on each video.

Prompt 4

Compare VideoCaptioner vs Whisper.cpp vs Subtitle Edit for a Mac user who wants to add Chinese subs to family videos.

Prompt 5

Set up the VideoCaptioner Claude Code Skill so I can subtitle a video by chatting 'subtitle ~/Movies/lecture.mp4' inside Claude Code.

Frequently asked questions

What is videocaptioner?

Python tool that transcribes video audio into subtitles, polishes them with an LLM, translates them, and burns them back into the video. Has both a CLI and a desktop GUI.

What language is videocaptioner written in?

Mainly Python. The stack also includes Python, OpenAI API, Whisper.

What license does videocaptioner use?

GPL-3.0, you can use, modify, and share it freely, but any project that includes its code must also be open-sourced under GPL.

How hard is videocaptioner to set up?

Setup difficulty is rated easy, with roughly 30min to a first successful run.

Who is videocaptioner for?

Mainly general.

Open on GitHub → Explain another repo

This repo across BitVibe Labs

Verify against the repo before relying on details.