explaingit

weifeng2333/videocaptioner

14,530Python

TLDR

VideoCaptioner, also called 卡卡字幕助手, is a tool that takes a video and turns the spoken audio into subtitles, with the option to clean those subtitles up and translate them.

Mindmap

A visual breakdown will appear here once this repo is fully enriched.

In plain English

VideoCaptioner, also called 卡卡字幕助手, is a tool that takes a video and turns the spoken audio into subtitles, with the option to clean those subtitles up and translate them. It is written in Python and is licensed under GPL-3.0. The whole pipeline runs in one program: it transcribes the speech, splits the text into readable subtitle lines, can run those lines through a large language model to polish them, can translate them into another language, and can finally burn or attach the subtitles back into the video. It comes in two main shapes. There is a command line interface (CLI) installed with "pip install videocaptioner", and a desktop GUI installed with "pip install videocaptioner[gui]". The CLI exposes commands like transcribe, subtitle (for optimisation and translation), synthesize (which embeds subtitles into the video as soft or hard subs), process (the full pipeline), download (which can pull videos from YouTube, Bilibili, and similar sites), and config (for managing settings). There is also a Windows installer on the GitHub releases page and a one-line shell script for macOS. The README is clear that the free features work out of the box, with no API key required. Free transcription uses an engine called bijian (Bijian, also offered by 必剪) or jianying, and free translation uses Bing or Google. If a user wants the LLM polish step or higher-quality machine translation, they configure an OpenAI-compatible API. The README lists several providers that work with it, including VideoCaptioner's own paid relay, SiliconCloud, and DeepSeek, with the API base URL, key, and model name set through config commands or environment variables. Under the hood, the README says the pipeline uses word-level timestamps and voice activity detection to get accurate transcription, then asks an LLM to split the lines based on meaning rather than just pauses, and supports context-aware translation with a reflection step. The project also ships a Claude Code Skill, a small markdown file that lets AI coding assistants call VideoCaptioner directly from a chat command.

Open on GitHub → Explain another repo

Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.