explaingit

six-ddc/livecaption

20 · Python
TLDR

livecaption is a command-line tool for Mac computers with Apple Silicon chips that listens to audio in real time, converts speech to text, and translates that text from English to Chinese.

In plain English

livecaption is a command-line tool for Apple Silicon Macs that listens to audio in real time, transcribes speech to text, and translates that text from English to Chinese. Once the models have been downloaded, everything runs locally on the device with no internet connection or cloud service required. Output goes to the terminal window or to a text file.

The tool can capture audio from a microphone, from system audio (such as the sound coming out of a Zoom or Teams call), or from both at the same time. It can also process a pre-recorded audio file. When listening to a conversation, it automatically identifies up to four different speakers and labels each line with a speaker tag such as S1 or S2, so you can tell who said what.

Under the hood, three separate AI models run in sequence. A speech recognition model converts spoken words into text. A speaker identification model determines who is talking at each moment. A translation model then converts the transcribed text into Chinese. All three models run on the Mac's integrated GPU rather than the CPU, which keeps performance fast while the machine handles other tasks. The tool also takes a two-pass approach to accuracy: it shows a rough real-time transcript as you speak, then quietly re-processes each completed sentence to produce a cleaner final version.

Setup requires a package manager called uv. On first run, the tool automatically downloads the AI models from Hugging Face; the download totals roughly 3.5 gigabytes. Capturing system audio (meeting output rather than just the microphone) requires a small extra step: a helper program must be compiled from source, and macOS must be granted explicit permission in the Privacy settings under Screen and System Audio Recording. The README explains this permission step in detail because macOS sometimes grants it silently and incorrectly.

The README is written primarily in Chinese; this description is based on the content available from it.
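The three-model sequence and the two-pass approach described above can be illustrated with a small Python sketch. This is not livecaption's code: every name in it (`CaptionLine`, `SpeakerLabeler`, `caption`, `format_line`, the `[S1] text | translation` output format) is hypothetical, the models are replaced by stand-in callables, and two behaviors are assumptions made only for the illustration: translating just the final pass, and folding any speaker beyond the fourth into the last tag.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

MAX_SPEAKERS = 4  # the description mentions up to four labeled speakers


@dataclass
class CaptionLine:
    speaker: str      # tag such as "S1"
    text: str         # English transcript
    translation: str  # Chinese translation ("" for the rough pass in this sketch)
    final: bool       # False = rough first pass, True = re-processed sentence


class SpeakerLabeler:
    """Maps raw diarizer IDs to tags S1..S4 in order of first appearance."""

    def __init__(self, max_speakers: int = MAX_SPEAKERS) -> None:
        self.max_speakers = max_speakers
        self._tags: Dict[str, str] = {}

    def tag(self, raw_id: str) -> str:
        if raw_id not in self._tags:
            # Assumption: speakers beyond the limit share the last tag.
            index = min(len(self._tags) + 1, self.max_speakers)
            self._tags[raw_id] = f"S{index}"
        return self._tags[raw_id]


def caption(
    sentences: Iterable[dict],
    transcribe: Callable[[dict], str],  # stand-in for the fast speech recognizer
    diarize: Callable[[dict], str],     # stand-in for speaker identification
    refine: Callable[[dict], str],      # stand-in for the second, cleaner pass
    translate: Callable[[str], str],    # stand-in for English-to-Chinese translation
) -> List[CaptionLine]:
    """Emit a rough line, then a refined and translated line, per sentence."""
    labeler = SpeakerLabeler()
    lines: List[CaptionLine] = []
    for sentence in sentences:
        speaker = labeler.tag(diarize(sentence))
        # Pass 1: rough transcript, available immediately.
        lines.append(CaptionLine(speaker, transcribe(sentence), "", final=False))
        # Pass 2: re-process the completed sentence, then translate it.
        clean = refine(sentence)
        lines.append(CaptionLine(speaker, clean, translate(clean), final=True))
    return lines


def format_line(line: CaptionLine) -> str:
    return f"[{line.speaker}] {line.text} | {line.translation}"


def demo() -> List[str]:
    # Fake "audio": each sentence already carries the text a model would produce.
    sentences = [
        {"voice": "a", "rough": "hello every one", "clean": "Hello, everyone."},
        {"voice": "b", "rough": "thank you", "clean": "Thank you."},
    ]
    glossary = {"Hello, everyone.": "大家好。", "Thank you.": "谢谢。"}
    lines = caption(
        sentences,
        transcribe=lambda s: s["rough"],
        diarize=lambda s: s["voice"],
        refine=lambda s: s["clean"],
        translate=lambda text: glossary.get(text, ""),
    )
    return [format_line(line) for line in lines if line.final]


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Passing the models in as callables keeps the sketch independent of any particular speech, diarization, or translation library; a real pipeline would stream audio chunks and replace the rough line in place rather than collecting both passes in a list.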

Verify against the repo before relying on details.