Analysis updated 2026-06-24
Transcribe an hour of audio to subtitles offline in under a minute on a normal laptop
Run the app on Windows, macOS, or Linux using prebuilt installers
Tweak the -150ms CTC delay offset for better subtitle alignment with the waveform
Build the project from source as a base for a custom offline transcription tool
| | jochenyang/shiyu | aim-uofa/reasonmatch | arpecop/kokobook |
|---|---|---|---|
| Stars | 12 | 12 | 12 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | hard | hard |
| Complexity | 3/5 | 5/5 | 3/5 |
| Audience | general | researcher | general |
Figures from each repo's GitHub metadata at analysis time.
Building from source needs Node 18+, the stable Rust toolchain, and Python 3.10 installed together.
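As a rough illustration of checking those prerequisites before a build, the sketch below parses the text printed by `node --version` and `python --version`. The function names are hypothetical and not part of the project. The summary names Python 3.10 specifically and does not say whether newer versions work, so the sketch treats anything other than 3.10 as unverified; that strictness is an assumption. The stable Rust toolchain has no stated minimum, so it is not version-checked.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract the first 'major.minor' pair from output such as 'v18.17.0'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def node_ok(version_output):
    """Node.js 18+ is listed, so any major version from 18 up passes."""
    return parse_version(version_output)[0] >= 18


def python_ok(version_output):
    """Only 3.10 is named in the prerequisites (strict match is an assumption)."""
    return parse_version(version_output) == (3, 10)


def installed_version(command):
    """Return the '--version' output of a tool, or None if it is not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

For example, `node_ok(installed_version("node") or "0.0")` reports whether the local Node.js is recent enough, treating a missing install as a failure.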
Shiyu Subtitle (the project's Chinese name reads "shiyu") is a desktop app that generates subtitles from audio and video files. The README's main selling point is that all transcription happens locally on your machine, with no network connection or cloud API key needed, so your media never leaves the computer. It is built with Tauri (a lightweight desktop shell written in Rust) for the window, Vue 3 for the frontend, and a Python backend that does the actual speech recognition.
Speech recognition is handled by the SenseVoice-Small model, run through ONNX Runtime. The README claims it can transcribe one hour of audio in under a minute on a normal local machine. To improve the output, the app adds two features on top of the raw model: a bilingual segmentation step that breaks lines into more natural reading chunks, and a -150ms time offset that corrects a known CTC peak delay in SenseVoice so the subtitle timings line up with the audio waveform.
The architecture is laid out in a small Mermaid diagram. The Tauri GUI talks over a local HTTP API to a Python backend, which the Rust side starts and stops as a silent background process. The backend is packaged with PyInstaller into a single executable, so end users do not need to install Python themselves and never see a terminal window pop up. The SenseVoice-Small model (about 230 MB) is downloaded automatically on first launch into ~/.shiyu/models/sensevoice-small/.
To run the project from source, the Quick Start section lists three prerequisites: Node.js 18+, the stable Rust toolchain, and Python 3.10. You set up the backend in a virtual environment with pip install -r requirements.txt, then run the frontend with npm install followed by npm run tauri dev. The interface itself is described as a dark-mode glassmorphism design with synchronized waveform previews and timing navigation.
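The -150ms timing correction mentioned above can be pictured with a minimal sketch. The `Segment` type and function names are hypothetical, and this is not the project's actual code. It assumes the offset is applied to both the start and end of each subtitle and that negative times are clamped to zero; the summary confirms neither detail.

```python
from dataclasses import dataclass

# Offset quoted in the README summary: SenseVoice's CTC peaks arrive late,
# so timestamps are pulled 150 ms earlier.
CTC_OFFSET_MS = -150


@dataclass
class Segment:
    start_ms: int
    end_ms: int
    text: str


def shift_segment(seg, offset_ms=CTC_OFFSET_MS):
    """Shift both timestamps by offset_ms (clamping at zero is an assumption)."""
    start = max(0, seg.start_ms + offset_ms)
    end = max(start, seg.end_ms + offset_ms)  # never end before the start
    return Segment(start, end, seg.text)


def shift_all(segments, offset_ms=CTC_OFFSET_MS):
    """Apply the same offset to every segment, preserving order."""
    return [shift_segment(seg, offset_ms) for seg in segments]
```

Keeping the offset as a parameter rather than a hard-coded value makes it easy to experiment with other values when the alignment looks off for a particular recording.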
The repo also ships a GitHub Actions workflow that builds installers for Windows (.exe), macOS (.dmg), and Linux (.deb) when you push a version tag like v1.1.0. The model files are not bundled in the installer; they are fetched at first launch. The project is MIT licensed.
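Because the model is fetched at first launch rather than shipped with the installer, a first-run check of the model directory is one plausible way to decide whether a download is needed. The sketch below uses only the path given in the summary. The function names and the rule that a missing or empty directory triggers a download are assumptions, not the app's documented behavior.

```python
from pathlib import Path


def model_dir(home=None):
    """Return the model directory under the given (or the user's) home folder."""
    base = Path(home) if home is not None else Path.home()
    return base / ".shiyu" / "models" / "sensevoice-small"


def needs_download(home=None):
    """True when the model directory is missing or contains no files.

    Assumption: the real app may use a different check, such as
    verifying specific file names or checksums.
    """
    path = model_dir(home)
    return not path.is_dir() or not any(path.iterdir())
```

Passing `home` explicitly keeps the check easy to exercise against a temporary directory without touching the real ~/.shiyu folder.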
Shiyu Subtitle is a Tauri plus Vue 3 desktop app that transcribes audio and video to subtitles locally using the SenseVoice-Small model via ONNX Runtime.
Mainly Python. The stack also includes Tauri, Rust, Vue.
MIT license, so you can use, modify, and ship it with almost no restrictions as long as you keep the copyright notice.
Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.
Aimed mainly at a general audience.
Verify against the repo before relying on details.