jochenyang/shiyu

Analysis updated 2026-06-24

★ 12 · Python · Audience · general · Complexity · 3/5 · License · MIT · Setup · moderate

Mindmap

mindmap
  root((Shiyu))
    Inputs
      Audio file
      Video file
    Outputs
      Subtitle file
      Bilingual segments
      Waveform preview
    Use Cases
      Offline subtitling
      Privacy-safe transcription
      Cross-platform installers
    Tech Stack
      Tauri
      Rust
      Vue 3
      Python
      ONNX Runtime
      SenseVoice

What do people build with it?

USE CASE 1

Transcribe an hour of audio to subtitles offline in under a minute on a normal laptop

USE CASE 2

Run the app on Windows, macOS, or Linux using prebuilt installers

USE CASE 3

Tweak the -150ms CTC delay offset for better subtitle alignment with the waveform

USE CASE 4

Build the project from source as a base for a custom offline transcription tool

What is it built with?

Tauri · Rust · Vue · Python · ONNX

How does it compare?

	jochenyang/shiyu	aim-uofa/reasonmatch	arpecop/kokobook
Stars	12	12	12
Language	Python	Python	Python
Setup difficulty	moderate	hard	hard
Complexity	3/5	5/5	3/5
Audience	general	researcher	general

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate · Time to first run · 30 min

Building from source needs Node 18+, the stable Rust toolchain, and Python 3.10 installed together.
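Before installing anything, it can help to confirm the three prerequisites are present. The following is a minimal sketch, not part of the project: the executable names (`node`, `npm`, `cargo`) and the helper functions are assumptions chosen for illustration.

```python
import shutil
import sys

# Assumed executable names for the Node and Rust toolchains; adjust per platform.
REQUIRED_TOOLS = ("node", "npm", "cargo")


def node_major(version_text: str) -> int:
    """Parse the major version from text such as 'v18.17.0'."""
    return int(version_text.strip().lstrip("v").split(".")[0])


def missing_tools(tools=REQUIRED_TOOLS, lookup=shutil.which):
    """Return the tools that `lookup` cannot find on PATH."""
    return [tool for tool in tools if lookup(tool) is None]


def python_is_3_10(version_info=sys.version_info) -> bool:
    """True when the given interpreter version is Python 3.10.x."""
    return tuple(version_info[:2]) == (3, 10)


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("Python 3.10:", python_is_3_10())
```

Feed `node_major` the text printed by `node --version` and compare the result against 18 to check the Node requirement.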

MIT license, so you can use, modify, and ship it with almost no restrictions as long as you keep the copyright notice.

In plain English

Shiyu Subtitle (the project's Chinese name reads "shiyu") is a desktop app that generates subtitles from audio and video files. The main selling point in the README is that all transcription happens locally on your machine, with no network connection or cloud API key needed, so your media never leaves the computer.

It is built with Tauri (a lightweight desktop shell written in Rust) for the window and a Vue 3 frontend, with a Python backend doing the actual speech recognition. The speech recognition itself is done by a model called SenseVoice-Small, run through ONNX Runtime. The README claims it can compress one hour of audio transcription into less than one minute on a normal local machine.

To improve the output, the app adds two features on top of the raw model: a bilingual segmentation step that breaks lines into more natural reading chunks, and a -150ms time offset that corrects a known CTC peak delay in SenseVoice so the subtitle timings line up with the audio waveform.

The architecture is laid out in a small Mermaid diagram. The Tauri GUI talks over a local HTTP API to a Python backend that the Rust side starts and stops as a silent background process. The backend is packaged with PyInstaller into a single executable, so end users do not need to install Python themselves and do not see a terminal window pop up. The SenseVoice-Small model (about 230 MB) is downloaded automatically on first launch into ~/.shiyu/models/sensevoice-small/.

For running the project from source, the Quick Start section lists three prerequisites: Node.js 18+, the stable Rust toolchain, and Python 3.10. You set up the backend in a virtual environment with pip install -r requirements.txt, then run the frontend with npm install followed by npm run tauri dev. The interface itself is described as a dark-mode glassmorphism design with synchronized waveform previews and timing navigation.

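The -150ms correction described above amounts to shifting every segment's timestamps earlier by a fixed amount. Below is a minimal sketch of that idea, not the project's actual code: the `Segment` class, the function names, and the choice of SRT-style timestamps for display are all assumptions made for illustration. Only the -150 value comes from the summary.

```python
from dataclasses import dataclass

# Offset quoted in the summary; negative means "move subtitles earlier".
CTC_DELAY_OFFSET_MS = -150


@dataclass(frozen=True)
class Segment:
    """Hypothetical container for one subtitle line."""
    start_ms: int
    end_ms: int
    text: str


def apply_offset(segments, offset_ms=CTC_DELAY_OFFSET_MS):
    """Shift each segment by offset_ms, clamping so times never go below zero."""
    shifted = []
    for seg in segments:
        start = max(0, seg.start_ms + offset_ms)
        end = max(start, seg.end_ms + offset_ms)  # keep end >= start after clamping
        shifted.append(Segment(start, end, seg.text))
    return shifted


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT-style HH:MM:SS,mmm timestamp (assumed output format)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"
```

The clamp matters for segments that begin within the first 150 ms of the file, which would otherwise end up with a negative start time.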
The repo also ships a GitHub Actions workflow that builds installers for Windows (.exe), macOS (.dmg), and Linux (.deb) when you push a version tag like v1.1.0. The model files are not bundled in the installer; they are fetched at first launch. The project is MIT licensed.
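Because the model is fetched at first launch rather than bundled, the backend needs some way to decide whether a download is required. One way this could look is sketched below, under stated assumptions: the directory matches the path quoted in the summary, but the `needs_download` helper and the `model.onnx` file name are hypothetical and may not match what the project actually stores.

```python
from pathlib import Path

# Cache location quoted in the summary above.
DEFAULT_MODEL_DIR = Path.home() / ".shiyu" / "models" / "sensevoice-small"


def needs_download(model_dir: Path = DEFAULT_MODEL_DIR,
                   required=("model.onnx",)) -> bool:
    """Return True if any required file is missing or empty.

    The file names in `required` are placeholders, not the project's real layout.
    """
    return any(
        not (model_dir / name).is_file() or (model_dir / name).stat().st_size == 0
        for name in required
    )


if __name__ == "__main__":
    state = "download needed" if needs_download() else "model present"
    print(f"{DEFAULT_MODEL_DIR}: {state}")
```

Treating an empty file as missing guards against an interrupted download that left a zero-byte placeholder behind.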

Copy-paste prompts

Prompt 1

Walk me through the three prerequisites (Node 18+, Rust toolchain, Python 3.10) and the dev run command

Prompt 2

Explain how the Tauri front end talks to the Python backend over the local HTTP API

Prompt 3

Show me how to trigger the GitHub Actions workflow to build a .dmg installer from a version tag

Prompt 4

Sketch how I would swap SenseVoice-Small for a different ONNX speech model in the backend

Frequently asked questions

What is shiyu?

Shiyu Subtitle is a Tauri plus Vue 3 desktop app that transcribes audio and video to subtitles locally using the SenseVoice-Small model via ONNX Runtime.

What language is shiyu written in?

Mainly Python. The stack also includes Tauri, Rust, and Vue.

What license does shiyu use?

MIT license, so you can use, modify, and ship it with almost no restrictions as long as you keep the copyright notice.

How hard is shiyu to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is shiyu for?

Mainly a general audience rather than researchers or specialists.

Verify against the repo before relying on details.