Add offline speech-to-text transcription to a mobile iOS or Android app without sending any audio to a remote server.
Run on-device text-to-speech on a Raspberry Pi or embedded chip for a voice assistant that works without internet.
Separate speakers in a recorded meeting using the diarization feature via the Python API on your laptop.
Add voice activity detection to a browser app using the WebAssembly build of Sherpa-ONNX.
Requires selecting the correct pre-built binary for your hardware platform and separately downloading the appropriate ONNX model files before running.
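Picking a pre-built binary starts with knowing the host's operating system and CPU architecture. The sketch below uses only Python's standard `platform` module to report them. The helper names (`normalize_arch`, `host_platform`) and the alias table are hypothetical illustrations. They do not reproduce Sherpa-ONNX's actual release-asset naming, so check the repository's release page for the real archive names.

```python
import platform
from typing import Tuple

# Hypothetical alias table: collapses common spellings of the same CPU
# architecture. These labels are illustrative, not Sherpa-ONNX asset names.
_ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
    "armv7l": "armv7",
}


def normalize_arch(machine: str) -> str:
    """Map a raw platform.machine() string to a canonical architecture label."""
    key = machine.lower()
    return _ARCH_ALIASES.get(key, key)


def host_platform() -> Tuple[str, str]:
    """Return (os_name, arch) for this machine, e.g. ("linux", "aarch64")."""
    return platform.system().lower(), normalize_arch(platform.machine())


if __name__ == "__main__":
    print(host_platform())
```

On a Raspberry Pi running 64-bit Linux, this would typically report `linux` and `aarch64`. That pair can then be matched by hand against the available downloads.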
Sherpa-ONNX is a toolkit for running speech-related AI tasks entirely on-device, without sending audio to any server or requiring an internet connection. It is built on ONNX Runtime, a widely used engine for running AI models across different hardware, and draws on techniques from the Kaldi speech recognition project.

The toolkit covers a broad set of audio processing tasks: converting spoken audio to text (transcription), converting text to spoken audio (text-to-speech), separating a recording into individual speakers (diarization), identifying which speaker is talking, detecting which language is being spoken, tagging audio with sound categories, detecting when speech is present versus silence (voice activity detection), cleaning up noisy audio (enhancement), and separating mixed audio sources, such as vocals from instruments.

One of the more distinctive aspects of the project is the number of platforms and programming languages it supports. It runs on standard desktop and server hardware (x86 and ARM), on mobile operating systems (Android, iOS, HarmonyOS), on single-board computers such as the Raspberry Pi, and on specialized embedded chips, including neural processing units from Rockchip, Qualcomm, Axera, and Ascend. For code integration, it provides APIs for 12 languages: C++, C, Python, JavaScript, Java, C#, Kotlin, Swift, Go, Dart, Rust, and Pascal. A WebAssembly build also lets it run inside a web browser.

The repository links to online demos hosted on Hugging Face, where anyone can try speech recognition, text-to-speech, speaker diarization, audio tagging, and source separation directly in a browser without installing anything. Mirrors of those demos are hosted on ModelScope for users in China.

Sherpa-ONNX is positioned as a practical deployment tool rather than a research framework. Its broad hardware and language support targets developers who need to ship working speech functionality in real applications across diverse devices and environments. The full README is longer than what was shown.
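Voice activity detection, one of the listed tasks, means deciding which stretches of audio contain speech and which are silence. To make the idea concrete, the sketch below uses a deliberately crude frame-energy threshold in pure Python. This is a swapped-in technique, not how Sherpa-ONNX does it: the toolkit runs ONNX models for VAD, which handle noise far better than a fixed threshold. The function name `frame_energy_vad` and its `frame_ms` and `threshold_db` defaults are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple


def frame_energy_vad(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 20,
    threshold_db: float = -35.0,
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans whose per-frame RMS level exceeds
    threshold_db (in dB relative to full scale, samples in [-1, 1])."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len  # trailing partial frame is ignored
    segments: List[Tuple[float, float]] = []
    start = None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        is_speech = level_db > threshold_db
        t = i * frame_len / sample_rate  # start time of this frame
        if is_speech and start is None:
            start = t  # a speech span opens at this frame
        elif not is_speech and start is not None:
            segments.append((start, t))  # span closes at the first quiet frame
            start = None
    if start is not None:  # audio ended while still "speaking"
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments


if __name__ == "__main__":
    sr = 16000
    silence = [0.0] * (sr // 2)
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
    print(frame_energy_vad(silence + tone + silence, sr))  # [(0.5, 1.0)]
```

Here a 440 Hz tone stands in for speech between two half-second silences, and the detector reports a single span from 0.5 s to 1.0 s. A real recording with background noise would defeat a fixed threshold like this, which is why model-based VAD is used in practice.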
Verify against the repo before relying on details.