explaingit

k2-fsa/sherpa-onnx

12,211 · C++ · Audience · developer · Complexity · 4/5 · Setup · hard

TLDR

Sherpa-ONNX is a toolkit for running speech recognition, text-to-speech, speaker diarization, and other audio AI tasks entirely on-device across mobile, desktop, embedded hardware, and browsers, with no internet connection required.

Mindmap

mindmap
  root((sherpa-onnx))
    Audio Tasks
      Speech to text
      Text to speech
      Speaker diarization
      Voice activity detection
    Platforms
      Desktop and server
      Android iOS
      Raspberry Pi
      Browser via WASM
    APIs
      Python C and C++
      JavaScript Kotlin Swift
      Go Dart Rust
    Key Feature
      Fully on-device
      No internet needed


Things people build with this

USE CASE 1

Add offline speech-to-text transcription to a mobile iOS or Android app without sending any audio to a remote server.

USE CASE 2

Run on-device text-to-speech on a Raspberry Pi or embedded chip for a voice assistant that works without internet.

USE CASE 3

Label who spoke when in a recorded meeting using the speaker diarization feature via the Python API on your laptop.

USE CASE 4

Add voice activity detection to a browser app using the WebAssembly build of Sherpa-ONNX.
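For the meeting use case, diarization output is usually a list of time-stamped segments, each tagged with a speaker index. The sketch below is a hypothetical post-processing step and is not part of sherpa-onnx. It assumes each segment has already been converted to a plain `(start, end, speaker)` tuple, which is an assumed shape rather than the library's own result type. It then merges consecutive segments from the same speaker when the gap between them is short, and totals each speaker's talk time.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, speaker_index)
Segment = Tuple[float, float, int]


def merge_turns(segments: List[Segment], max_gap: float = 0.5) -> List[Segment]:
    """Join consecutive segments of the same speaker separated by <= max_gap seconds."""
    merged: List[Segment] = []
    for start, end, speaker in sorted(segments):
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= max_gap:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), speaker)  # extend the turn
        else:
            merged.append((start, end, speaker))  # a new speaker turn begins
    return merged


def talk_time(segments: List[Segment]) -> Dict[int, float]:
    """Total seconds spoken by each speaker index."""
    totals: Dict[int, float] = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)
```

For example, `merge_turns([(0.0, 2.0, 0), (2.2, 4.0, 0), (4.1, 6.0, 1), (7.5, 9.0, 0)])` joins the first two segments into one turn, `(0.0, 4.0, 0)`. The `max_gap` value of 0.5 seconds is an arbitrary choice for illustration.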

Tech stack

C++ · C · Python · JavaScript · ONNX Runtime

Getting it running

Difficulty · hard · Time to first run · 1h+

You must pick the correct pre-built binary for your hardware platform and download the matching ONNX model files separately before anything will run.
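Once a model has been downloaded, a Python transcription run can be quite short. The sketch below assumes the Python package is installed (`pip install sherpa-onnx`) and that an offline transducer model has been unpacked into a local directory. The file names `encoder.onnx`, `decoder.onnx`, `joiner.onnx`, and `tokens.txt` are placeholders; real model archives use longer, model-specific names. The WAV reader uses only the standard library and converts 16-bit PCM audio into floats in [-1, 1), the form the recognizer stream is assumed to accept here.

```python
import array
import sys
import wave


def read_wave(path):
    """Read a mono 16-bit PCM WAV file; return (float samples in [-1, 1), sample_rate)."""
    with wave.open(path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        sample_rate = f.getframerate()
        raw = f.readframes(f.getnframes())
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in pcm], sample_rate


def transcribe(wav_path, model_dir):
    """Decode one file with an offline transducer model (file names are placeholders)."""
    import sherpa_onnx  # imported lazily so read_wave works without the package

    recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder=f"{model_dir}/encoder.onnx",
        decoder=f"{model_dir}/decoder.onnx",
        joiner=f"{model_dir}/joiner.onnx",
        tokens=f"{model_dir}/tokens.txt",
        num_threads=2,
    )
    samples, sample_rate = read_wave(wav_path)
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    return stream.result.text
```

Usage would look like `print(transcribe("meeting.wav", "./my-model"))`, where both paths are hypothetical. Other model families, such as Whisper or Paraformer, are loaded through different constructors, so check the repository's Python examples for the one matching your download.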

In plain English

Sherpa-ONNX is a toolkit for running speech-related AI tasks entirely on-device, without sending audio to any server or requiring an internet connection. It is built on top of ONNX Runtime, a widely used engine for running AI models across different hardware, and draws on techniques from the Kaldi speech recognition project. The toolkit covers a broad set of audio processing tasks:

- converting spoken audio to text (transcription);
- converting text to spoken audio;
- separating a recording into individual speakers (diarization);
- identifying which speaker is talking;
- detecting which language is being spoken;
- tagging audio with sound categories;
- detecting when speech is present versus silence (voice activity detection);
- cleaning up noisy audio (enhancement);
- separating mixed audio sources, such as vocals from instruments.

One of the project's more distinctive aspects is the range of platforms and programming languages it supports. It runs on standard desktop and server hardware (x86 and ARM), on mobile operating systems (Android, iOS, HarmonyOS), and on small single-board computers like the Raspberry Pi. It also runs on specialized embedded chips, including various neural processing units from Rockchip, Qualcomm, Axera, and Ascend. For code integration, it provides APIs for 12 languages: C++, C, Python, JavaScript, Java, C#, Kotlin, Swift, Go, Dart, Rust, and Pascal. WebAssembly support means it can also run inside a web browser.

The repository links to a set of online demos hosted on Hugging Face. Anyone can try the speech recognition, text-to-speech, speaker diarization, audio tagging, and source separation features there directly in a browser, without installing anything. Mirrors of those demos are also hosted on ModelScope for users in China.

Sherpa-ONNX is positioned as a practical deployment tool rather than a research framework. Its broad hardware and language support targets developers who need to ship working speech functionality in real applications across diverse devices and environments.
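Voice activity detection is the simplest of these tasks to illustrate. The sketch below is a deliberately crude stand-in and not sherpa-onnx's method, since the toolkit's VAD runs neural models such as Silero VAD. Here, float samples are split into fixed-size frames, and a frame counts as speech when its RMS energy exceeds a threshold. The 30 ms frame size and the 0.02 threshold are arbitrary illustrative values.

```python
import math
from typing import List, Tuple


def frame_rms(samples: List[float], frame_len: int) -> List[float]:
    """RMS energy of each non-overlapping frame; a trailing partial frame is dropped."""
    return [
        math.sqrt(sum(x * x for x in samples[i : i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def speech_regions(
    samples: List[float], sample_rate: int, frame_ms: int = 30, threshold: float = 0.02
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame energy exceeds the threshold."""
    frame_len = sample_rate * frame_ms // 1000
    regions: List[Tuple[float, float]] = []
    start = None
    for idx, rms in enumerate(frame_rms(samples, frame_len)):
        t = idx * frame_len / sample_rate  # start time of this frame in seconds
        if rms > threshold and start is None:
            start = t  # speech begins at this frame
        elif rms <= threshold and start is not None:
            regions.append((start, t))  # speech ended at this frame boundary
            start = None
    if start is not None:  # speech ran until the last full frame
        n_frames = len(samples) // frame_len
        regions.append((start, n_frames * frame_len / sample_rate))
    return regions
```

A neural VAD replaces the fixed energy threshold with a model's per-frame speech probability. Energy thresholds are easily fooled by background noise, which is why production toolkits use a learned model instead.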

Copy-paste prompts

Prompt 1
Show me how to use the sherpa-onnx Python API to transcribe a .wav audio file to text using a pre-trained offline model.
Prompt 2
How do I integrate sherpa-onnx into an Android app using the Kotlin API to add real-time offline speech recognition?
Prompt 3
Help me set up sherpa-onnx speaker diarization in Python to split a 30-minute meeting recording into per-speaker segments.
Prompt 4
Show me how to use the sherpa-onnx WebAssembly build to detect when a user is speaking versus silent in a browser application.
Prompt 5
How do I run sherpa-onnx text-to-speech on a Raspberry Pi and stream the generated audio to a speaker?

Verify against the repo before relying on details.