explaingit

ggml-org/whisper.cpp

📈 Trending | 49,853 | C++ | Audience · developer | Complexity · 3/5 | Active | License | Setup · moderate

TLDR

C and C++ port of OpenAI's Whisper speech-to-text model that runs offline on almost any device, from desktops to a Raspberry Pi, without Python or heavy dependencies.

Mindmap

mindmap
  root((whisper.cpp))
    What it does
      Speech to text
      Offline inference
      No cloud needed
    Supported devices
      Desktop GPUs
      Apple Silicon
      Raspberry Pi
      Mobile phones
    Tech stack
      C and C++
      CMake build
      Metal GPU
      CUDA support
    Use cases
      Transcribe audio
      Voice commands
      Generate subtitles
      Embed in apps

Things people build with this

USE CASE 1

Transcribe audio files to text on your computer or phone without uploading to a server.

USE CASE 2

Build voice command interfaces that respond to spoken input entirely offline.

USE CASE 3

Generate subtitles for videos using speech recognition on your own hardware.

USE CASE 4

Embed speech-to-text into a non-Python application or resource-constrained device.
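
For the subtitle use case, the remaining work after transcription is mostly formatting. The following is a minimal sketch, not code from whisper.cpp: `Segment`, `srt_timestamp`, and `to_srt` are hypothetical names introduced here, and the segments are assumed to already carry start and end times in milliseconds. It shows how timed text segments could be rendered in the SubRip (SRT) subtitle layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical segment type (an assumption for this sketch):
// start/end offsets in milliseconds plus the recognized text.
struct Segment {
    int64_t start_ms;
    int64_t end_ms;
    std::string text;
};

// Format a millisecond offset as an SRT timestamp: HH:MM:SS,mmm
std::string srt_timestamp(int64_t ms) {
    assert(ms >= 0);  // negative offsets are not valid subtitle times
    const long long h = ms / 3600000;
    const long long m = (ms / 60000) % 60;
    const long long s = (ms / 1000) % 60;
    const long long f = ms % 1000;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld,%03lld", h, m, s, f);
    return buf;
}

// Render segments as SRT cues: 1-based index, time range, text, blank line.
std::string to_srt(const std::vector<Segment>& segments) {
    std::string out;
    for (size_t i = 0; i < segments.size(); ++i) {
        out += std::to_string(i + 1) + "\n";
        out += srt_timestamp(segments[i].start_ms) + " --> " +
               srt_timestamp(segments[i].end_ms) + "\n";
        out += segments[i].text + "\n\n";
    }
    return out;
}
```

Calling `to_srt` on the segments of a transcript yields text that can be saved as a `.srt` file. If the transcription API reports times in another unit, convert to milliseconds before building the segments.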

Tech stack

C · C++ · CMake · Metal · CUDA · WebAssembly · AVX

Getting it running

Difficulty · moderate | Time to first run · 30 min

CMake build required; CUDA/Metal optional but recommended for performance; no external services needed.

MIT license allows free use for any purpose, including commercial, as long as you include the original copyright notice.

In plain English

whisper.cpp is a C and C++ port of OpenAI's Whisper speech recognition model, which converts spoken audio into text. OpenAI released the original Whisper as a Python implementation, which is convenient but requires Python, PyTorch, and a sizable set of dependencies. This project reimplements the model's inference from scratch in plain C and C++, so speech-to-text can run on almost any device without heavy software dependencies.

The practical result is that the same model runs efficiently on hardware ranging from a desktop GPU down to a Raspberry Pi, an iPhone, or an Android device, entirely offline and without sending audio to a server. It achieves this through platform-specific optimizations: Metal GPU acceleration and Core ML on Apple Silicon Macs and iPhones, CUDA on NVIDIA GPUs, AVX instructions on x86 CPUs, and WebAssembly for running in a browser.

The models come in several sizes, from tiny to large, trading accuracy against memory usage and speed. To use it, you download a model file in the ggml format, build the project with CMake, and pass the resulting program an audio file to get a transcript.

Use whisper.cpp when you need offline, on-device speech-to-text without cloud services, when you want to embed Whisper into a non-Python application, or when you need to run it on a resource-constrained device. Common applications include transcribing recordings, building voice command interfaces, and generating subtitles.

The tech stack is C and C++ with no mandatory external dependencies, built using CMake, with optional hardware-acceleration backends such as Metal and Core ML on Apple hardware, CUDA on NVIDIA GPUs, and Vulkan.
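
When embedding speech-to-text in an application, audio usually has to be prepared before it reaches the model. The sketch below assumes the model expects mono floating-point samples at 16 kHz, which should be checked against the repository. It only covers converting interleaved 16-bit PCM to mono floats and does not resample. `pcm16_to_mono_f32` is a hypothetical helper name, not part of whisper.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved signed 16-bit PCM into mono float samples in [-1, 1).
// Multi-channel input is downmixed by averaging the channels of each frame.
// Assumption: the caller has already ensured the expected sample rate.
std::vector<float> pcm16_to_mono_f32(const std::vector<int16_t>& pcm, int channels) {
    std::vector<float> out;
    if (channels < 1) {
        return out;  // nothing sensible to do with a non-positive channel count
    }
    const size_t frames = pcm.size() / static_cast<size_t>(channels);
    out.reserve(frames);
    for (size_t i = 0; i < frames; ++i) {
        float sum = 0.0f;
        for (int c = 0; c < channels; ++c) {
            // Scale each 16-bit sample to the [-1, 1) range.
            sum += pcm[i * channels + c] / 32768.0f;
        }
        out.push_back(sum / channels);
    }
    return out;
}
```

The resulting vector is the kind of buffer a C or C++ inference API would typically take as a pointer plus a sample count. Any trailing samples that do not form a complete frame are dropped.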

Copy-paste prompts

Prompt 1
How do I set up whisper.cpp to transcribe an audio file on my Mac with GPU acceleration?
Prompt 2
Show me how to use whisper.cpp in a C++ application to add speech-to-text without external dependencies.
Prompt 3
What's the smallest model I can use with whisper.cpp on a Raspberry Pi, and how do I optimize it for speed?
Prompt 4
How do I run whisper.cpp in a web browser using WebAssembly for client-side transcription?
Prompt 5
Can I use whisper.cpp to build a real-time voice command system that works offline?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.