Clone your voice and generate audiobook narrations or podcast intros without hiring voice actors.
Dictate notes, emails, and code comments hands-free using a global hotkey in any app.
Build AI agents in Claude or Cursor that speak responses aloud in custom cloned voices.
Create multi-speaker podcast episodes with different voices in a visual timeline editor.
Requires downloading large ML models (Whisper, Qwen3-TTS, Kokoro) and building a Tauri desktop app with Rust dependencies.
Voicebox is a free, open-source desktop application that serves as a complete local AI voice studio: you can clone voices, generate speech, and dictate into any app without sending data to the cloud. It positions itself as a combined local alternative to ElevenLabs (for voice output) and WisprFlow (for voice input). On the output side, you can clone a voice from a short audio sample and use it to convert text to speech in 23 languages. You choose from seven text-to-speech engines, including Qwen3-TTS, Chatterbox, Kokoro, and HumeAI TADA, each with different strengths in quality, speed, and language coverage. You can add expressive tags like [laugh] or [sigh] to control delivery, apply audio effects like reverb or pitch shift, and generate multi-speaker podcast-style conversations in a visual timeline editor. On the input side, a global keyboard hotkey activates voice dictation anywhere on your computer using Whisper-based speech recognition and automatically pastes the transcribed text into whatever field you are typing in. For AI power users, Voicebox exposes an API and a built-in MCP server (MCP, the Model Context Protocol, is a standard for connecting AI tools), so agents running in tools like Claude Code or Cursor can call a single command to speak responses aloud in a cloned voice. All processing happens locally; nothing leaves your machine. It runs on macOS, Windows, Linux, and Docker, and is built with Tauri (a Rust-based framework for native desktop apps) with a TypeScript interface.
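To make the expressive-tag idea concrete, here is a minimal Python sketch of how a script containing tags like [laugh] or [sigh] could be split into spoken text and delivery cues. It is not Voicebox's code: the function name split_tags and the assumption that a tag is a single lowercase word in square brackets are illustrative only, and the real tag grammar should be checked against the repo.

```python
import re

# Assumption: a tag is one lowercase word in square brackets, as in the
# [laugh] and [sigh] examples. Voicebox's real tag grammar may differ.
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def split_tags(script: str) -> list[tuple[str, str]]:
    """Split a script into ordered ("text", ...) and ("tag", ...) segments."""
    segments: list[tuple[str, str]] = []
    pos = 0
    for match in TAG_PATTERN.finditer(script):
        # Plain text between the previous tag (or the start) and this tag.
        text = script[pos:match.start()].strip()
        if text:
            segments.append(("text", text))
        segments.append(("tag", match.group(1)))
        pos = match.end()
    # Any plain text left after the last tag.
    tail = script[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


if __name__ == "__main__":
    for kind, value in split_tags("Well [sigh] that went badly. [laugh]"):
        print(kind, value)
```

Keeping text and tags as an ordered list of segments preserves where each cue falls in the sentence, which is what a speech engine would need in order to place the cue in the audio.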
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.