Analysis updated 2026-06-21
Clone a voice from a short audio sample and use it to generate narration for a video, podcast, or audiobook.
Press a global hotkey anywhere on your computer to dictate text by voice and have it automatically typed into any app.
Build a multi-speaker podcast conversation from a script using different cloned voices in a visual timeline editor.
Connect Voicebox's MCP server to Claude Code or Cursor so your AI agent can speak responses aloud in a cloned voice.
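MCP (Model Context Protocol) clients talk to servers over JSON-RPC 2.0 and run a server's tools by sending a `tools/call` request. The sketch below shows the shape of the request an agent might send to ask a voice server to speak. The tool name `speak`, its `text` and `voice` arguments, and the voice identifier are hypothetical placeholders. Check the tool list that Voicebox's MCP server actually exposes for the real names.

```typescript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
// ASSUMPTION: the tool name "speak" and the argument names "text" and "voice"
// are hypothetical placeholders, not Voicebox's documented tool schema.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;                       // which server tool to run
    arguments: Record<string, unknown>; // tool-specific input
  };
}

// Build a "tools/call" request; the transport (stdio or HTTP) is out of scope here.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "speak", {
  text: "Build finished with no errors.",
  voice: "my-cloned-voice", // hypothetical cloned-voice identifier
});

// An agent's MCP client would serialize this and send it to the server.
console.log(JSON.stringify(request));
```

In practice an MCP client first discovers a server's tools with a `tools/list` request, so the agent does not need these names hard-coded.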
| | jamiepine/voicebox | maotoumao/musicfree | vuejs/devtools-v6 |
|---|---|---|---|
| Stars | 24,674 | 24,609 | 24,742 |
| Language | TypeScript | TypeScript | TypeScript |
| Setup difficulty | easy | moderate | easy |
| Complexity | 2/5 | 2/5 | 3/5 |
| Audience | vibe coder | general | developer |
Figures from each repo's GitHub metadata at analysis time.
Downloads AI model weights on first run. A GPU is recommended for faster generation, but a CPU fallback is available.
Voicebox is a free, open-source desktop application that works as a complete local AI voice studio: it clones voices, generates speech, and dictates into any app, all without sending data to the cloud. It positions itself as a combined local alternative to ElevenLabs (for voice output) and WisprFlow (for voice input).
On the output side, you can clone a voice from a short audio sample and use it to convert text to speech in 23 languages. You can choose from seven text-to-speech engines, including Qwen3-TTS, Chatterbox, Kokoro, and HumeAI TADA, which differ in quality, speed, and language coverage. Expressive tags such as [laugh] or [sigh] control delivery, audio effects such as reverb and pitch shift can be applied, and a visual timeline editor lets you build multi-speaker, podcast-style conversations.
On the input side, a global keyboard hotkey activates voice dictation anywhere on your computer. Speech is transcribed with Whisper-based recognition, and the text is pasted automatically into whatever field you are typing in.
For AI power users, Voicebox exposes an API and a built-in MCP (Model Context Protocol) server, MCP being a standard for connecting AI tools, so agents running in tools like Claude Code or Cursor can call a single command to speak responses aloud in a cloned voice. All processing happens locally; nothing leaves your machine. Voicebox runs on macOS, Windows, and Linux, can also run in Docker, and is built with Tauri (a Rust-based framework for native desktop apps) and a TypeScript interface.
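A multi-speaker conversation starts from a script in which each line belongs to a speaker who has been assigned a cloned voice, with expressive tags written inline. The TypeScript sketch below shows one way such a script could be turned into ordered segments ready for synthesis. The `Speaker: text` format, the voice IDs, and the `Segment` shape are hypothetical illustrations; Voicebox's own timeline editor and data model may work differently.

```typescript
// Sketch: parse a "Speaker: text" script into ordered segments, each mapped to
// a cloned voice, and list the bracketed expressive tags found in each line.
// ASSUMPTIONS: the script format, voice IDs, and Segment shape are hypothetical,
// not Voicebox's actual file format or API.

interface Segment {
  order: number;   // position in the conversation
  speaker: string;
  voiceId: string; // cloned voice assigned to this speaker
  text: string;    // line text, tags left in place for the TTS engine
  tags: string[];  // expressive tags found in the line, e.g. "laugh"
}

const TAG_PATTERN = /\[([a-z]+)\]/gi;

function parseScript(script: string, voices: Record<string, string>): Segment[] {
  const segments: Segment[] = [];
  for (const rawLine of script.split("\n")) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines
    const colon = line.indexOf(":");
    if (colon === -1) {
      throw new Error(`Line has no "Speaker:" prefix: ${line}`);
    }
    const speaker = line.slice(0, colon).trim();
    const text = line.slice(colon + 1).trim();
    const voiceId = voices[speaker];
    if (voiceId === undefined) {
      throw new Error(`No cloned voice assigned to speaker "${speaker}"`);
    }
    // match() with a global regex returns every "[tag]"; strip the brackets.
    const tags = (text.match(TAG_PATTERN) || []).map((t) => t.slice(1, -1).toLowerCase());
    segments.push({ order: segments.length, speaker, voiceId, text, tags });
  }
  return segments;
}

const script = `
Host: Welcome back to the show! [laugh]
Guest: Thanks for having me. [sigh] It has been a long week.
`;

const segments = parseScript(script, { Host: "voice-host", Guest: "voice-guest" });
for (const s of segments) {
  console.log(s.order, s.speaker, s.voiceId, s.tags.join(","));
}
```

Keeping the tags inside `text` mirrors how the tags are described above: they travel with the line and shape its delivery, while the separate `tags` list only makes them easy to inspect.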
Voicebox is a free, local AI voice studio that clones voices, generates speech in 23 languages, and lets you dictate into any app, all running on your own machine with no data sent to the cloud.
Mainly TypeScript. The stack also includes Tauri and Rust.
Open source and free to use; check the repository for the specific license terms.
Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.
Mainly vibe coders.
Verify against the repo before relying on details.