jamiepine/voicebox

Analysis updated 2026-06-21

★ 24,674 | TypeScript | Audience · vibe coder | Complexity · 2/5 | License · see repository | Setup · easy

Mindmap

mindmap
  root((voicebox))
    Voice output
      Clone from sample
      23 languages
      Expressive tags
    Voice input
      Global hotkey
      Whisper recognition
      Auto-paste text
    AI integration
      MCP server
      API access
      Agent-ready
    Privacy
      Fully local
      No cloud upload
    Audience
      AI power users
      Content creators


What do people build with it?

USE CASE 1

Clone a voice from a short audio sample and use it to generate narration for a video, podcast, or audiobook.

USE CASE 2

Press a global hotkey anywhere on your computer to dictate text by voice and have it automatically typed into any app.

USE CASE 3

Build a multi-speaker podcast conversation from a script using different cloned voices in a visual timeline editor.

USE CASE 4

Connect Voicebox's MCP server to Claude Code or Cursor so your AI agent can speak responses aloud in a cloned voice.
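As a rough illustration of use case 3, a script can be thought of as an ordered list of clips, one per spoken line, each tied to a voice. The sketch below assumes a simple `Speaker: line` text format. The `Clip` type and the `parseDialogue` and `speakersIn` helpers are hypothetical names invented for this example, not part of Voicebox's project format or API.

```typescript
// Illustrative only: the "Speaker: line" format and these types are assumptions,
// not Voicebox's own project format.
interface Clip {
  speaker: string; // label of the (hypothetical) cloned voice to use
  text: string;    // the line to synthesize
  order: number;   // position of the clip on the timeline
}

// Turn a plain-text script into an ordered list of clips.
function parseDialogue(script: string): Clip[] {
  const clips: Clip[] = [];
  for (const rawLine of script.split("\n")) {
    const line = rawLine.trim();
    if (line.length === 0) continue; // ignore blank lines
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // skip lines that have no speaker label
    const speaker = line.slice(0, colon).trim();
    const text = line.slice(colon + 1).trim();
    if (text.length === 0) continue; // skip labels with nothing to say
    clips.push({ speaker, text, order: clips.length });
  }
  return clips;
}

// List the distinct speakers, in order of first appearance,
// so each one can be matched to a cloned voice.
function speakersIn(clips: Clip[]): string[] {
  const seen: string[] = [];
  for (const clip of clips) {
    if (seen.indexOf(clip.speaker) === -1) seen.push(clip.speaker);
  }
  return seen;
}

const demoClips = parseDialogue("Ava: Welcome back.\n\nBen: Glad to be here.\nAva: Let's start.");
const demoSpeakers = speakersIn(demoClips);
```

Keeping the speaker list separate from the clips mirrors the idea of assigning one voice per speaker before arranging lines on a timeline. How Voicebox stores this internally should be checked against the repository.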

What is it built with?

TypeScript · Tauri · Rust · Whisper

How does it compare?

	jamiepine/voicebox	maotoumao/musicfree	vuejs/devtools-v6
Stars	24,674	24,609	24,742
Language	TypeScript	TypeScript	TypeScript
Setup difficulty	easy	moderate	easy
Complexity	2/5	2/5	3/5
Audience	vibe coder	general	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · easy | Time to first run · 5 min

Voicebox downloads AI model weights on first run. A GPU is recommended for faster generation, but a CPU fallback is available.

Open source and free to use. Check the repository for the specific license terms.

In plain English

Voicebox is a free, open-source desktop application that serves as a complete local AI voice studio: you can clone voices, generate speech, and dictate into any app, all without sending data to the cloud. It positions itself as a combined local alternative to ElevenLabs (for voice output) and WisprFlow (for voice input).

On the output side, you can clone a voice from a short audio sample and use it to convert text to speech in 23 languages. There are seven text-to-speech engines to choose from, including Qwen3-TTS, Chatterbox, Kokoro, and HumeAI TADA, each with different strengths in quality, speed, and language coverage. You can add expressive tags like [laugh] or [sigh] to control delivery, apply audio effects like reverb or pitch shift, and generate multi-speaker podcast-style conversations in a visual timeline editor.

On the input side, a global keyboard hotkey activates voice dictation anywhere on your computer using Whisper-based speech recognition, then automatically pastes the transcribed text into whatever field you are typing in.

For AI power users, Voicebox exposes an API and a built-in MCP server (MCP, the Model Context Protocol, is a standard for connecting AI tools), so agents running in tools like Claude Code or Cursor can call a single command to speak responses aloud in a cloned voice.

All processing happens locally, and nothing leaves your machine. Voicebox runs on macOS, Windows, Linux, and Docker, and is built with Tauri (a Rust-based framework for native desktop apps) with a TypeScript interface.
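To make the idea of expressive tags concrete, here is a minimal sketch of how a script containing bracketed tags such as [laugh] or [sigh] could be split into spoken text and delivery cues. It is an assumption-laden illustration: the `Segment` type and the `splitExpressiveTags` helper are hypothetical, and the source does not describe how Voicebox or its engines actually parse tags.

```typescript
// Illustrative only: this is not Voicebox's parser or API.
// A segment is either plain text to be spoken or an expressive tag such as [laugh].
type Segment =
  | { kind: "text"; value: string }
  | { kind: "tag"; value: string };

// Hypothetical helper: split a script into spoken text and bracketed tags.
function splitExpressiveTags(script: string): Segment[] {
  const segments: Segment[] = [];
  const tagPattern = /\[([a-z]+)\]/gi; // assumes tags are single words in square brackets
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(script)) !== null) {
    // Text between the previous tag (or the start) and this tag.
    const text = script.slice(cursor, match.index).trim();
    if (text.length > 0) segments.push({ kind: "text", value: text });
    segments.push({ kind: "tag", value: match[1].toLowerCase() });
    cursor = match.index + match[0].length;
  }
  // Any text left after the last tag.
  const tail = script.slice(cursor).trim();
  if (tail.length > 0) segments.push({ kind: "text", value: tail });
  return segments;
}

const demoSegments = splitExpressiveTags("Well [sigh] that went better than expected [laugh]");
```

Separating tags from text this way makes it easy to see where a cue falls relative to the words around it. The set of tags each engine supports is something to confirm in the repository.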

Copy-paste prompts

Prompt 1

How do I clone a voice in Voicebox from a 30-second audio clip and use it to generate speech narration for a YouTube video?

Prompt 2

Set up Voicebox's global hotkey dictation on my Mac so I can speak and have text automatically pasted into any app including my code editor.

Prompt 3

How do I connect Voicebox's built-in MCP server to Claude Code so my AI agent can call a tool that speaks output aloud?

Prompt 4

Create a multi-speaker podcast dialogue in Voicebox with two different cloned voices, and export it as a single audio file.

Prompt 5

Show me how to use expressive tags like [laugh] and [sigh] in Voicebox text-to-speech to make generated speech sound more natural.

Frequently asked questions

What is voicebox?

Voicebox is a free, local AI voice studio that clones voices, generates speech in 23 languages, and lets you dictate into any app, all running on your own machine with no data sent to the cloud.

What language is voicebox written in?

Mainly TypeScript. The stack also includes Tauri and Rust.

What license does voicebox use?

Open source and free to use. Check the repository for the specific license terms.

How hard is voicebox to set up?

Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.

Who is voicebox for?

Mainly vibe coders, along with AI power users and content creators.

Verify against the repo before relying on details.