openbmb/voxcpm

Analysis updated 2026-06-21

★ 18,697 | Python | Audience · developer | Complexity · 4/5 | License · Apache 2.0 | Setup · hard

Mindmap

mindmap
  root((VoxCPM))
    What it does
      Text to speech
      Voice cloning
      Voice design
    Tech
      Python
      Diffusion models
      Hugging Face weights
    Capabilities
      30 languages
      48kHz audio output
      Real-time streaming
    Audience
      AI developers
      App builders
      Content creators

What do people build with it?

USE CASE 1

Convert a written article into spoken audio in 30 languages without any voice recording setup.

USE CASE 2

Clone a speaker's voice from a short audio sample to narrate new content in their style.

USE CASE 3

Build a multilingual voice assistant or podcast tool using a single open-source model.

USE CASE 4

Design a custom synthetic voice by describing its characteristics in text, with no reference audio needed.

What is it built with?

Python · PyTorch · Hugging Face

How does it compare?

Metric	openbmb/voxcpm	karpathy/llm-council	pyscript/pyscript
Stars	18,697	18,703	18,685
Language	Python	Python	Python
Setup difficulty	hard	moderate	easy
Complexity	4/5	2/5	3/5
Audience	developer	vibe coder	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1h+

Requires a GPU for practical use: the 2-billion-parameter model is large and slow on CPU.
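
To put the GPU requirement in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would occupy. The parameter count comes from this analysis; the bytes-per-parameter figures are assumptions about common numeric formats, since the precision VoxCPM2 actually ships in is not stated here. Activations, caches, and framework overhead come on top.

```python
# Rough weight-memory estimate for a 2-billion-parameter model.
# Assumption: the numeric formats below are common choices, not a
# statement about which one VoxCPM2 actually uses.
PARAMS = 2_000_000_000

BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2}


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gib(PARAMS, size):.2f} GiB")
```

Under these assumptions the weights take roughly 3.7 GiB in half precision and roughly 7.5 GiB in single precision, which is why a dedicated GPU is the practical choice.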

Apache 2.0: use freely for any purpose, including commercial use, as long as you keep the license and copyright notice.

In plain English

VoxCPM is a text-to-speech system: software that converts written text into spoken audio. Its main technical distinction is that it skips the usual step of breaking speech into discrete sound tokens. Instead, it generates speech directly as continuous audio representations, using an architecture that combines diffusion models with autoregressive generation. The project claims this approach produces more natural and expressive speech than tokenization-based systems.

The current version, VoxCPM2, is a 2-billion-parameter model trained on over 2 million hours of multilingual audio data across 30 languages. Beyond standard text-to-speech, it supports three additional capabilities: Voice Design (describing a voice in plain text and having the model create it without any reference recording), Controllable Voice Cloning (copying someone's voice from a short audio clip while optionally adjusting the style), and Ultimate Cloning (reproducing every detail of a voice by providing both the reference audio and its transcript).

Output is 48kHz audio. Installation is via pip, and the model weights are available on Hugging Face. A Python API, a command-line interface, and a web demo are all provided. The model can run in real-time streaming mode and is released under the Apache 2.0 license, which permits commercial use.
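
The combination of streaming mode and 48kHz output can be illustrated with a minimal sketch that consumes audio chunk by chunk and writes it to a WAV file as it arrives. This does not call VoxCPM: `fake_stream` is a hypothetical stand-in that yields a test tone, and the assumption that a streaming call yields mono float samples in [-1, 1] is mine, not taken from the project's documentation. Only the Python standard library is used.

```python
import math
import struct
import wave
from typing import Iterable, Iterator, List

SAMPLE_RATE = 48_000  # output rate stated for VoxCPM2 in this analysis


def fake_stream(seconds: float = 1.0, chunk_ms: int = 100) -> Iterator[List[float]]:
    """Hypothetical stand-in for a streaming TTS call: yields chunks of a 440 Hz tone."""
    chunk_len = SAMPLE_RATE * chunk_ms // 1000
    total = int(SAMPLE_RATE * seconds)
    for start in range(0, total, chunk_len):
        end = min(start + chunk_len, total)
        yield [0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
               for n in range(start, end)]


def write_stream_to_wav(chunks: Iterable[List[float]], path: str) -> int:
    """Write float chunks to a mono 16-bit WAV as they arrive; return frames written."""
    frames = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono (assumed)
        wav.setsampwidth(2)       # 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            # Clip to [-1, 1] before scaling so out-of-range samples cannot overflow.
            clipped = [max(-1.0, min(1.0, s)) for s in chunk]
            pcm = struct.pack(f"<{len(clipped)}h",
                              *(int(s * 32767) for s in clipped))
            wav.writeframes(pcm)
            frames += len(clipped)
    return frames


if __name__ == "__main__":
    n = write_stream_to_wav(fake_stream(), "demo_48k.wav")
    print(f"wrote {n} frames ({n / SAMPLE_RATE:.2f} s)")
```

In a real application the generator would be replaced by whatever streaming interface the installed VoxCPM version exposes, and each chunk could be sent to an audio device instead of a file. Check the repository for the actual call names and the sample format before adapting this.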

Copy-paste prompts

Prompt 1

Using VoxCPM2, write Python code to convert a paragraph of English text into a 48kHz audio file using the default voice. Include the pip install command and the full script.

Prompt 2

I have a 10-second audio clip of a speaker. Show me how to use VoxCPM's Controllable Voice Cloning feature to clone that voice and use it to narrate new text in Python.

Prompt 3

I want to use VoxCPM's Voice Design feature to create a 'calm female narrator with a British accent' without providing any reference audio. Write the Python code to do this.

Prompt 4

Write a Python script that uses VoxCPM2 in real-time streaming mode to play audio as it generates from a long piece of text, rather than waiting for the full file.

Frequently asked questions

What is voxcpm?

VoxCPM is an open-source text-to-speech system that generates natural-sounding speech in 30 languages and can clone voices from short audio clips or create new voices from text descriptions.

What language is voxcpm written in?

Mainly Python. The stack also includes PyTorch and Hugging Face.

What license does voxcpm use?

Apache 2.0: use freely for any purpose, including commercial use, as long as you keep the license and copyright notice.

How hard is voxcpm to set up?

Setup difficulty is rated hard, with an hour or more to a first successful run.

Who is voxcpm for?

Mainly developers.

Verify against the repo before relying on details.