Build voice agents that speak naturally with emotional expression and laughter.
Create audiobooks in multiple languages without hiring voice actors.
Localize apps and games into 23+ languages with realistic AI voices.
Generate character voices for interactive media by cloning a short audio sample.
Installing PyTorch and downloading the models can take a while, depending on your internet speed and GPU availability.
Chatterbox is a family of open-source text-to-speech (TTS) models from Resemble AI. TTS software converts written text into realistic spoken audio, and Chatterbox is the company's state-of-the-art open-source offering.
The library includes three models. Chatterbox-Turbo is the fastest and most efficient, built on a 350-million-parameter neural network. It supports paralinguistic tags, special markers in the text such as [laugh] or [cough] that make the generated speech sound more natural and human. The plain Chatterbox model handles English and offers creative controls such as adjusting the expressiveness of the voice. Chatterbox-Multilingual supports 23+ languages, including French, Chinese, Japanese, and Arabic.
All three models support zero-shot voice cloning: you provide a short audio clip of a real person speaking, and the model generates new speech that sounds like that person, with no additional training.
Use Chatterbox when you need AI-generated voices for voice agents, audiobooks, localization, interactive media, or any other application that turns text into speech. Built-in watermarking adds imperceptible neural watermarks to all generated audio, which helps identify it as AI-generated. The tech stack is Python, with PyTorch handling the underlying neural network computations.
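
To make the tag and voice-cloning ideas concrete, here is a minimal sketch. The helpers `find_tags`, `unknown_tags`, and `synthesize` are hypothetical names introduced for illustration. The import path `chatterbox.tts_turbo`, the class `ChatterboxTurboTTS`, and the calls `from_pretrained(device=...)`, `generate(text, audio_prompt_path=...)`, and `model.sr` are assumptions that this summary does not confirm, so verify them against the repo before use. The tag set is limited to the two tags named above; the models may accept others.

```python
import re

# Tags named in the description above. The full set a given model accepts is
# not stated there, so treat this as an assumption and extend it as needed.
KNOWN_TAGS = {"laugh", "cough"}

# Matches bracketed lowercase markers such as [laugh].
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def find_tags(text: str) -> list[str]:
    """Return the paralinguistic tag names in order of appearance."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str) -> list[str]:
    """Return tags that are not in KNOWN_TAGS (for example, typos)."""
    return [tag for tag in find_tags(text) if tag not in KNOWN_TAGS]


def synthesize(text: str, out_path: str, voice_sample: str, device: str = "cpu") -> None:
    """Generate speech in the voice of `voice_sample` and save it to `out_path`.

    ASSUMPTION: the import path, class name, and method signatures used
    below are not confirmed by this document; check them against the repo.
    """
    # Validate tags first so typos fail fast, before any model is loaded.
    bad = unknown_tags(text)
    if bad:
        raise ValueError(f"Unrecognized tags: {bad}")

    # Imported lazily so the helpers above work without the heavy dependencies.
    import torchaudio
    from chatterbox.tts_turbo import ChatterboxTurboTTS

    model = ChatterboxTurboTTS.from_pretrained(device=device)
    # Zero-shot cloning: the short reference clip steers the output voice.
    wav = model.generate(text, audio_prompt_path=voice_sample)
    torchaudio.save(out_path, wav, model.sr)
```

A call might look like `synthesize("That was unexpected [laugh], let me check.", "reply.wav", "speaker_sample.wav", device="cuda")`, where both file names are placeholders. Checking tags before loading the model keeps a misspelled tag from costing a full model download and load.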
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.