explaingit

2noise/ChatTTS

39,281 | Python | Audience · developer | Complexity · 2/5 | Maintained | License | Setup · moderate

TLDR

A text-to-speech model that generates natural, conversational speech with realistic pauses, emphasis, and emotion. Supports English and Chinese, trained on 100,000+ hours of audio.

Mindmap

mindmap
  root((ChatTTS))
    What it does
      Converts text to speech
      Natural conversational tone
      Multiple speakers
      Controllable prosody
    Key features
      Laughter and pauses
      Emotional variation
      Speaker control
      Dialogue simulation
    Use cases
      AI assistant voices
      Podcast generation
      Voice interfaces
      Training data
    Tech stack
      Python
      PyTorch
      Hugging Face
    Audience
      AI developers
      Content creators
      Researchers

Things people build with this

USE CASE 1

Give a natural voice to conversational AI chatbots and virtual assistants.
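Chatbot replies produced by an LLM often contain Markdown (bold markers, bullets, code fences) that a speech model would otherwise try to read aloud. Below is a minimal sketch of a preprocessing step, assuming the reply arrives as a Markdown string. The helper `clean_for_speech` is hypothetical and not part of ChatTTS, and it only handles a few common patterns.

```python
import re


def clean_for_speech(reply: str) -> str:
    """Strip common Markdown markers from an LLM reply so they are not spoken.

    Hypothetical helper (not part of ChatTTS); covers fenced code, headings,
    list bullets, links, inline code, and asterisk/underscore emphasis only.
    """
    text = re.sub(r"```.*?```", " ", reply, flags=re.DOTALL)  # drop fenced code entirely
    text = re.sub(r"^[ \t]*#+[ \t]*", "", text, flags=re.MULTILINE)  # heading hashes
    text = re.sub(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)  # bullets
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)  # [label](url) -> label
    text = re.sub(r"`([^`]*)`", r"\1", text)  # keep inline code text, drop backticks
    text = re.sub(r"\*+|__", "", text)  # bold/italic markers
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace and newlines
```

The cleaned string would then be handed to the model, for example as one element of the list passed to `chat.infer(...)` in recent ChatTTS releases.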

USE CASE 2

Generate podcast-style audio content automatically from written scripts or articles.
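Long articles are usually easier to synthesize in pieces than in one call. Below is a minimal sketch, assuming you want sentence-aligned chunks under a character budget. `chunk_script` and its `max_chars` default of 250 are illustrative choices, not values taken from ChatTTS, and the naive splitter will also break after abbreviations such as "Mr.".

```python
import re

# Split after ASCII sentence punctuation followed by whitespace, or directly after
# Chinese full-width punctuation (Chinese text usually has no space there).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def chunk_script(text: str, max_chars: int = 250) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    Hypothetical helper: a single sentence longer than max_chars is kept whole
    as its own chunk rather than being cut mid-sentence.
    """
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty pieces, e.g. after trailing punctuation
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can be synthesized separately (the README examples pass a list of texts to `chat.infer`), and the resulting waveforms concatenated in order to form the episode.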

USE CASE 3

Build interactive voice interfaces where multiple characters speak with distinct voices.
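One way to organize a multi-character script is to parse it into turns, group the turns by character so each character keeps a fixed voice, and then put the generated clips back in script order. Below is a minimal sketch, assuming a plain `NAME: line` script format. `parse_script`, `group_by_speaker`, and `reassemble` are hypothetical helpers, and the clips can be any objects (for example, waveform arrays).

```python
def parse_script(script: str) -> list[tuple[str, str]]:
    """Parse 'NAME: text' lines into (speaker, text) turns, skipping blank lines."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"missing 'NAME:' prefix in line: {line!r}")
        turns.append((speaker.strip(), text.strip()))
    return turns


def group_by_speaker(turns):
    """Map each speaker to [(turn_index, text), ...] so one voice can be used per group."""
    groups = {}
    for index, (speaker, text) in enumerate(turns):
        groups.setdefault(speaker, []).append((index, text))
    return groups


def reassemble(clips_by_speaker, total):
    """Place {speaker: [(turn_index, clip), ...]} back into original script order."""
    ordered = [None] * total
    for clips in clips_by_speaker.values():
        for index, clip in clips:
            ordered[index] = clip
    return ordered
```

In recent ChatTTS releases, a voice is pinned by passing a speaker embedding from `chat.sample_random_speaker()` as `spk_emb` in `ChatTTS.Chat.InferCodeParams`. Sampling one embedding per character and reusing it for that character's group is an assumption about how you would wire this up, so check it against the current README.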

USE CASE 4

Create training datasets for other speech models using diverse, natural-sounding dialogue.
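Generated clips are only useful as training data if each audio file is paired with its transcript. Below is a minimal sketch of a JSON Lines manifest, assuming one record per clip. The field names `audio`, `text`, and `speaker` are our own choice, not a ChatTTS or dataset-standard format, and the helper names are hypothetical.

```python
import json


def to_jsonl(records) -> str:
    """Serialize (audio_file, text, speaker) tuples as JSON Lines, one clip per line.

    json.dumps escapes embedded newlines, so every record stays on one line;
    ensure_ascii=False keeps Chinese transcripts human-readable.
    """
    lines = [
        json.dumps({"audio": str(audio), "text": text, "speaker": speaker}, ensure_ascii=False)
        for audio, text, speaker in records
    ]
    return "".join(line + "\n" for line in lines)


def from_jsonl(data: str) -> list[dict]:
    """Parse a JSON Lines string back into a list of record dicts."""
    return [json.loads(line) for line in data.splitlines() if line.strip()]


def write_manifest(records, path) -> None:
    """Write the manifest to disk as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(records))
```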

Tech stack

Python | PyTorch | Hugging Face

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires downloading large pre-trained model weights from Hugging Face; PyTorch installation may need CUDA setup depending on hardware.
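Before downloading the weights, it can help to confirm that the Python dependencies are importable and whether PyTorch can see a GPU. Below is a minimal preflight sketch, assuming the import names `torch`, `torchaudio`, and `ChatTTS` used in the project's README examples. The helper names are hypothetical.

```python
import importlib.util

# Import names used in the ChatTTS README examples (assumed; adjust if they change).
REQUIRED_MODULES = ("torch", "torchaudio", "ChatTTS")


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the module names that this Python interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_available() -> bool:
    """True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check works without PyTorch installed

    return bool(torch.cuda.is_available())
```

Calling `print(missing_modules(), cuda_available())` shows what still needs installing and whether a CPU-only run should be expected.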

Code is licensed under AGPLv3 (copyleft); model weights are Creative Commons non-commercial, intended for research and educational use only.

In plain English

ChatTTS is a generative text-to-speech model designed to produce natural-sounding speech for dialogue and other conversational contexts. Traditional text-to-speech systems tend to sound robotic and uniform. ChatTTS is instead trained to sound like a real person talking, with natural pauses, emphasis, laughter, and the prosodic variation that makes conversation feel human. It supports English and Chinese and was trained on more than 100,000 hours of audio.

The model takes text as input and produces an audio waveform. It works much like a large language model generating text, except that the tokens it generates are speech tokens, which are then decoded into audio. You can control aspects of the output such as the speaker's voice, speaking speed, the placement of laughter and pauses, and the emotional tone. Multiple speakers can be generated in a single inference call, which makes it suitable for simulating a dialogue between distinct voices.

Use ChatTTS when an application needs natural conversational speech: giving a voice to a conversational AI assistant, creating podcast-style audio from text, building interactive voice interfaces, or generating training data for other speech models. It is particularly well suited to LLM-based chatbots, where the spoken output should feel conversational rather than robotic.

ChatTTS is written in Python and uses PyTorch as its deep learning framework. The pretrained model weights are hosted on Hugging Face and can be loaded with a few lines of Python. The code is licensed under AGPLv3, while the model weights are released under a Creative Commons non-commercial license, so the model is intended for research and educational use rather than commercial deployment.
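The loading and control features described above can be sketched in a few functions. This minimal sketch is based on the v0.2-era README API (`ChatTTS.Chat()`, `chat.load(compile=False)`, `chat.sample_random_speaker()`, `ChatTTS.Chat.InferCodeParams`, `ChatTTS.Chat.RefineTextParams`, and `chat.infer` with `skip_refine_text`). These names have changed between releases, so treat them as assumptions to verify. The same applies to several other details taken from the README and source: the sentence-level control ranges (oral 0-9, laugh 0-2, break 0-7), the `[speed_5]` default prompt, and the 24 kHz output rate. `refine_prompt`, `wav_bytes`, `write_wav`, `load_chat`, and `synthesize` are hypothetical helper names. The WAV writer uses only the standard library, and the model import is deferred so the other helpers run without the weights installed.

```python
import io
import struct
import wave

SAMPLE_RATE = 24_000  # rate used when the README examples save ChatTTS output


def refine_prompt(oral: int = 2, laugh: int = 0, brk: int = 6) -> str:
    """Build a sentence-level control prompt such as "[oral_2][laugh_0][break_6]".

    Ranges (oral 0-9, laugh 0-2, break 0-7) follow the ChatTTS README;
    verify them against the release you install.
    """
    for name, value, top in (("oral", oral, 9), ("laugh", laugh, 2), ("break", brk, 7)):
        if not 0 <= value <= top:
            raise ValueError(f"{name} must be in 0..{top}, got {value}")
    return f"[oral_{oral}][laugh_{laugh}][break_{brk}]"


def wav_bytes(samples, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file in memory."""
    # Clip out-of-range values, scale to signed 16-bit, pack little-endian.
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buffer.getvalue()


def write_wav(path, samples, sample_rate: int = SAMPLE_RATE) -> None:
    """Write samples to a .wav file on disk."""
    with open(path, "wb") as f:
        f.write(wav_bytes(samples, sample_rate))


def load_chat():
    """Load the model (downloads weights on first use). Assumed v0.2-era API."""
    import ChatTTS  # deferred: only needed when actually synthesizing

    chat = ChatTTS.Chat()
    chat.load(compile=False)  # the README suggests compile=True for better performance
    return chat


def synthesize(chat, texts, speaker=None, prompt="[oral_2][laugh_0][break_6]",
               speed=5, skip_refine=False):
    """Synthesize a list of texts with one voice; returns (waveforms, speaker).

    skip_refine=True keeps inline word-level tokens such as [uv_break] or [laugh]
    exactly as written instead of letting the refine step rewrite the text.
    """
    import ChatTTS

    if speaker is None:
        speaker = chat.sample_random_speaker()  # reuse the returned value to keep the voice
    params_infer_code = ChatTTS.Chat.InferCodeParams(
        prompt=f"[speed_{speed}]",  # "[speed_5]" is the InferCodeParams default
        spk_emb=speaker,
        temperature=0.3,
        top_P=0.7,
        top_K=20,
    )
    params_refine_text = ChatTTS.Chat.RefineTextParams(prompt=prompt)
    wavs = chat.infer(
        texts,
        skip_refine_text=skip_refine,
        params_refine_text=params_refine_text,
        params_infer_code=params_infer_code,
    )
    return wavs, speaker
```

A typical run would be `chat = load_chat()`, then `wavs, voice = synthesize(chat, ["Hello there."])`, then `write_wav("hello.wav", wavs[0].reshape(-1))`, assuming `infer` returns NumPy float arrays. Passing the returned `voice` back into later calls keeps the same speaker.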

Copy-paste prompts

Prompt 1
How do I use ChatTTS to generate speech from text in Python? Show me a simple example with different speakers.
Prompt 2
I want to create a podcast from a blog post using ChatTTS. What's the workflow to convert text to audio with natural pauses and emphasis?
Prompt 3
How can I control the emotional tone, speaking speed, and laughter in ChatTTS output?
Prompt 4
Can I use ChatTTS to generate dialogue between two characters with different voices in a single call?
Prompt 5
What are the limitations of ChatTTS for commercial use, and what license restrictions apply?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.