explaingit

kittenml/kittentts

13,907 · Python · Audience · developer · Complexity · 2/5 · License · Apache 2.0 · Setup · easy

TLDR

Kitten TTS is a lightweight Python text-to-speech library that runs entirely on CPU with no GPU required, using models as small as 25 MB and offering 8 built-in voices.

Mindmap

mindmap
  root((Kitten TTS))
    What it does
      Text to speech
      CPU only no GPU
      Small model sizes
    Features
      8 built-in voices
      Speed adjustment
      Text normalization
    Model Sizes
      Nano 15M params
      Mini 80M params
      Download from Hugging Face
    Usage
      pip install
      Load model and voice
      Save as wav file

Things people build with this

USE CASE 1

Add spoken audio output to a Python app or script on any machine without needing a GPU or a cloud API.

USE CASE 2

Generate speech audio files from text on a low-powered device like a Raspberry Pi or small server.

USE CASE 3

Convert mixed text containing abbreviations, prices, and times into natural speech using the built-in text normalizer.

USE CASE 4

Pick from 8 built-in voices and adjust speaking speed to narrate content inside your Python application.
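
To make use case 3 concrete, here is a minimal sketch of what a text-normalization step does before speech synthesis. It is not the library's normalizer: `normalize_sketch`, `number_to_words`, and the abbreviation table are hypothetical names written for illustration, and the sketch only handles a few abbreviations, prices under $100, and clock times, leaving "a.m." and "p.m." untouched.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical abbreviation table; a real normalizer covers far more cases.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister"}


def number_to_words(n):
    """Spell out an integer from 0 to 99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def _spell_price(match):
    """Turn a '$D.CC' match into words, with singular/plural units."""
    dollars, cents = int(match.group(1)), int(match.group(2))
    dollar_unit = "dollar" if dollars == 1 else "dollars"
    cent_unit = "cent" if cents == 1 else "cents"
    return (f"{number_to_words(dollars)} {dollar_unit} and "
            f"{number_to_words(cents)} {cent_unit}")


def _spell_time(match):
    """Turn an 'H:MM' match into words, e.g. 3:05 -> 'three oh five'."""
    hour, minute = int(match.group(1)), int(match.group(2))
    if minute == 0:
        spoken_minute = "o'clock"
    elif minute < 10:
        spoken_minute = f"oh {number_to_words(minute)}"
    else:
        spoken_minute = number_to_words(minute)
    return f"{number_to_words(hour)} {spoken_minute}"


def normalize_sketch(text):
    """Expand abbreviations, then prices, then clock times."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    text = re.sub(r"\$(\d{1,2})\.(\d{2})", _spell_price, text)
    text = re.sub(r"\b(\d{1,2}):([0-5]\d)\b", _spell_time, text)
    return text


if __name__ == "__main__":
    print(normalize_sketch("Dr. Rivera paid $12.50 at 3:05 p.m."))
    # prints: Doctor Rivera paid twelve dollars and fifty cents at three oh five p.m.
```

Prices are expanded before times so that the digits in a price are already gone when the time pattern runs. For real input, prefer the library's own normalizer, which the README describes as handling abbreviations, prices, and times.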

Tech stack

Python · pip · Hugging Face

Getting it running

Difficulty · easy · Time to first run · 5 min

Requires Python 3.8+ and pip. Models download automatically from Hugging Face on first run, and no GPU or special hardware is needed.
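
The requirements above can be checked from Python before installing anything. This is a small sketch that uses only the standard library; the import name `kittentts` is an assumption taken from the repository name, so confirm it against the README.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements above


def check_environment(package="kittentts"):
    """Report whether this interpreter meets the stated requirements."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # find_spec returns None when a top-level package is not installed.
        "package_installed": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `package_installed` is False, install the package with pip as described in the README, then run the check again.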

Use freely for any purpose, including commercial use. You must include the Apache license notice and note any changes you make to the code.

In plain English

Kitten TTS is an open-source text-to-speech tool: it turns written text into spoken audio. Its main selling point, according to the README, is that it is small and undemanding. The models range from 25 to 80 megabytes on disk and run on an ordinary CPU without a separate graphics card, the expensive hardware that many speech and AI tools require. That makes it suitable for small or low-powered devices. The README labels it a developer preview, so the way you call it may change between versions.

The project offers several model sizes, from a 15-million-parameter nano version up to an 80-million-parameter mini version, each downloadable from the Hugging Face model-sharing site. It ships with eight built-in voices named Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, and Leo, and it produces audio at a 24 kHz sample rate. You can adjust how fast the voice speaks.

Kitten TTS is used from Python. After installing it with pip, you load a model, hand it a sentence and a voice name, and get back the audio, which you can then save as a wav file. The README shows short code examples for the basic case, for changing the speed, for saving straight to a file, and for listing the available voices. There is also an option to run on a graphics card if you have one, for more speed.

A useful built-in feature is text preprocessing, which cleans up input before it is spoken. A normalize_text function turns things like "Dr. Rivera paid $12.50 at 3:05 p.m." into the fully spelled-out words a voice should actually say.

The README also lists system requirements (it works on Linux, macOS, and Windows with Python 3.8 or later), a roadmap of planned features such as mobile support and multilingual voices, and contact details for paid commercial support, custom voices, and enterprise licensing. The project is released under the Apache License 2.0.
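
The generate-then-save flow described above can be sketched without the library installed. The code below is an illustration under stated assumptions, not the project's API: it assumes the loaded model exposes `generate(text, voice=..., speed=...)` and returns float samples in the range -1.0 to 1.0, and it uses a hypothetical `FakeModel` that returns a short tone so the example runs anywhere. Only the 24 kHz rate and the voice names come from the summary; check the README for the real constructor and method names.

```python
import math
import struct
import wave

SAMPLE_RATE = 24000  # 24 kHz, as stated in the summary above


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write floats in [-1.0, 1.0] to a mono 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


def narrate_to_file(model, text, voice, path, speed=1.0):
    """Generate audio and save it; returns the duration in seconds.

    ASSUMPTION: model.generate(text, voice=..., speed=...) returns a
    sequence of float samples. Verify this against the README.
    """
    samples = model.generate(text, voice=voice, speed=speed)
    save_wav(samples, path)
    return len(samples) / SAMPLE_RATE


class FakeModel:
    """Hypothetical stand-in that returns a 440 Hz tone instead of speech."""

    def generate(self, text, voice, speed=1.0):
        seconds = 0.5 / speed  # faster speed means shorter audio
        count = int(SAMPLE_RATE * seconds)
        return [0.3 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)
                for i in range(count)]


if __name__ == "__main__":
    duration = narrate_to_file(FakeModel(), "Hello there.", "Bella", "output.wav")
    print(f"wrote output.wav ({duration:.2f} s)")
```

With the real library, `FakeModel()` would be replaced by the model object you load after installing the package. The standard-library `wave` module is used here only to keep the sketch dependency-free.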

Copy-paste prompts

Prompt 1
Using kittentts, write Python code to convert a paragraph of text into a .wav file using the Bella voice at 1.2x speed.
Prompt 2
Show me how to use kittentts to correctly speak a string like 'Dr. Rivera paid $12.50 at 3:05 p.m.' by running it through normalize_text first.
Prompt 3
Help me add kittentts to a Python command-line app that reads a text file aloud in the user's chosen voice from the 8 available options.
Prompt 4
Write a Python script that uses kittentts to generate a separate .wav audio file for each line in a text file.

Verify against the repo before relying on details.