Add spoken audio output to a Python app or script on any machine without needing a GPU or a cloud API.
Generate speech audio files from text on a low-powered device like a Raspberry Pi or small server.
Convert mixed text containing abbreviations, prices, and times into natural speech using the built-in text normalizer.
Pick from 8 built-in voices and adjust speaking speed to narrate content inside your Python application.
Requires Python 3.8+ and pip. Models download automatically from Hugging Face on first run, and no GPU or special hardware is needed.
Kitten TTS is an open-source text-to-speech tool: it turns written text into spoken audio. Its main selling point, according to the README, is that it is very small and undemanding. The models range from 25 to 80 megabytes on disk and run on an ordinary CPU, with no need for a separate graphics card, the expensive hardware that many speech and AI tools require. That makes it suitable for small or low-powered devices. The README labels it a developer preview, so the way you call it may change between versions.

The project offers several model sizes, from a 15-million-parameter nano version up to an 80-million-parameter mini version, each downloadable from the Hugging Face model-sharing site. It ships with eight built-in voices named Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, and Leo, and it produces audio at a 24 kHz sample rate. You can adjust how fast the voice speaks.

Kitten TTS is used from Python. After installing it with pip, you load a model, pass it a sentence and a voice name, and get back the audio, which you can then save as a WAV file. The README shows short code examples for the basic case, for changing the speed, for saving straight to a file, and for listing the available voices. There is also an option to run on a graphics card, if you have one, for more speed.

A useful built-in feature is text preprocessing, which cleans up input before it is spoken. A normalize_text function turns input such as "Dr. Rivera paid $12.50 at 3:05 p.m." into the fully spelled-out words a voice should actually say.

The README also lists system requirements (Linux, macOS, or Windows with Python 3.8 or later), a roadmap of planned features such as mobile support and multilingual voices, and contact details for paid commercial support, custom voices, and enterprise licensing. The project is released under the Apache License 2.0.
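The load, generate, and save flow described above can be sketched roughly as follows. This is a minimal sketch, not the README's own example: `save_wav` and `synthesize` are hypothetical helper names, the `KittenTTS` class and its `generate(text, voice=..., speed=...)` call are assumptions about the package's interface that should be checked against the repo, and the returned audio is assumed to be a one-dimensional sequence of floats in the range -1.0 to 1.0. Only the 24 kHz sample rate and the voice name come from the description above.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # Hz, the output rate cited in the description above


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        # Clip first so out-of-range values cannot overflow the 16-bit range.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


def synthesize(text, model_id, voice="Bella", speed=1.0):
    """Return audio samples for `text` (hypothetical wrapper).

    ASSUMPTION: the package exposes a `KittenTTS` class whose `generate`
    method accepts `voice` and `speed`; verify against the README.
    `model_id` is whatever model identifier the project publishes.
    """
    # Imported lazily so save_wav stays usable without the package installed.
    from kittentts import KittenTTS

    model = KittenTTS(model_id)
    return model.generate(text, voice=voice, speed=speed)
```

With the package installed, a call such as `save_wav("hello.wav", synthesize("Hello from Kitten TTS.", model_id))` would produce a playable file, where `model_id` is taken from the project's Hugging Face page. Writing the file with the standard-library `wave` module keeps the saving step free of extra dependencies, which fits the low-powered devices the project targets.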
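To picture what a text normalizer does with the sample sentence above, here is a toy sketch. It is not the library's normalize_text function, and the real function's output may differ. Every name here (`toy_normalize`, `number_to_words`, the abbreviation table) is hypothetical, and the sketch only handles a few abbreviations, prices under $100, and 12-hour times.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical, deliberately tiny abbreviation table.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister"}

PRICE = re.compile(r"\$(\d{1,2})\.(\d{2})")
TIME = re.compile(r"\b(\d{1,2}):(\d{2})\s*([ap])\.m\.", re.IGNORECASE)


def number_to_words(n):
    """Spell out an integer in the range 0-99."""
    if not 0 <= n <= 99:
        raise ValueError("this toy sketch only covers 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def _expand_price(match):
    dollars, cents = int(match.group(1)), int(match.group(2))
    return f"{number_to_words(dollars)} dollars and {number_to_words(cents)} cents"


def _expand_time(match):
    hour, minute = int(match.group(1)), int(match.group(2))
    parts = [number_to_words(hour)]
    if 0 < minute < 10:
        parts.append("oh " + number_to_words(minute))  # 3:05 -> "three oh five"
    elif minute >= 10:
        parts.append(number_to_words(minute))
    parts.append(match.group(3).lower() + " m")  # "p.m." -> "p m"
    return " ".join(parts)


def toy_normalize(text):
    """Expand abbreviations, prices, and times into speakable words."""
    for abbreviation, spoken in ABBREVIATIONS.items():
        text = text.replace(abbreviation, spoken)
    text = PRICE.sub(_expand_price, text)
    return TIME.sub(_expand_time, text)
```

Calling `toy_normalize("Dr. Rivera paid $12.50 at 3:05 p.m.")` returns `"Doctor Rivera paid twelve dollars and fifty cents at three oh five p m"`. Doing this expansion before synthesis means the speech model only ever sees plain words, which is the role the built-in preprocessing step plays.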
Verify against the repo before relying on details.