Parler-TTS is a text-to-speech system that converts written text into spoken audio. Unlike many other text-to-speech tools, it lets you describe the voice you want in plain text. For example, you might specify a female speaker with a moderate pace and clear audio quality, and the system will generate speech that matches that description. You do not need to record a sample voice or select from a fixed list of presets.

The project was built by Hugging Face and is based on research from Stability AI and Edinburgh University. It is fully open-source: the training data, code, and pretrained model weights are all publicly available. This is notable in the text-to-speech field, where many capable systems are closed or proprietary.

Two model sizes are available: a smaller 880 million parameter version called Parler-TTS Mini, and a larger 2.3 billion parameter version called Parler-TTS Large. Both were trained on around 45,000 hours of audiobook audio. The larger model produces higher-quality output at the cost of more compute.

Using the library requires installing it via pip and writing a small amount of Python code. You provide the text you want spoken and a short description of the desired voice style, and the model generates audio that you can save as a WAV file. For users who want consistent output from a specific voice, the model also supports 34 named speakers that can be referenced by name in the description prompt.

The repository includes both inference code for generating speech and training code for people who want to fine-tune or train their own models. An interactive demo is hosted online for trying it without any setup.
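The workflow described above (a text to speak, plus a plain-text voice description) can be sketched roughly as follows. This is a minimal sketch, not the project's reference code. It assumes the library was installed with something like `pip install git+https://github.com/huggingface/parler-tts.git`, that `soundfile` is available, and that the checkpoint is published as `parler-tts/parler-tts-mini-v1`; none of these names appear in the summary above, so check them against the repo. `build_description` and `synthesize` are hypothetical helpers written for this illustration, and the speaker name in the usage note is only an example.

```python
def build_description(speaker=None, gender="female", pace="moderate", quality="clear"):
    """Compose a plain-text voice description.

    Hypothetical helper for illustration; it is not part of the library.
    Passing a speaker name targets one of the named speakers instead of
    a generic voice.
    """
    subject = speaker if speaker else f"A {gender} speaker"
    return f"{subject} speaks at a {pace} pace. The recording has {quality} audio quality."


def synthesize(text, description, out_path="parler_tts_out.wav",
               checkpoint="parler-tts/parler-tts-mini-v1"):
    """Generate speech for `text` in the style given by `description`.

    The checkpoint name is an assumption; swap in whichever model you use.
    """
    # Heavy imports live inside the function so build_description can be
    # used without the model libraries installed.
    import torch
    import soundfile as sf
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(checkpoint).to(device)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # The description conditions the voice; the prompt is the text to speak.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()

    # Write the waveform to disk at the model's own sampling rate.
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

A call might look like `synthesize("Hey, how are you doing today?", build_description(speaker="Jon", pace="slightly fast"))`, where "Jon" stands in for whichever named speaker the repo documents. The first call downloads the model weights, so expect it to take a while.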
Verify against the repo before relying on details.