explaingit

neuphonic/neutts

5,847 Python Audience · developer Complexity · 3/5 Setup · moderate

TLDR

NeuTTS is a local text-to-speech AI that converts text to realistic spoken audio entirely on your device, with voice cloning from a short audio clip, supporting English, French, German, and Spanish.

Mindmap

mindmap
  root((NeuTTS))
    Models
      NeuTTS-Air 360M English
      NeuTTS-Nano 120M multilingual
    Languages
      English
      French German Spanish
    Features
      Voice cloning
      Watermarked audio
      Runs faster than real time
    Setup
      pip install
      GGUF format
      llama-cpp-python backend

Things people build with this

USE CASE 1

Add offline text-to-speech to a Python app without sending audio data to a cloud API.

USE CASE 2

Clone a voice from a few seconds of audio and generate new speech that sounds like that specific person.

USE CASE 3

Build a local voice assistant or screen reader that works without an internet connection.

Tech stack

Python · GGUF · llama-cpp-python

Getting it running

Difficulty · moderate Time to first run · 30 min

Requires installing llama-cpp-python with platform-specific hardware acceleration flags (Metal on Mac, CUDA on NVIDIA).
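
As a rough illustration of the platform-specific step above, the sketch below picks an environment override for the pip install depending on the operating system. The helper name `llama_cpp_install_plan` is hypothetical, and the `CMAKE_ARGS` values (`-DGGML_METAL=on`, `-DGGML_CUDA=on`) are assumptions based on recent llama-cpp-python releases; older releases used different flag names, so check the NeuTTS README and the llama-cpp-python install notes before relying on them.

```python
import platform
import sys


def llama_cpp_install_plan(system=None, has_nvidia_gpu=False):
    """Return (env_overrides, pip_command) for installing llama-cpp-python.

    Hypothetical helper. The CMake flag names below are assumptions taken
    from recent llama-cpp-python releases, not from the NeuTTS README.
    """
    system = system or platform.system()
    env = {}
    if system == "Darwin":
        # Assumed flag for Metal acceleration on macOS.
        env["CMAKE_ARGS"] = "-DGGML_METAL=on"
    elif has_nvidia_gpu:
        # Assumed flag for CUDA acceleration on NVIDIA GPUs.
        env["CMAKE_ARGS"] = "-DGGML_CUDA=on"
    # With no override, pip builds or installs a CPU-only package.
    command = [sys.executable, "-m", "pip", "install", "llama-cpp-python"]
    return env, command
```

The returned pair can be passed to `subprocess.run(command, env={**os.environ, **env})`, or the override can simply be typed in front of the pip command in a shell.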

NeuTTS-Air is licensed under Apache 2.0 (permissive); NeuTTS-Nano uses a custom NeuTTS Open License 1.0 with separate terms.

In plain English

NeuTTS is a collection of open-source text-to-speech models that convert written text into spoken audio. The key design goal is that the models run entirely on a local device, without sending data to a cloud service. The project is made by Neuphonic, a company focused on on-device voice AI.

There are two model families. NeuTTS-Air is the larger one, with around 360 million active parameters, and supports English. NeuTTS-Nano is smaller at around 120 million active parameters and comes in separate versions for English, French, German, and Spanish. Both families support voice cloning: given just a few seconds of a person's audio, the model can generate new speech that sounds like that person. All generated audio is watermarked.

The models are distributed in GGUF format, which is a file format commonly used for running AI models on ordinary computers without specialized hardware. Quantized versions are available at different quality levels to trade off file size against audio quality. The README includes benchmark results showing generation speeds on a Samsung Galaxy phone, a laptop CPU, an Apple M4 chip, and an NVIDIA graphics card. On a mid-range laptop the smaller model produces audio faster than real time.

The Python package is installed via pip and can work with the llama-cpp-python inference backend for GGUF models. The README covers installation steps for different operating systems including macOS, Linux, and Windows, with notes on enabling hardware acceleration on each platform. Fine-tuning scripts are also included in the repository for those who want to adapt the models to custom voices or use cases.

NeuTTS-Air is licensed under Apache 2.0. The NeuTTS-Nano models use a separate license called the NeuTTS Open License 1.0.
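
"Faster than real time" can be checked on your own hardware with a real-time factor: generation time divided by the duration of the audio produced, where a value below 1.0 means the model keeps ahead of playback. The sketch below is a minimal way to measure it. The `synthesize` callable is a hypothetical stand-in for whatever NeuTTS call returns the audio samples, and the sample rate must come from the model you actually use; neither is taken from the NeuTTS API.

```python
import time


def real_time_factor(generation_seconds, audio_seconds):
    """Generation time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def time_synthesis(synthesize, text, sample_rate):
    """Time one call to a hypothetical synthesize(text) -> sequence of samples."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    # Audio duration in seconds is the sample count over the sample rate.
    audio_seconds = len(samples) / sample_rate
    return real_time_factor(elapsed, audio_seconds)
```

For example, 2 seconds of compute for 8 seconds of audio gives `real_time_factor(2.0, 8.0) == 0.25`. Averaging over several sentences of different lengths gives a steadier figure than a single call.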

Copy-paste prompts

Prompt 1
Using the NeuTTS Python package, write a script that reads a text file and saves the spoken output as a WAV file using the NeuTTS-Nano English model.
Prompt 2
How do I clone a voice with NeuTTS by providing a short audio sample, and then generate a new sentence in that cloned voice?
Prompt 3
Walk me through setting up NeuTTS on an Apple M4 Mac with Metal hardware acceleration enabled and generating a French speech sample.
Prompt 4
I need fast English TTS on a laptop CPU with no GPU. Compare NeuTTS-Air and NeuTTS-Nano for my use case and recommend which model to use.

Verify against the repo before relying on details.