Pick a CPU-friendly TTS model by comparing speed and audio quality
Reproduce a 120-run TTS benchmark on a CPU with espeak-ng
Generate a Markdown report and matplotlib charts from TTS timing data
Listen to side-by-side WAV samples of Kokoro and Supertonic across text lengths
Requires espeak-ng installed at the OS level and a Python venv with PyTorch and ONNX Runtime. The Kokoro ONNX model files must be downloaded from Hugging Face before running.
This repository is a head-to-head benchmark of two text-to-speech models, Kokoro 82M and Supertonic 3, both running on a regular CPU with no GPU. Text to speech, or TTS, is software that turns written text into spoken audio. The point of the comparison is to see which model offers the better trade-off between how fast it generates audio and how natural that audio sounds. The README states up front that the benchmark itself was designed, written, and executed end to end by an autonomous coding agent called Neo from a single prompt, with no manual coding or configuration.
The benchmark was run on an AMD EPYC 7763 with 4 cores and 15.6 GB of RAM using Python 3.11. The results are summarized in a small table. Supertonic 3 in 2-step mode is the fastest at about 6.1 times real-time speed, but its audio quality is described as poor and robotic. Supertonic 3 in 5-step mode runs at 3.2 times real-time with audio quality described as good and clear. Kokoro 82M, in both its PyTorch and ONNX forms, runs at about 2 times real-time but has excellent, human-like quality. The author calls Supertonic 2-step the speed winner, Supertonic 5-step the balance pick, and Kokoro the quality winner.
The repo contains a benchmark.py script that runs 120 timed measurements, a report.py script that turns the raw numbers into a Markdown report and matplotlib charts, and a results folder. That folder holds the CSV of raw timings, the rendered report, two charts comparing real-time factor and latency against text length, and 24 generated WAV audio samples covering each combination of configuration and text length. A separate blog_post.md writes up the findings in more depth.
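The "times real-time" figures can be read as seconds of audio produced per second of compute. The sketch below shows that arithmetic and a per-configuration average. It is a minimal illustration, not the repository's code: the function names, the row layout, and the sample timings are all hypothetical, and benchmark.py may define or aggregate its metric differently (some tools report the inverse ratio, where lower is better).

```python
from statistics import mean


def speed_factor(audio_seconds: float, synth_seconds: float) -> float:
    """Seconds of audio produced per second of compute (higher is faster)."""
    if synth_seconds <= 0:
        raise ValueError("synth_seconds must be positive")
    return audio_seconds / synth_seconds


def summarize(rows):
    """Average the speed factor per configuration.

    rows: iterable of (config, audio_seconds, synth_seconds) tuples.
    This row layout is an assumption, not the repository's CSV schema.
    """
    by_config = {}
    for config, audio_s, synth_s in rows:
        by_config.setdefault(config, []).append(speed_factor(audio_s, synth_s))
    return {config: mean(values) for config, values in by_config.items()}


# Made-up timings for illustration only; these are not benchmark results.
example_rows = [
    ("config-a", 6.0, 2.0),
    ("config-a", 3.0, 1.0),
    ("config-b", 6.0, 3.0),
]
summary = summarize(example_rows)
```

Under this definition, a value above 1 means the model generates audio faster than it takes to play back.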
To reproduce the benchmark yourself, the README walks through installing the espeak-ng system dependency, creating a Python virtual environment, and installing the Python packages: supertonic, kokoro, kokoro-onnx, onnxruntime, soundfile, matplotlib, pandas, numpy, and torch. The Kokoro ONNX model files must then be downloaded from Hugging Face before running benchmark.py and report.py. The repository has 3 stars at the time of writing.
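Before running the scripts, it may help to confirm the setup is complete. The following is a hedged preflight sketch using only the Python standard library; it is not part of the repository. The import names are assumptions derived from the package names above (pip names and import names can differ), and the helper names are hypothetical.

```python
import importlib.util
import shutil

# Assumed import names; for example, "kokoro-onnx" is assumed to import
# as "kokoro_onnx". Adjust if the installed packages differ.
REQUIRED_MODULES = [
    "supertonic", "kokoro", "kokoro_onnx", "onnxruntime",
    "soundfile", "matplotlib", "pandas", "numpy", "torch",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_executable(name):
    """Return True if an executable with this name is on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if not has_executable("espeak-ng"):
        print("espeak-ng was not found on PATH")
    for name in missing_modules(REQUIRED_MODULES):
        print(f"Python module not found: {name}")
```

This only checks that the pieces are present. It does not verify versions or that the Kokoro ONNX model files have been downloaded.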
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.