Convert a public-domain book in plain-text form into a listenable MP3 audiobook without any paid text-to-speech service.
Run a long conversion overnight and resume it from where it left off if the process is interrupted.
Use GPU acceleration with CUDA to generate hours of audiobook audio roughly eight times faster than real time.
Requires ffmpeg and espeak-ng, installed via the system package manager, plus PyTorch with CUDA for GPU acceleration. CPU mode works but is much slower.
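A quick way to confirm these prerequisites before starting a long run is a small check script. The following is only a sketch, not part of kokobook: the function names `missing_tools` and `cuda_available` are hypothetical, and it assumes the two system tools are exposed on PATH under the executable names `ffmpeg` and `espeak-ng`.

```python
import shutil


def missing_tools(tools=("ffmpeg", "espeak-ng"), which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the
    real tools installed; by default it is shutil.which.
    """
    return [name for name in tools if which(name) is None]


def cuda_available():
    """Return True only if PyTorch is importable and reports a CUDA device."""
    try:
        import torch  # optional dependency, only needed for GPU mode
    except ImportError:
        return False
    return torch.cuda.is_available()
```

If `missing_tools()` returns an empty list, both system tools were found. If `cuda_available()` returns `False`, the conversion would be expected to fall back to the slower CPU mode.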
kokobook is a tool that converts a plain text file into an MP3 audiobook using an AI text-to-speech system called Kokoro. You give it a text file containing a book, run a shell script, and it produces a single audiobook.mp3 file. While the conversion is running, a small web page at a local address lets you pause, resume, or stop the process.

The conversion works by splitting the book into short chunks of one or two sentences, generating audio for each chunk separately, and then joining all the chunks into one file with ffmpeg. A key feature is that the process is resumable: a work file tracks which chunks have been completed, so if the process is interrupted for any reason, restarting it picks up where it left off without regenerating audio that already exists.

Kokoro is a model with 82 million parameters that produces natural-sounding speech. On a machine with an NVIDIA graphics card it can synthesize audio roughly eight times faster than real time. It also runs on a standard CPU if no GPU is available, though more slowly. The tool inserts a configurable silence between sentences to make the listening experience feel more natural.

Setup requires Python 3.10 or later, ffmpeg (for joining the audio files), and espeak-ng (a speech processing library that Kokoro depends on). Both system tools are available through the standard package manager on Linux. GPU acceleration additionally requires a PyTorch installation with CUDA support, and older graphics card generations need specific PyTorch versions.

The text cleaning step is tuned for one particular ebook export format, so users converting books from other sources may need to adjust the cleaning logic. The README also notes that users should not distribute audio made from copyrighted books.
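The chunk-and-resume idea described above can be illustrated with a minimal sketch. This is not kokobook's actual code: the function names, the JSON work-file layout, and the naive sentence splitter are all assumptions made for illustration, and the `synthesize` callback merely stands in for the Kokoro call that would write one audio file per chunk. Joining the chunk files with ffmpeg is left out.

```python
import json
import re
from pathlib import Path


def split_into_chunks(text, sentences_per_chunk=2):
    """Split text into chunks of at most `sentences_per_chunk` sentences.

    Uses a naive rule: a sentence ends at '.', '!' or '?' followed by
    whitespace. Real book text would need more careful handling.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def load_done(work_file):
    """Read the set of completed chunk indices (hypothetical JSON list)."""
    path = Path(work_file)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def run_resumable(chunks, work_file, synthesize):
    """Call synthesize(index, chunk) for every chunk not yet recorded as done.

    The work file is rewritten after each chunk, so an interrupted run
    loses at most the chunk that was in progress.
    """
    done = load_done(work_file)
    for index, chunk in enumerate(chunks):
        if index in done:
            continue  # audio for this chunk already exists, skip it
        synthesize(index, chunk)
        done.add(index)
        Path(work_file).write_text(json.dumps(sorted(done)), encoding="utf-8")
    return done
```

Calling `run_resumable` a second time with the same work file skips every index already recorded, which is the behavior that makes an overnight run safe to interrupt.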
← arpecop on gitmyhub — every repo by this author, as a profile.
Verify against the repo before relying on details.