arpecop/kokobook

★ 12 · Python · Audience: general · Complexity: 3/5 · Setup: hard

Mindmap

mindmap
  root((kokobook))
    What it does
      Text to MP3 audiobook
      Chunk and rejoin audio
      Resume on interrupt
      Browser control panel
    Tech stack
      Python
      Kokoro TTS model
      PyTorch CUDA
      ffmpeg
      espeak-ng
    Use cases
      Public domain books
      Overnight conversions
      GPU-accelerated audio
    Setup
      Python 3.10 plus
      Linux package deps
      Optional CUDA GPU


Things people build with this

USE CASE 1

Convert a public-domain book in plain text into a listenable MP3 audiobook without any paid text-to-speech service.

USE CASE 2

Run a long conversion overnight and resume it from where it left off if the process is interrupted.

USE CASE 3

Use GPU acceleration with CUDA to generate hours of audiobook audio roughly eight times faster than real time.

Tech stack

Python · Kokoro · PyTorch · CUDA · ffmpeg · espeak-ng

Getting it running

Difficulty: hard · Time to first run: 1h+

Requires ffmpeg and espeak-ng installed via the system package manager, plus PyTorch with CUDA for GPU acceleration. CPU mode works but is much slower.
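Before starting a long conversion, it can help to confirm that the prerequisites listed above are present. The following is a minimal sketch, not part of kokobook itself: the function names `missing_tools`, `python_ok`, and `cuda_available` are hypothetical, and the check assumes the tools are installed under their usual executable names `ffmpeg` and `espeak-ng`.

```python
import shutil
import sys

# Executable names are an assumption; adjust if your distribution differs.
REQUIRED_TOOLS = ("ffmpeg", "espeak-ng")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter is at least `minimum` (major, minor)."""
    return tuple(version_info[:2]) >= minimum


def cuda_available():
    """Return True/False if PyTorch is installed, or None if it is not."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("Missing system tools:", missing_tools() or "none")
    print("CUDA available:", cuda_available())
```

A `None` result from `cuda_available()` means PyTorch is not installed at all, while `False` means it is installed but cannot see a CUDA device, in which case synthesis would fall back to the slower CPU path.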

In plain English

kokobook is a tool that converts a plain text file into an MP3 audiobook using an AI text-to-speech system called Kokoro. You give it a text file containing a book, run a shell script, and it produces a single audiobook.mp3 file. While the conversion is running, a small web page at a local address lets you pause, resume, or stop the process.

The conversion works by splitting the book into short chunks of one or two sentences, generating audio for each chunk separately, and then joining all the chunks into one file using a tool called ffmpeg. A key feature is that this process is resumable: a work file tracks which chunks have been completed, so if the process is interrupted for any reason, restarting it picks up from where it left off without regenerating audio that already exists. The tool also inserts a configurable silence between sentences to make the listening experience feel more natural.

Kokoro is a model with 82 million parameters that produces natural-sounding speech. On a machine with an NVIDIA graphics card it can synthesize audio roughly eight times faster than real time. It also runs on a standard CPU if no GPU is available, though more slowly.

Setup requires Python 3.10 or later, ffmpeg (for joining the audio files), and espeak-ng (a speech processing library that Kokoro depends on). Both system tools are available through the standard package manager on Linux. An optional PyTorch installation with CUDA support is needed for GPU acceleration, with specific version requirements for older graphics card generations.

The text cleaning step is tuned for one particular ebook export format, so users converting books from different sources may need to adjust the cleaning logic. The README also notes that users should not distribute audio made from copyrighted books.
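The chunk, resume, and join flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not kokobook's actual code: the function names, the JSON work-file layout, the regex sentence splitter, and the use of ffmpeg's concat demuxer are all hypothetical choices, and `synthesize` stands in for whatever call produces audio for one chunk.

```python
import json
import re
from pathlib import Path


def split_into_chunks(text, sentences_per_chunk=2):
    """Naively split text into sentences, then group them into chunks."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def load_done(state_path):
    """Read the set of finished chunk indices from the work file, if any."""
    path = Path(state_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_done(state_path, done):
    """Persist the finished chunk indices (assumed format: a JSON list)."""
    Path(state_path).write_text(json.dumps(sorted(done)), encoding="utf-8")


def convert(chunks, state_path, synthesize):
    """Synthesize every chunk not yet recorded as done; safe to re-run."""
    done = load_done(state_path)
    for index, chunk in enumerate(chunks):
        if index in done:
            continue  # finished in an earlier run, skip it
        synthesize(index, chunk)  # placeholder for the real TTS call
        done.add(index)
        save_done(state_path, done)  # record progress after each chunk
    return done


def write_concat_list(chunk_paths, list_path, output="audiobook.mp3"):
    """Write an ffmpeg concat list file and return the command to join it."""
    lines = []
    for chunk_path in chunk_paths:
        escaped = str(chunk_path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    # The command is returned, not executed; pass it to subprocess.run to join.
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_path), output]
```

Saving the work file after every chunk means an interruption costs at most one chunk of repeated work, at the price of one small file write per chunk. A real splitter would also need to handle abbreviations and quoted dialogue, which this regex does not.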

Copy-paste prompts

Prompt 1

Help me install kokobook on Ubuntu including ffmpeg, espeak-ng, and PyTorch with CUDA support for my GPU. Walk me through the full setup from a fresh terminal.

Prompt 2

I am converting a Project Gutenberg book with kokobook but the text cleaning step leaves unwanted chapter headers in the audio. Help me adjust the cleaning logic in the Python script to strip them.

Prompt 3

I want to change the Kokoro voice and adjust the silence between sentences in kokobook. Show me which config parameters to set and what voice options are available.


Verify against the repo before relying on details.