explaingit

thanhng8/omnivoice-tool

11 | Python | Audience · vibe coder | Complexity · 3/5 | Active | Setup · moderate

TLDR

Local wrapper around the OmniVoice TTS engine. Runs a Python server on port 8765 with a browser UI for voice cloning, voice design, and bulk text-to-speech generation.

Mindmap

mindmap
  root((omnivoice-tool))
    Inputs
      Text and spreadsheets
      Reference audio clips
      Voice design prompts
    Outputs
      WAV files per line
      Zipped bundles
      Streaming audio over WebSocket
    Use Cases
      Clone your own voice
      Bulk narrate scripts
      Build a TTS web client
      Generate multilingual voiceovers
    Tech Stack
      Python
      WebSocket
      OmniVoice

Things people build with this

USE CASE 1

Clone a friend's voice from a 10-second clip and read a script back in it

USE CASE 2

Batch generate hundreds of narrated lines from a spreadsheet and download a zip

USE CASE 3

Add laughter and sigh markers to a generated voice line for a game character

USE CASE 4

Wire the WebSocket stream into a Chrome extension or Node.js client
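Whatever client you wire up for use case 4, it eventually has to turn the streamed audio into a playable file. This page does not document the stream's message format, so the sketch below assumes (hypothetically) that each binary WebSocket message carries a chunk of raw signed 16-bit mono PCM at 24 kHz. The function name `chunks_to_wav` and those audio parameters are illustrative, not taken from the repo. It uses only the standard-library `wave` module, so it can sit behind whichever WebSocket library your client already uses.

```python
import io
import wave
from collections.abc import Iterable

# Hypothetical stream format: raw signed 16-bit little-endian mono PCM.
# Check the repo's client examples for the real format before relying on this.
SAMPLE_RATE = 24_000  # assumed, not confirmed by the repo
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)
CHANNELS = 1


def chunks_to_wav(chunks: Iterable[bytes]) -> bytes:
    """Concatenate streamed PCM chunks and wrap them in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            # Each chunk is appended; the header's length fields are patched on close.
            wav.writeframes(chunk)
    return buf.getvalue()


if __name__ == "__main__":
    # Two fake chunks of silence, half a second each at the assumed rate.
    half_second = b"\x00\x00" * (SAMPLE_RATE // 2)
    data = chunks_to_wav([half_second, half_second])
    with wave.open(io.BytesIO(data), "rb") as check:
        print(check.getnframes() / check.getframerate(), "seconds")
```

Buffering chunks and writing the container once keeps the WAV header valid; if the server turns out to send complete WAV files per message, skip this step and save each message as-is.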

Tech stack

Python | WebSocket | OmniVoice

Getting it running

Difficulty · moderate | Time to first run · 30 min

The first run downloads the OmniVoice model, and GPU mode needs a working CUDA installation.

In plain English

OmniVoice TTS Tool is an add-on that wraps an existing open-source text-to-speech engine, OmniVoice (built by a separate team, k2-fsa), and makes it easier to use on your own computer. Everything in this repository lives in a folder called tool/ and is not part of the original upstream project, so it is best thought of as a friendly shell around someone else's voice generation model.

The core piece is a local Python server that loads OmniVoice once at startup and then exposes two things on a single port (8765): a regular web page you open in your browser, and a WebSocket connection that streams generated audio. Launcher scripts for Windows and for macOS/Linux show a small numbered menu that lets you choose between GPU with speech recognition, GPU only, CPU, a custom port, or advanced settings. After the first run, an offline flag stops it from checking the network, so it starts in three to five seconds.

The browser interface follows a three-step flow: pick a voice, write the text, then generate. You can choose Auto Voice (a random voice each time), Voice Clone (pick a saved voice or upload three to ten seconds of your own reference audio), or Voice Design (describe gender, age, pitch, accent, and dialect). The author says 646 languages are supported, and a prebuilt gallery offers 45 curated voices covering English, Chinese, and Vietnamese. Every generation setting the underlying model offers is exposed as a slider or input, and you can insert non-verbal markers such as laughter or sighs.

Bulk import from spreadsheets and text files is supported, with output delivered as one WAV file per line plus a zip bundle. Example client code is provided for the browser, a Chrome extension, Node.js, and Python.
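Voice Clone expects three to ten seconds of reference audio. If you prepare clips in a script before uploading them, a check like the one below can catch clips outside that range early. It is a hypothetical helper, not part of the tool. It assumes the clip is an uncompressed PCM WAV file (the standard-library `wave` module cannot read MP3 or other compressed formats) and treats the 3 to 10 second bounds from the description as inclusive.

```python
import io
import wave
from typing import BinaryIO, Union

MIN_SECONDS = 3.0   # lower bound from the tool's description
MAX_SECONDS = 10.0  # upper bound from the tool's description


def clip_duration(path_or_file: Union[str, BinaryIO]) -> float:
    """Duration in seconds of a PCM WAV file, given a path or a binary file object."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path_or_file: Union[str, BinaryIO]) -> tuple[bool, float]:
    """Return (ok, seconds); ok is True when the clip lasts 3 to 10 seconds."""
    seconds = clip_duration(path_or_file)
    return MIN_SECONDS <= seconds <= MAX_SECONDS, seconds


def make_silent_wav(seconds: float, rate: int = 16_000) -> bytes:
    """Build an in-memory mono 16-bit WAV of silence, handy for trying the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

For a file on disk, call `check_reference_clip("my_voice.wav")`; the filename here is only an example.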
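Voice Design takes a handful of attributes (gender, age, pitch, accent, dialect) rather than a reference clip. The description does not say what text format the tool hands to OmniVoice, so the sketch below is an assumption throughout: the `VoiceDesign` class and its comma-joined `to_prompt` output are hypothetical, meant only to show how optional attributes might be collected and combined while skipping the ones left blank.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class VoiceDesign:
    """Hypothetical container for the Voice Design attributes the UI exposes."""
    gender: Optional[str] = None
    age: Optional[str] = None
    pitch: Optional[str] = None
    accent: Optional[str] = None
    dialect: Optional[str] = None

    def to_prompt(self) -> str:
        # Assumed format: the attributes that were set, in field order, joined by ", ".
        # The real tool may build its instruction string differently.
        parts = [getattr(self, f.name) for f in fields(self)]
        return ", ".join(p.strip() for p in parts if p and p.strip())
```

For example, `VoiceDesign(gender="female", age="young adult", pitch="high pitch", accent="Vietnamese accent").to_prompt()` yields `"female, young adult, high pitch, Vietnamese accent"`.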
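The bulk path (one WAV file per input line, plus a zip of all of them) can be sketched with the standard library. In the sketch below, `synthesize` is a hypothetical stand-in that returns silence; in the real tool that step is the OmniVoice generation call, whose API is not described here. Reading only CSV (not spreadsheet formats such as .xlsx), taking the text from the first column, and naming files `line_0001.wav` are also assumptions for illustration.

```python
import csv
import io
import wave
import zipfile

SAMPLE_RATE = 24_000  # assumed output rate for the placeholder audio


def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for the TTS call: returns a silent WAV sized to the text."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        # Placeholder length: 1/20 s of silence per character.
        wav.writeframes(b"\x00\x00" * (len(text) * (SAMPLE_RATE // 20)))
    return buf.getvalue()


def read_lines(csv_text: str) -> list[str]:
    """Take the first column of each non-blank CSV row as one line to narrate."""
    rows = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in rows if row and row[0].strip()]


def bulk_to_zip(lines: list[str]) -> bytes:
    """Generate one WAV per line and pack them all into an in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i, line in enumerate(lines, start=1):
            zf.writestr(f"line_{i:04d}.wav", synthesize(line))
    return buf.getvalue()
```

Zero-padded, numbered filenames keep the archive in the same order as the source rows when unzipped, which makes it easy to match each clip back to its line.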

Copy-paste prompts

Prompt 1
Walk me through running omnivoice-tool on macOS, picking the CPU launcher menu option, and opening the page on port 8765
Prompt 2
Show me how to upload a reference audio clip in omnivoice-tool and generate a voice clone line
Prompt 3
Write a Node.js client that connects to the omnivoice-tool WebSocket and saves the streamed wav to disk
Prompt 4
Use omnivoice-tool Voice Design to make a young Vietnamese female voice with a higher pitch
Prompt 5
Import a CSV of 200 lines into omnivoice-tool and download the per-line wav zip

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.