thanhng8/omnivoice-tool

Analysis updated 2026-06-24

★ 11 | Python | Audience · vibe coder | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((omnivoice-tool))
    Inputs
      Text and spreadsheets
      Reference audio clips
      Voice design prompts
    Outputs
      Wav files per line
      Zipped bundles
      Streaming audio over WebSocket
    Use Cases
      Clone your own voice
      Bulk narrate scripts
      Build a TTS web client
      Generate multilingual voiceovers
    Tech Stack
      Python
      WebSocket
      OmniVoice

What do people build with it?

USE CASE 1

Clone a friend's voice from a 10-second clip and read a script back in it

USE CASE 2

Batch generate hundreds of narrated lines from a spreadsheet and download a zip

USE CASE 3

Add laughter and sigh markers to a generated voice line for a game character

USE CASE 4

Wire the WebSocket stream into a Chrome extension or Node.js client
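Use case 2 (spreadsheet in, zip of per-line wav files out) can be sketched with the Python standard library alone. This is a minimal illustration, not the tool's own code: `fake_synthesize` is a hypothetical stand-in that returns silence where a real call to the TTS engine would go, and the `text` column name and `line_NNNN.wav` naming are assumptions.

```python
import csv
import io
import wave
import zipfile


def fake_synthesize(text: str, rate: int = 24000) -> bytes:
    """Hypothetical stand-in for the TTS call: 0.1 s of silence as wav bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate // 10))
    return buf.getvalue()


def bundle_csv(csv_text: str, column: str = "text", synthesize=fake_synthesize) -> bytes:
    """Write one wav per non-empty CSV row into a zip and return the zip bytes."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as bundle:
        index = 0
        for row in csv.DictReader(io.StringIO(csv_text)):
            text = (row.get(column) or "").strip()
            if not text:
                continue  # skip blank rows instead of producing empty audio
            index += 1
            bundle.writestr(f"line_{index:04d}.wav", synthesize(text))
    return out.getvalue()
```

Passing a different `synthesize` callable is the intended seam for plugging in a real engine; the bundling logic does not need to change.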

What is it built with?

Python, WebSocket, OmniVoice

How does it compare?

	thanhng8/omnivoice-tool	2arons/llm-cli	an1x3r/anima-artist-mixer
Stars	11	11	11
Language	Python	Python	Python
Setup difficulty	moderate	easy	easy
Complexity	3/5	2/5	2/5
Audience	vibe coder	developer	designer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

First run downloads the OmniVoice model, and GPU mode needs a working CUDA install.
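Before picking a GPU option, it can help to confirm that CUDA is actually usable from Python. The check below is a sketch that assumes the model runs on PyTorch, which this page does not state; if the tool uses a different runtime, the check does not apply.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True only if PyTorch is importable and reports a usable CUDA device.

    Assumption: the underlying model uses PyTorch. Returns False when PyTorch
    is missing or fails to load, so the caller can fall back to CPU mode.
    """
    if importlib.util.find_spec("torch") is None:
        return False
    try:
        import torch
    except (ImportError, OSError):
        return False  # installed but broken, e.g. missing shared libraries
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("GPU mode looks usable" if cuda_available() else "Use the CPU option")
```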

In plain English

OmniVoice TTS Tool is an add-on that wraps an existing open-source text-to-speech engine called OmniVoice (made by another team, k2-fsa) and makes it easier to use on your own computer. Everything in this repository lives in a folder called tool/ and is not part of the original upstream project, so it is best thought of as a friendly shell around someone else's voice generation model.

The core piece is a local server, written in Python, that loads OmniVoice once at startup and then exposes two things on a single port (8765): a normal web page you can open in your browser, and a WebSocket connection that streams generated audio. Launcher scripts for Windows and for macOS or Linux show a small numbered menu with five options: GPU with speech recognition, GPU only, CPU, a custom port, or advanced settings. After the first run, an offline flag keeps the tool from checking the network, so it starts in three to five seconds.

The browser interface follows a three-step flow: pick a voice, write the text, then generate. You can choose Auto Voice (a random voice each time), Voice Clone (pick a saved voice or upload three to ten seconds of your own reference audio), or Voice Design (describe gender, age, pitch, accent, and dialect). The author says 646 languages are supported, with a prebuilt gallery of 45 curated voices covering English, Chinese, and Vietnamese.

Every generation setting the underlying model offers is exposed as a slider or input, and you can insert non-verbal markers such as laughter or sighs. Bulk import from spreadsheets and text files is supported, and output is delivered as per-line wav files plus a zip bundle. Example client code is provided for browser, Chrome extension, Node.js, and Python.
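Because the web page and the WebSocket share port 8765, a script can tell whether the server has finished loading by probing that port. The sketch below uses only the standard library; the function names and the 60-second default wait are illustrative assumptions, and the probe only confirms that something accepts TCP connections, not that it is this tool.

```python
import socket
import time


def port_is_open(host: str = "127.0.0.1", port: int = 8765, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_server(host: str = "127.0.0.1", port: int = 8765,
                    total_wait: float = 60.0, interval: float = 0.5) -> bool:
    """Poll the port until it opens or total_wait seconds have passed."""
    deadline = time.monotonic() + total_wait
    while time.monotonic() < deadline:
        if port_is_open(host, port):
            return True
        time.sleep(interval)
    return False


if __name__ == "__main__":
    ready = wait_for_server()
    print("Server is up on port 8765" if ready else "Nothing is listening on port 8765")
```

A generous `total_wait` suits the first run, when the model download can take a while; later offline starts should answer within a few seconds.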

Copy-paste prompts

Prompt 1

Walk me through running omnivoice-tool on macOS, picking the CPU launcher menu option, and opening the page on port 8765

Prompt 2

Show me how to upload a reference audio clip in omnivoice-tool and generate a voice clone line

Prompt 3

Write a Node.js client that connects to the omnivoice-tool WebSocket and saves the streamed wav to disk

Prompt 4

Use omnivoice-tool Voice Design to make a young Vietnamese female voice with a higher pitch

Prompt 5

Import a CSV of 200 lines into omnivoice-tool and download the per-line wav zip

Frequently asked questions

What is omnivoice-tool?

Local wrapper around the OmniVoice TTS engine. Runs a Python server on port 8765 with a browser UI for voice cloning, voice design, and bulk text-to-speech generation.

What language is omnivoice-tool written in?

Mainly Python. The stack also includes WebSocket and OmniVoice.

How hard is omnivoice-tool to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is omnivoice-tool for?

Mainly vibe coders.

Verify against the repo before relying on details.