explaingit

netease-youdao/emotivoice

8,480 · Python · Audience: developer · Complexity: 3/5 · License: Apache 2.0 · Setup: hard

TLDR

EmotiVoice is an open-source text-to-speech system that generates expressive speech in English and Chinese, with control over emotion (for example happy, sad, or angry), across more than 2,000 voices.

Mindmap

mindmap
  root((repo))
    What it does
      Text to speech
      Emotion control
      2000 plus voices
    Features
      English and Chinese
      OpenAI API compatible
      Voice cloning
    Tech Stack
      Python
      Docker
      CUDA
    Audience
      Content creators
      App developers


Things people build with this

USE CASE 1

Generate emotional voice narration for a game character or interactive story application

USE CASE 2

Build a content pipeline that converts text to expressive speech in English or Chinese

USE CASE 3

Replace an OpenAI TTS integration with a self-hosted alternative using the same API format
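For the narration and content-pipeline use cases, a common first step is splitting a script into lines that each carry an intended emotion. The sketch below assumes a hypothetical `[emotion] text` tagging convention that is not part of EmotiVoice; it only prepares `(emotion, text)` pairs and does not show how EmotiVoice itself receives the emotion, which you should check in the repository.

```python
import re
from dataclasses import dataclass

# Hypothetical tag format "[emotion] text"; this is NOT an EmotiVoice convention,
# just an illustrative way to mark the desired tone per script line.
TAG_RE = re.compile(r"^\[(?P<emotion>[^\]]+)\]\s*(?P<text>.+)$")


@dataclass
class ScriptLine:
    emotion: str  # lower-cased emotion label, e.g. "happy"
    text: str     # the words to synthesize


def parse_script(script: str, default_emotion: str = "neutral") -> list[ScriptLine]:
    """Split a script into (emotion, text) lines, skipping blank lines.

    Untagged lines fall back to default_emotion.
    """
    lines: list[ScriptLine] = []
    for raw in script.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        match = TAG_RE.match(raw)
        if match:
            lines.append(ScriptLine(match["emotion"].strip().lower(), match["text"].strip()))
        else:
            lines.append(ScriptLine(default_emotion, raw))
    return lines
```

Each resulting pair could then be sent to whichever EmotiVoice interface you run, one request per line, so a single script can switch tone from line to line.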

Tech stack

Python · Docker · CUDA · conda

Getting it running

Difficulty · hard · Time to first run · 1h+

Requires an NVIDIA GPU for local inference. Docker is the easiest path, but it still needs GPU access.

Free to use for any purpose, including commercial use, under the Apache 2.0 license, which requires attribution.

In plain English

EmotiVoice is an open-source text-to-speech system that generates spoken audio from English and Chinese text, with control over the emotional tone of the output. Along with the text, you specify an emotion such as happy, sad, angry, or excited, and the system produces speech that reflects that emotion instead of a neutral, flat delivery. More than 2,000 distinct voices are available.

Its most distinctive feature, compared with basic text-to-speech tools, is prompt-controlled emotion. Instead of only converting words to audio, you tell the system how the speaker should sound, and it adjusts pitch, speed, and energy accordingly. This makes it useful for content creators, game developers, and anyone building applications that need expressive rather than robotic-sounding speech.

There are several ways to use it. The quickest is the Docker container: pull the pre-built image, run it, and open the web interface in your browser. A full local installation uses Python with conda and pip. The system also exposes an API compatible with the OpenAI text-to-speech API format, so software already built for OpenAI's speech service could switch to EmotiVoice with minimal changes. A Mac desktop app has also been released as a download.

Voice cloning is supported: users can fine-tune the system on their own audio recordings to produce speech in a custom voice. Inference requires an NVIDIA GPU when running locally or via Docker. The project was created by NetEase Youdao and is released under the Apache 2.0 license.
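To illustrate what "OpenAI-compatible" means in practice, here is a minimal standard-library sketch of a request in the OpenAI speech format (`POST .../audio/speech` with `model`, `input`, `voice`, and `response_format` fields). The base URL, port, model name, and voice id below are all assumptions for illustration, not values confirmed by EmotiVoice; check the repository for the real server address and accepted voices.

```python
import json
import urllib.request

# ASSUMPTION: address of a locally running EmotiVoice OpenAI-compatible server.
# Replace with the host, port, and path the repository actually documents.
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(text: str, voice: str, model: str = "emoti-voice",
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build a POST request whose JSON body follows the OpenAI speech format."""
    payload = {
        "model": model,                      # placeholder model name (assumption)
        "input": text,                       # text to synthesize
        "voice": voice,                      # speaker id; valid values are an assumption
        "response_format": response_format,  # requested audio container
    }
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, out_path: str) -> None:
    """Send the request to the server and write the returned audio bytes to disk."""
    req = build_speech_request(text, voice)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

With a server running, a call such as `synthesize("Hello there!", "8051", "hello.mp3")` would save the audio; `"8051"` is a hypothetical voice id. Because the request shape matches OpenAI's, an existing integration would mostly need its base URL changed rather than its request code.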

Copy-paste prompts

Prompt 1
Show me how to run EmotiVoice in Docker and generate speech that sounds excited for a product announcement script
Prompt 2
Help me use EmotiVoice's OpenAI-compatible API to replace my current TTS setup with minimal code changes
Prompt 3
How do I fine-tune EmotiVoice on my own voice recordings to clone my voice for audio content?


Verify against the repo before relying on details.