Clone someone's voice from a short recording and generate new spoken text in that voice.
Re-dub an existing audio clip so it plays back in a different person's voice.
Produce speech in a cloned voice across 16 supported languages including English, Chinese, and Japanese.
Requires downloading ~3GB of model files from Hugging Face; users in China need a working proxy to reach it.
Clone Voice is a voice cloning tool with a browser-based interface that lets you take a short audio recording of any person's voice and use it to generate new speech. You can either type text and have it spoken in the cloned voice, or take an existing audio clip and re-produce it in that voice. The README is written primarily in Chinese, with an English version linked separately.

The tool is built on a speech synthesis model called xtts_v2, developed by coqui.ai, which is licensed for personal learning and research only, not for commercial use. It supports 16 languages including Chinese, English, Japanese, Korean, French, German, and Italian. The README notes that English output quality is good and Chinese quality is acceptable.

For Windows users, a precompiled version is available as a downloadable package. You double-click an executable file, wait for a web page to open automatically, and then use the interface by clicking through the options. The model files, which are roughly 3 gigabytes, need to be downloaded and placed in a specific folder. No coding is required for the precompiled path.

For users on Linux or macOS, or those who want to run from source, the process involves Python 3.9 through 3.11, setting up a virtual environment, installing dependencies, and downloading the model files from Hugging Face, which requires a working proxy connection for users in China since those services are blocked there. The README includes detailed troubleshooting notes for proxy-related failures, which it identifies as the most common source of errors. If the machine has an Nvidia GPU, CUDA acceleration can be enabled for faster processing.

The same developer also maintains related tools for video translation with dubbing, speech-to-text transcription, and vocal separation from background music.
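The two source-install prerequisites named above (a Python version from 3.9 through 3.11, and a proxy for reaching Hugging Face where it is blocked) can be checked before starting setup. The following is a minimal sketch, not part of the project: the function names are hypothetical, and it assumes the proxy is supplied through the conventional `HTTPS_PROXY` / `ALL_PROXY` environment variables, which the repo may or may not read. A missing variable is only a warning, since users outside blocked regions do not need a proxy.

```python
import os
import sys

# Version range taken from the summary above (3.9 through 3.11).
SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 11)


def python_version_ok(version=None):
    """Return True if the major.minor version is within 3.9 through 3.11."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def configured_proxy(env=None):
    """Return the first non-empty proxy URL found in the environment, or None.

    Assumption: the proxy is passed via conventional environment variables.
    """
    env = os.environ if env is None else env
    for name in ("HTTPS_PROXY", "https_proxy", "ALL_PROXY", "all_proxy"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def preflight(version=None, env=None):
    """Collect human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if not python_version_ok(version):
        problems.append("Python 3.9 through 3.11 is required for the source install")
    if configured_proxy(env) is None:
        problems.append(
            "no proxy variable set; Hugging Face downloads may fail where it is blocked"
        )
    return problems


if __name__ == "__main__":
    for line in preflight() or ["preflight checks passed"]:
        print(line)
```

Run inside the virtual environment, this prints any flagged problems one per line. It does not test whether the proxy actually works, only whether one is configured.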
Verify against the repo before relying on details.