audiohacking/audiogen.cpp

Analysis updated 2026-05-18

★ 4 · Python · Audience · developer · Complexity · 4/5 · License · Apache 2.0 · Setup · hard

Mindmap

mindmap
  root((audiogen.cpp))
    What it does
      Text to audio
      Speech music effects
      Local inference
    Tech stack
      C++ 17
      GGML engine
      GGUF weights
    Setup
      Build with make
      Download 3 to 6 GB models
      macOS Metal or CUDA
    Use cases
      Game audio assets
      Batch audio production
      Offline generation


What do people build with it?

USE CASE 1

Generate sound effects for a game or video by typing a text description and getting a .wav file back.

USE CASE 2

Batch-produce ambient audio files from a list of text prompts using the built-in batch mode.

USE CASE 3

Run a large text-to-audio model entirely on your own machine without any cloud API costs.

USE CASE 4

Combine speech, music, and environment tags in one call to produce layered audio scenes.
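
The layered call in use case 4 and a per-prompt loop in the spirit of use case 2 can be sketched in Python as below. This is a minimal sketch under assumptions: the flag names --caption, --speech, --music, --env, and --sfx come from the project description, but the binary path ./audiogen, the --output flag, and the helper names are hypothetical placeholders, and any model-path arguments the real tool needs are left out. The sketch only builds argument lists and runs nothing.

```python
import re

# Hypothetical placeholders: the real binary name and output flag may differ.
BINARY = "./audiogen"
OUTPUT_FLAG = "--output"

# Tag flags named in the project description.
TAG_FLAGS = ("speech", "music", "env", "sfx")


def build_command(caption, output_path, **tags):
    """Return an argv list for one generation call (nothing is executed)."""
    unknown = set(tags) - set(TAG_FLAGS)
    if unknown:
        raise ValueError(f"unsupported tags: {sorted(unknown)}")
    argv = [BINARY, "--caption", caption]
    for name in TAG_FLAGS:  # fixed order keeps commands reproducible
        if tags.get(name):
            argv += [f"--{name}", tags[name]]
    argv += [OUTPUT_FLAG, output_path]
    return argv


def slugify(text, max_len=40):
    """Turn a prompt into a safe file stem, e.g. 'Rain on a window' -> 'rain-on-a-window'."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len] or "clip"


def commands_for_prompts(prompts, out_dir="out"):
    """Build one command per non-empty prompt line, numbering the .wav files."""
    cleaned = [p.strip() for p in prompts if p.strip()]
    return [
        build_command(p, f"{out_dir}/{i:03d}-{slugify(p)}.wav")
        for i, p in enumerate(cleaned, start=1)
    ]


if __name__ == "__main__":
    layered = build_command(
        "a busy street market",
        "out/market.wav",
        speech="vendors calling out prices",
        music="distant accordion",
        env="light rain",
    )
    print(" ".join(layered))
    for cmd in commands_for_prompts(["a dog barking loudly", "", "rain falling on a window"]):
        print(cmd[-1])
```

Once the real binary name and flags are confirmed against the README, each list can be passed to subprocess.run(cmd, check=True). The tool's own batch mode is likely the better choice for long prompt lists; the loop here is only a fallback illustration.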

What is it built with?

C++17 · GGML · Python · Metal · CUDA · GGUF

How does it compare?

	audiohacking/audiogen.cpp	adeliox/klein-head-swap	ats4321/ragit
Stars	4	4	4
Language	Python	Python	Python
Setup difficulty	hard	moderate	moderate
Complexity	4/5	3/5	2/5
Audience	developer	designer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard
Time to first run · 1h+

Requires building from source with make and downloading model files of 3 to 6 GB.
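
Since the model downloads run to several gigabytes, a free-space check before fetching them can avoid a failed download. The following is a minimal Python sketch; the helper name, the default directory, and the 1 GB safety margin are illustrative assumptions, not project requirements.

```python
import shutil

GB = 1024 ** 3  # bytes per gibibyte


def has_room(target_dir=".", needed_gb=6.0, margin_gb=1.0):
    """Return True if target_dir's filesystem has needed_gb plus a margin free."""
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= (needed_gb + margin_gb) * GB


if __name__ == "__main__":
    # 6.0 GB covers the upper end of the download range quoted above.
    print("enough space" if has_room() else "free up disk space first")
```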

License: Apache 2.0. Use freely for any purpose including commercial projects, as long as you keep the copyright and license notice.

In plain English

audiogen.cpp is a C++ program that lets you generate audio clips from text descriptions on your own computer. You type something like "a dog barking loudly" or "rain falling on a window," and it produces a .wav audio file matching that description. The AI model behind this is called Dasheng-AudioGen, a large two-billion-parameter model designed to produce speech, music, sound effects, and ambient sounds all from the same system.

The project is built in C++17, which makes it run much faster than the original Python version. According to the benchmarks in the README, the C++ build is roughly 3.7 times faster than Python on Apple hardware. For a 10-second audio clip the C++ version finishes in about 6 seconds rather than 22 seconds. There are build options for Apple Metal on macOS, plain CPU processing, and CUDA for NVIDIA GPUs on Linux.

Getting started requires cloning the repo, building it with a single make command, and downloading model files ranging from about 2.8 GB to 5.7 GB depending on the quality level you want. The models come in three sizes: full precision for best quality, Q8 at 33% smaller, and Q4 at 51% smaller for the smallest footprint. Once built, you run a command-line program pointing at those model files and supply a text description.

The command-line tool accepts several input tags beyond a plain caption. You can layer speech, music, environmental sounds, and sound effects in a single call by using flags such as --speech, --music, --env, and --sfx alongside --caption. There is also a batch mode where you supply a text file of prompts and the program processes them all at once, writing individual .wav files to an output directory.

The project is marked experimental. It is released under the Apache 2.0 license, which allows free use including commercial projects. The model weights are downloaded from Hugging Face in a format called GGUF, which is what the GGML inference engine reads.
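
The figures quoted above are easy to sanity-check. The short Python sketch below recomputes the speedup from the two timings and the approximate Q8 and Q4 file sizes from the stated percentage reductions. It assumes the 5.7 GB figure refers to the full-precision model, which the text implies but does not state outright.

```python
# Figures quoted in the description above (README benchmarks, Apple hardware).
PYTHON_SECONDS = 22.0   # 10-second clip, original Python version
CPP_SECONDS = 6.0       # 10-second clip, C++ build
FULL_SIZE_GB = 5.7      # assumption: the largest download is full precision
REDUCTIONS = {"Q8": 0.33, "Q4": 0.51}  # fraction smaller than full precision


def speedup(baseline_s, optimized_s):
    """Ratio of baseline time to optimized time."""
    return baseline_s / optimized_s


def quantized_size_gb(full_gb, reduction):
    """Size after shrinking the full-precision file by the given fraction."""
    return full_gb * (1.0 - reduction)


if __name__ == "__main__":
    print(f"speedup: {speedup(PYTHON_SECONDS, CPP_SECONDS):.2f}x")
    for name, r in REDUCTIONS.items():
        print(f"{name}: {quantized_size_gb(FULL_SIZE_GB, r):.2f} GB")
```

The speedup comes out near 3.7 and the Q4 size near 2.8 GB, consistent with the quoted range of about 2.8 GB to 5.7 GB.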

Copy-paste prompts

Prompt 1

Show me how to build audiogen.cpp on a Mac with Apple Metal and download the Q8 model files.

Prompt 2

Write a shell script that reads sound descriptions from a text file and runs audiogen.cpp to produce a .wav file for each one.

Prompt 3

What do the --caption, --music, and --env flags do in audiogen.cpp and how do I combine them for a complex soundscape?

Prompt 4

What are the quality and speed tradeoffs between the F16, Q8, and Q4 model sizes in audiogen.cpp?

Prompt 5

How do I convert the original Dasheng-AudioGen Python weights to GGUF format instead of downloading the pre-converted ones?

Frequently asked questions

What is audiogen.cpp?

A C++ command-line tool that generates audio clips from text descriptions by running a 2-billion-parameter AI model locally. The README benchmarks put it at about 3.7x faster than the original Python version on Apple hardware.

What language is audiogen.cpp written in?

GitHub metadata lists Python as the primary language, but the tool itself is built in C++17. The stack also includes GGML, Metal, CUDA, and GGUF.

What license does audiogen.cpp use?

Apache 2.0. Use freely for any purpose including commercial projects, as long as you keep the copyright and license notice.

How hard is audiogen.cpp to set up?

Setup difficulty is rated hard, with an hour or more to a first successful run.

Who is audiogen.cpp for?

Mainly developers.


Verify against the repo before relying on details.