blaizzy/mlx-audio

★ 7,043 | Python | Audience · developer | Complexity · 3/5 | License · MIT | Setup · moderate

Mindmap

mindmap
  root((mlx-audio))
    Modes
      Text to speech
      Speech to text
      Speech to speech
    Features
      Voice cloning
      Real-time streaming
      Quantization
    Interfaces
      Python API
      CLI tool
      REST API
      Web UI
    Tech Stack
      Python
      MLX
      Apple Silicon
      Swift package


Things people build with this

USE CASE 1

Convert text to natural-sounding speech on a Mac using Python, the command line, or a REST API.

USE CASE 2

Transcribe spoken audio to text using Whisper running natively on an Apple Silicon Mac.

USE CASE 3

Clone a speaker's voice from a short audio sample and generate new speech in that voice.

USE CASE 4

Use this as a drop-in replacement for OpenAI's audio API endpoints in an existing project.
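As a rough illustration of use case 4, the sketch below builds a text-to-speech request in the shape of OpenAI's `/v1/audio/speech` endpoint and points it at a local server. The base URL, port, model ID, and voice name are placeholders assumed for illustration, not values confirmed by the repo; check the project's documentation for the real ones. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: a local mlx-audio server is listening at this address.
# Host and port are placeholders, not confirmed defaults.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, model, voice, base_url=BASE_URL):
    """Build (but do not send) a POST request shaped like OpenAI's /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())


# Placeholder model and voice names; substitute ones the server actually serves.
req = build_speech_request(
    text="Hello from an Apple Silicon Mac.",
    model="hypothetical-model-id",
    voice="hypothetical-voice",
)
print(req.get_method(), req.full_url)
```

With a server running, `synthesize_to_file(req, "hello.wav")` would write the response body to disk. An existing project that already talks to OpenAI's audio endpoints would, under the same assumptions, mainly need its base URL changed.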

Tech stack

Python · MLX · Swift · PyPI

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires an Apple Silicon Mac (M-series chip). It is not compatible with Intel Macs or other platforms.

Use freely for any purpose, including commercial projects, as long as you include the copyright notice.

In plain English

MLX-Audio is a Python library for working with speech on Apple Silicon Macs. It supports three modes: text-to-speech (converting written text into spoken audio), speech-to-text (transcribing spoken audio into written text), and speech-to-speech (transforming one piece of spoken audio into another). The library is built on MLX, Apple's machine learning framework designed specifically for the M-series chips found in modern Macs, so it runs efficiently on that hardware.

The text-to-speech side supports a long list of models with different trade-offs between speed, quality, and language coverage. Options include Kokoro for fast multilingual output and models that support voice cloning, where the system learns the characteristics of a speaker from a short sample and then generates new speech in that voice. On the speech-to-text side, OpenAI's Whisper model is available. Speaking speed can be adjusted, and the output audio can be streamed in real time rather than waiting for the entire file to generate.

For developers, there is a Python API for embedding the functionality in code, a command-line tool for generating audio from a terminal command, and a REST API that is compatible with OpenAI's audio endpoints, making it a drop-in option for projects that already use that interface. A web interface with a 3D audio visualization is also included.

The library supports quantization, a technique that compresses AI models to use less memory and run faster, with options ranging from 3-bit to 8-bit. There is also a Swift package for integrating the functionality into iOS and macOS apps.

Installation is through pip, the standard Python package manager. The library is released under the MIT license and hosted on PyPI.
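To make the 3-bit to 8-bit quantization range concrete, here is a back-of-the-envelope estimate of how weight storage scales with bit width. It does not call mlx-audio at all: the parameter count is a hypothetical round number, and the formula ignores the extra scale and bias metadata that real quantized files carry, so treat the figures as approximate lower bounds rather than measured sizes.

```python
def approx_weight_megabytes(num_params, bits_per_weight):
    """Rough weight storage in MB: parameters * bits / 8, ignoring quantization metadata."""
    return num_params * bits_per_weight / 8 / 1_000_000


# Hypothetical model size chosen for easy arithmetic, not a real mlx-audio model.
NUM_PARAMS = 100_000_000

# 16-bit is included as an unquantized baseline for comparison.
for bits in (16, 8, 4, 3):
    size_mb = approx_weight_megabytes(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit: ~{size_mb:.1f} MB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts storage to a quarter. Lower bit widths usually trade some output quality for that saving, so the right setting depends on the model and the memory available.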

Copy-paste prompts

Prompt 1

Using mlx-audio on my M2 Mac, write Python code to convert a text string to an audio file using the Kokoro model.

Prompt 2

Show me how to transcribe an audio file to text using mlx-audio's Whisper integration in Python.

Prompt 3

How do I start mlx-audio's REST API server so it acts as a drop-in replacement for OpenAI's text-to-speech endpoint?

Prompt 4

Write a Python script that clones a voice from a 10-second audio sample using mlx-audio and generates new speech in that voice.

Prompt 5

How do I quantize an mlx-audio model to 4-bit to reduce memory usage on my Mac?

Verify against the repo before relying on details.