explaingit

menelly/ai_ears

Analysis updated 2026-05-18

8 stars · Python · Audience: developer · Complexity: 3/5 · Setup: moderate

TLDR

An MCP server that gives AI assistants rich audio analysis, combining speech-to-text transcription with acoustic properties like voice emotion, musical key, pace, and breath detection.

Mindmap

mindmap
  root((ace-ears))
    What it does
      Speech transcription
      Voice profiling
      Acoustic analysis
      Breath detection
    Output
      Summary card
      Raw structured data
    STT providers
      Inworld cloud
      ElevenLabs cloud
      Local Whisper offline
    Setup
      pip install requirements
      ffmpeg on PATH
      MCP config block

What do people build with it?

USE CASE 1

Connect ace-ears to Claude so it can analyze a voice recording and describe how the speaker sounded, not just what they said.

USE CASE 2

Use the local Whisper mode for fully offline audio analysis with no data sent to any external service.

USE CASE 3

Call the hear_raw tool from an AI assistant to get structured acoustic data from an audio file for programmatic processing.

What is it built with?

Python, NumPy, FFmpeg, MCP

How does it compare?

| | menelly/ai_ears | adam-s/car-diagnosis | bongobongo2020/krea2-character-lora-trainer |
| --- | --- | --- | --- |
| Stars | 8 | 8 | 8 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | moderate | moderate |
| Complexity | 3/5 | 3/5 | 3/5 |
| Audience | developer | researcher | vibe coder |

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate
Time to first run · 30 min

Requires FFmpeg on PATH. Cloud STT providers need an API key, but local mode (faster-whisper) works fully offline.
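The requirements above can be checked before the first run. The following is a minimal sketch, not code from the repo: `check_setup` and `ASSUMED_KEY_VARS` are hypothetical names, and the provider values `inworld` and `elevenlabs` and both API key variable names are assumptions. Only `STT_PROVIDER=local` appears in the material on this page.

```python
import shutil

# Hypothetical mapping of provider name -> API key variable. Only the
# "local" provider value is taken from this page; the rest are assumptions.
ASSUMED_KEY_VARS = {
    "inworld": "INWORLD_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "local": None,  # faster-whisper needs no key
}


def check_setup(env, ffmpeg_path=None):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # Look for ffmpeg on PATH unless the caller supplies a path explicitly.
    if ffmpeg_path is None:
        ffmpeg_path = shutil.which("ffmpeg")
    if not ffmpeg_path:
        problems.append("ffmpeg not found on PATH")

    provider = env.get("STT_PROVIDER", "").strip().lower()
    if provider not in ASSUMED_KEY_VARS:
        problems.append(f"unknown STT_PROVIDER: {provider!r}")
        return problems

    # Cloud providers need a key; the local provider does not.
    key_var = ASSUMED_KEY_VARS[provider]
    if key_var is not None and not env.get(key_var):
        problems.append(f"{key_var} is required for provider {provider!r}")
    return problems


if __name__ == "__main__":
    import os

    for line in check_setup(os.environ) or ["setup looks complete"]:
        print(line)
```

Returning a list of problems rather than raising on the first one lets a user fix a missing FFmpeg install and a missing key in one pass.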

No license information found in the README.

In plain English

ace-ears is a small server that lets an AI assistant like Claude actually analyze an audio file rather than just read a transcript of it. Standard speech-to-text tools convert spoken words into text and discard everything else. This tool keeps the rest: the speaking style, the detected emotion, the pace, pauses, breath sounds, the musical key if music is present, the dynamic range, and the spectral character of the sound.

The output is a summary card that combines two sources of information. The words and speaking characteristics come from a speech-to-text service, with three options: a cloud service called Inworld that also provides voice profiling, ElevenLabs, which adds audio event tags, or a fully local offline transcription option using a model called faster-whisper. The acoustic analysis, which covers frequency brightness, musical key, tempo, dynamic range, and breath detection, runs entirely on your machine using standard math tools, with no external service and no API key required.

The server exposes two tools to an AI assistant. One returns a human-readable summary card showing the words, voice characteristics, pacing, sound properties, and breath timestamps in a structured text format. The other returns the raw structured data for the AI to process programmatically.

Setup requires Python, the Python packages listed in the requirements file, and FFmpeg installed on your system. Configuration is through a small environment file where you choose your speech-to-text provider and enter any required API key. For a completely offline setup with no data leaving your machine, the local Whisper option needs no key and no network.

The server registers as an MCP tool so it can be called directly from Claude Desktop or other MCP-compatible clients. There is also a command-line interface that runs the same analysis from a terminal. The license is not stated in the README.
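To make "frequency brightness" and "dynamic range" concrete, here is a rough NumPy sketch of two common measures: the spectral centroid and the crest factor. It shows the general technique only. The function names are hypothetical, and the repo may compute these properties differently.

```python
import numpy as np


def spectral_centroid_hz(samples, sample_rate):
    """Magnitude-weighted mean frequency; higher values sound brighter."""
    magnitudes = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0:
        return 0.0  # silence has no meaningful centroid
    return float((freqs * magnitudes).sum() / total)


def crest_factor_db(samples):
    """Peak-to-RMS ratio in decibels, one simple proxy for dynamic range."""
    peak = np.max(np.abs(samples))
    rms = np.sqrt(np.mean(np.square(samples)))
    if rms == 0:
        return 0.0
    return float(20.0 * np.log10(peak / rms))


if __name__ == "__main__":
    # One second of a synthetic 440 Hz tone stands in for decoded audio.
    rate = 16000
    t = np.arange(rate) / rate
    tone = np.sin(2 * np.pi * 440 * t)
    print(spectral_centroid_hz(tone, rate), crest_factor_db(tone))
```

Both functions expect mono samples as a float array, which is the form audio typically takes after FFmpeg decodes it to raw PCM.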

Copy-paste prompts

Prompt 1
I have ace-ears set up as an MCP server. Listen to this recording and describe both what the person said and how they sounded, including any notable pauses or emotional tone.
Prompt 2
Use ace-ears to analyze this song file and tell me the detected musical key, tempo, and dynamic range.
Prompt 3
I want to use ace-ears in local offline mode with no API key. Walk me through setting up STT_PROVIDER=local with faster-whisper and registering it in my Claude Desktop config.
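For the offline setup that Prompt 3 asks about, the registration step amounts to adding an entry under `mcpServers` in the Claude Desktop config file. The sketch below builds such an entry in Python. The `mcpServers`, `command`, `args`, and `env` keys follow the Claude Desktop config format. The helper name `build_mcp_entry`, the server script path, and passing `STT_PROVIDER` through `env` rather than the environment file are assumptions to check against the repo.

```python
import json


def build_mcp_entry(server_script, provider="local", python_cmd="python"):
    """Build an `mcpServers` entry in the shape Claude Desktop reads."""
    return {
        "mcpServers": {
            "ace-ears": {
                "command": python_cmd,
                "args": [server_script],
                # Assumption: the server honors STT_PROVIDER from its
                # process environment as well as from its environment file.
                "env": {"STT_PROVIDER": provider},
            }
        }
    }


if __name__ == "__main__":
    # The path below is a placeholder, not the repo's actual layout.
    entry = build_mcp_entry("/path/to/ai_ears/server.py")
    print(json.dumps(entry, indent=2))
```

The printed JSON can be merged into an existing config by hand, so servers that are already registered are not overwritten.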

Frequently asked questions

What is ai_ears?

An MCP server that gives AI assistants rich audio analysis, combining speech-to-text transcription with acoustic properties like voice emotion, musical key, pace, and breath detection.

What language is ai_ears written in?

Mainly Python. The stack also includes NumPy, FFmpeg, and MCP.

What license does ai_ears use?

No license information found in the README.

How hard is ai_ears to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is ai_ears for?

Mainly developers.

Verify against the repo before relying on details.