Analysis updated 2026-05-18
Connect ace-ears to Claude so it can analyze a voice recording and describe how the speaker sounded, not just what they said.
Use the local Whisper mode for fully offline audio analysis with no data sent to any external service.
Call the hear_raw tool from an AI assistant to get structured acoustic data from an audio file for programmatic processing.
| | menelly/ai_ears | adam-s/car-diagnosis | bongobongo2020/krea2-character-lora-trainer |
|---|---|---|---|
| Stars | 8 | 8 | 8 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | moderate | moderate |
| Complexity | 3/5 | 3/5 | 3/5 |
| Audience | developer | researcher | vibe coder |
Figures from each repo's GitHub metadata at analysis time.
Requires FFmpeg on PATH. Cloud STT providers need an API key, but local mode (faster-whisper) works fully offline.
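The prerequisites above can be checked before starting the server. The following is a minimal sketch, not code from the repo: the provider names (`inworld`, `elevenlabs`, `local`) and the `STT_API_KEY` variable are hypothetical placeholders, so check the project's own environment file for the real keys. Only `shutil.which` and `os.environ` are standard-library calls.

```python
import os
import shutil

# Hypothetical provider identifiers and env variable name, for illustration only.
CLOUD_PROVIDERS = {"inworld", "elevenlabs"}
LOCAL_PROVIDER = "local"
API_KEY_VAR = "STT_API_KEY"


def check_prerequisites(provider, env=None, which=shutil.which):
    """Return a list of problems; an empty list means the setup looks ready."""
    env = os.environ if env is None else env
    problems = []

    # FFmpeg must be discoverable on PATH regardless of the provider.
    if which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")

    if provider in CLOUD_PROVIDERS:
        # Cloud speech-to-text needs a key; local mode does not.
        if not env.get(API_KEY_VAR):
            problems.append(f"{provider} needs an API key ({API_KEY_VAR} is unset)")
    elif provider != LOCAL_PROVIDER:
        problems.append(f"unknown provider: {provider}")

    return problems
```

Calling `check_prerequisites("local")` on a machine with FFmpeg installed should return an empty list, since the offline mode needs no key.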
ace-ears is a small server that lets an AI assistant like Claude actually analyze an audio file rather than just read a transcript of it. Standard speech-to-text tools convert spoken words into text and discard everything else. This tool keeps the rest: the speaking style, the detected emotion, the pace, pauses, breath sounds, the musical key if music is present, the dynamic range, and the spectral character of the sound.

The output is a summary card that combines two sources of information. The words and speaking characteristics come from a speech-to-text service, with three options: a cloud service called Inworld that also provides voice profiling, ElevenLabs which adds audio event tags, or a fully local offline transcription option using a model called faster-whisper. The acoustic analysis, which covers frequency brightness, musical key, tempo, dynamic range, and breath detection, runs entirely on your machine using standard math tools, with no external service and no API key required.

The server exposes two tools to an AI assistant. One returns a human-readable summary card showing the words, voice characteristics, pacing, sound properties, and breath timestamps in a structured text format. The other returns the raw structured data for the AI to process programmatically.

Setup requires Python, the Python packages listed in the requirements file, and FFmpeg installed on your system. Configuration is through a small environment file where you choose your speech-to-text provider and enter any required API key. For a completely offline setup with no data leaving your machine, the local Whisper option needs no key and no network. The server registers as an MCP tool so it can be called directly from Claude Desktop or other MCP-compatible clients. There is also a command-line interface that runs the same analysis from a terminal. The license is not stated in the README.
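To make the local acoustic analysis more concrete, here is a rough NumPy sketch of two of the properties mentioned: frequency brightness and dynamic range. It is an assumption about how such measures are commonly computed (spectral centroid and frame-level RMS spread), not the repo's actual implementation. The function names and the 50 ms frame size are illustrative choices.

```python
import numpy as np


def spectral_centroid_hz(samples, sample_rate):
    """Magnitude-weighted mean frequency, a common proxy for 'brightness'."""
    magnitudes = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    total = magnitudes.sum()
    if total == 0:
        return 0.0  # silent input has no meaningful centroid
    return float((freqs * magnitudes).sum() / total)


def dynamic_range_db(samples, sample_rate, frame_seconds=0.05, floor=1e-10):
    """Spread in dB between the loudest and quietest frame RMS levels."""
    frame_len = max(1, int(sample_rate * frame_seconds))
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return 0.0
    # Drop the trailing partial frame, then split into equal-length frames.
    frames = np.reshape(samples[: n_frames * frame_len], (n_frames, frame_len))
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Clamp to a small floor so silent frames do not produce log10(0).
    rms_db = 20.0 * np.log10(np.maximum(rms, floor))
    return float(rms_db.max() - rms_db.min())
```

Both functions expect a one-dimensional array of mono samples. Decoding a real audio file into that array would be a separate step, for example through FFmpeg.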
An MCP server that gives AI assistants rich audio analysis, combining speech-to-text transcription with acoustic properties like voice emotion, musical key, pace, and breath detection.
Mainly Python. The stack also includes NumPy and FFmpeg.
No license information found in the README.
Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.
Mainly developers.
Verify against the repo before relying on details.