explaingit

amaai-lab/merit

19 · Python · Audience: researcher · Complexity: 3/5 · Setup: moderate

TLDR

A Python tool that compares two audio files across melody, rhythm, and timbre separately, returning three independent similarity scores instead of a single blended number.

Mindmap

mindmap
  root((repo))
    What it does
      Compares audio
      Three scores
      Cosine similarity
    How it works
      MERT backbone
      Three modules
      128-dim embeddings
    Use cases
      Cover detection
      Remix analysis
      Music search
    Tech stack
      Python
      PyTorch
      HuggingFace

Things people build with this

USE CASE 1

Compare a cover song against the original to get a score showing how closely the melody was preserved independent of instrument or style.

USE CASE 2

Analyze a remix to check whether it kept the same rhythmic feel as the source track while the timbre changed.

USE CASE 3

Build a music search tool that finds songs similar in timbre but different in melody or rhythm.

USE CASE 4

Evaluate AI-generated music against a reference track across all three musical qualities independently.
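
Use case 3 can be sketched as a filtered ranking: sort a library by timbre similarity to a query, but keep only tracks whose melody similarity stays below a cutoff. This is a minimal illustration, not MERIT's own API. The function name `similar_timbre_different_melody`, the `max_melody` cutoff, and the tiny 3-number vectors are all hypothetical stand-ins for the 128-number embeddings MERIT produces.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_timbre_different_melody(query, library, top_k=10, max_melody=0.5):
    """Rank tracks by timbre similarity to the query.

    Tracks whose melody similarity exceeds max_melody are dropped first,
    so the result favours "sounds alike, but is a different tune".
    The 0.5 default is an arbitrary illustrative cutoff.
    """
    candidates = []
    for name, emb in library.items():
        melody = cosine_similarity(query["melody"], emb["melody"])
        if melody <= max_melody:
            timbre = cosine_similarity(query["timbre"], emb["timbre"])
            candidates.append((name, timbre, melody))
    # Highest timbre similarity first.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:top_k]


# Toy placeholder embeddings (3 numbers each instead of 128).
query = {"melody": [1.0, 0.0, 0.0], "timbre": [0.0, 1.0, 0.0]}
library = {
    # Nearly the same melody as the query, so it is filtered out.
    "same_song_cover": {"melody": [1.0, 0.1, 0.0], "timbre": [0.0, 1.0, 0.1]},
    # Different melody, very similar timbre.
    "same_band_other_song": {"melody": [0.0, 1.0, 0.0], "timbre": [0.0, 1.0, 0.2]},
    # Different melody and different timbre.
    "different_everything": {"melody": [0.0, 0.0, 1.0], "timbre": [1.0, 0.0, 0.0]},
}

results = similar_timbre_different_melody(query, library, top_k=2)
for name, timbre, melody in results:
    print(f"{name}: timbre={timbre:+.3f} melody={melody:+.3f}")
```

In a real search tool the embeddings would be computed once per track and cached, since running the audio through the model is the expensive step and the comparison itself is cheap.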

Tech stack

Python · PyTorch · HuggingFace

Getting it running

Difficulty: moderate · Time to first run: 30 min

Requires Python with PyTorch. The pre-trained model weights, about 33 MB in total, must be downloaded from HuggingFace before first use.

The model code is MIT licensed and can be used freely for any purpose, including commercial use. The training dataset is licensed for non-commercial use only.

In plain English

MERIT is a Python tool for comparing audio files by three separate musical qualities: melody, rhythm, and timbre. Most music similarity systems return a single number that blends all these qualities together, making it impossible to tell why two songs scored as similar or different. MERIT separates them into three independent scores, so you can ask targeted questions like "does this cover share the same melody?" or "does this remix keep the same drum feel?" and get distinct answers for each.

The system works by running audio through a shared backbone model called MERT, which was pre-trained on large amounts of music and knows how to convert raw audio into numerical representations. MERIT then passes those representations through three small trained modules (one for melody, one for rhythm, and one for timbre), each producing a 128-number embedding. When you compare two audio clips, each module computes its own cosine similarity score, a number between -1 and 1 indicating how closely that particular quality matches.

A practical example from the README: if a solo piano plays a rock song note-for-note, the melody score will be high because the notes match, but the rhythm and timbre scores will be low because the piano phrasing and sound color differ from the original band. MERIT makes that distinction computable rather than subjective.

The pre-trained model weights are freely available on HuggingFace and total about 33 MB. Setting it up requires Python with PyTorch and a few related libraries. You download the three small projection heads, load any audio file, and get back three embedding vectors that you can compare against other songs. The README includes ready-to-run Python code for this workflow.

The training dataset, also available on HuggingFace, contains roughly 296,000 audio triplets where only one musical factor varies at a time, which is how the system learned to separate the three qualities. That dataset is for non-commercial use only. The model code itself is MIT licensed.
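
The comparison step described above reduces to one cosine similarity per factor: the dot product of two embeddings divided by the product of their lengths. The sketch below shows only that arithmetic. It does not call MERIT. The embeddings are random placeholders, and `compare_clips` and `random_embedding` are hypothetical helper names. See the README for the actual loading and embedding code.

```python
import math
import random

FACTORS = ("melody", "rhythm", "timbre")


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; the result lies in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def compare_clips(clip_a, clip_b):
    """Return one independent score per factor.

    Each clip is a dict mapping a factor name to its embedding vector.
    """
    return {f: cosine_similarity(clip_a[f], clip_b[f]) for f in FACTORS}


# Placeholder data: random 128-number vectors standing in for real embeddings.
rng = random.Random(0)


def random_embedding(dim=128):
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


original = {f: random_embedding() for f in FACTORS}
# Hypothetical cover: identical melody embedding, unrelated rhythm and timbre.
cover = {
    "melody": list(original["melody"]),
    "rhythm": random_embedding(),
    "timbre": random_embedding(),
}

scores = compare_clips(original, cover)
for factor, score in scores.items():
    print(f"{factor}: {score:+.3f}")
```

Because the melody vectors are identical, the melody score is 1 (up to floating-point rounding), while the unrelated random rhythm and timbre vectors typically score close to 0. This mirrors the piano-cover example in miniature.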

Copy-paste prompts

Prompt 1
Load cover.mp3 and original.mp3 with MERIT and print all three similarity scores, melody, rhythm, and timbre, to the console.
Prompt 2
I have a folder of 200 MP3 files. Help me loop through them with MERIT and find the 10 songs most similar in timbre to a reference track I specify.
Prompt 3
I am building a playlist generator that groups songs by rhythmic feel. Help me cluster my library using MERIT's rhythm embeddings with k-means.
Prompt 4
Explain what the 128-number embedding MERIT produces for melody represents and how cosine similarity turns two embeddings into a single score between -1 and 1.
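
For a sense of what Prompt 3 is asking for, here is a small k-means sketch over rhythm embeddings. It is an illustration under stated assumptions rather than MERIT code. The 4-number vectors are made-up placeholders for real 128-number rhythm embeddings, and the farthest-point initialization is a simple deterministic choice made here for reproducibility. Vectors are scaled to unit length first so that Euclidean distance orders pairs the same way cosine similarity does.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Placeholder rhythm embeddings: two visibly different groups.
rhythm = {
    "track_a": [1.0, 0.1, 0.0, 0.0],
    "track_b": [0.9, 0.0, 0.1, 0.0],
    "track_c": [1.0, 0.0, 0.0, 0.1],
    "track_d": [0.0, 0.1, 1.0, 0.0],
    "track_e": [0.1, 0.0, 0.9, 0.0],
    "track_f": [0.0, 0.0, 1.0, 0.1],
}

names = list(rhythm)
points = [normalize(rhythm[name]) for name in names]
labels = kmeans(points, k=2)
for name, label in zip(names, labels):
    print(f"{name}: cluster {label}")
```

For a real library, an established implementation such as scikit-learn's `KMeans` is likely a better fit than this hand-written loop. The sketch is only meant to show the assignment and update steps.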

Verify against the repo before relying on details.