explaingit

ldolan/coma-metabolomics

18 | Python | Audience · researcher | Complexity · 3/5 | Active | License · MIT | Setup · moderate

TLDR

Python tool for metabolomics researchers that reconciles disagreeing chemical annotations from multiple mass-spec tools across five levels of strictness and reports agreement with confidence labels.

Mindmap

mindmap
  root((COMA))
    Inputs
      R IPA outputs
      ipaPy2 outputs
      Mass spec annotations
    Outputs
      Agreement table
      Confidence labels
      Per-pair scores
    Use Cases
      Reconcile annotation tools
      Reproduce paper results
      Score metabolite matches
    Tech Stack
      Python
      Pip
      MIT license

Things people build with this

USE CASE 1

Cross-check metabolite IDs produced by R IPA and ipaPy2 on the same LC-MS dataset and keep only the high-confidence matches.
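To make that cross-check concrete, here is a minimal pandas sketch. It does not use COMA's own API. It lines up the top-ranked hit from each tool on a shared feature identifier and keeps only features where both tools name the same HMDB ID, which corresponds roughly to the strictest of COMA's five levels. The column names `feature_id` and `hmdb_id` and the rows are hypothetical toy data; real R IPA and ipaPy2 exports have their own layouts.

```python
import pandas as pd

# Hypothetical top-ranked annotation per LC-MS feature from each tool.
# Column names and rows are illustrative, not the real export schemas.
r_ipa = pd.DataFrame({
    "feature_id": ["F1", "F2", "F3"],
    "hmdb_id": ["HMDB0000122", "HMDB0000190", "HMDB0000161"],
})
ipapy2 = pd.DataFrame({
    "feature_id": ["F1", "F2", "F4"],
    "hmdb_id": ["HMDB0000122", "HMDB0000243", "HMDB0000159"],
})

# Line up the two tools on the shared feature. The inner join drops
# features that only one tool annotated (F3 and F4 here).
paired = r_ipa.merge(ipapy2, on="feature_id", suffixes=("_r_ipa", "_ipapy2"))

# Keep only features where both tools report the same database identifier.
agreed = paired[paired["hmdb_id_r_ipa"] == paired["hmdb_id_ipapy2"]]
print(agreed[["feature_id", "hmdb_id_r_ipa"]])
```

An inner join is the conservative choice for a high-confidence subset. An outer join with `indicator=True` would instead show which features each tool annotated on its own.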

USE CASE 2

Reproduce the worked E. coli example from the linked Current Analytical Chemistry paper as a teaching exercise.

USE CASE 3

Score candidate annotations by HMDB or KEGG identifier match, molecular skeleton, formula, pathway, or name similarity.
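The following is a minimal standard-library sketch of that tiered comparison. It assumes each annotation is a plain dict with hypothetical keys (`hmdb`, `kegg`, `inchikey`, `mass`, `pathways`, `name`) and returns the strictest level at which two guesses agree. This is not COMA's code, and several checks are stand-ins chosen for illustration. The skeleton check compares the first 14-character InChIKey block, which encodes connectivity. The formula check compares monoisotopic masses against an assumed 5 ppm tolerance. Name similarity uses `difflib.SequenceMatcher` with an assumed 0.8 cut-off.

```python
from difflib import SequenceMatcher
from typing import Optional


def ppm_difference(mass_a: float, mass_b: float) -> float:
    """Absolute mass difference in parts per million, relative to mass_a."""
    return abs(mass_a - mass_b) / mass_a * 1e6


def agreement_level(a: dict, b: dict, ppm_tol: float = 5.0,
                    name_cutoff: float = 0.8) -> Optional[int]:
    """Return the strictest level (1 = strongest, 5 = weakest) at which two
    annotations agree, or None if they agree at no level.
    Keys, tolerance and cut-off are hypothetical, not COMA's real schema."""
    # Level 1: same database identifier (HMDB or KEGG).
    for key in ("hmdb", "kegg"):
        if a.get(key) and a.get(key) == b.get(key):
            return 1
    # Level 2: same skeleton, approximated by the first InChIKey block.
    if a.get("inchikey") and b.get("inchikey") and a["inchikey"][:14] == b["inchikey"][:14]:
        return 2
    # Level 3: monoisotopic masses within a ppm tolerance (formula stand-in).
    if a.get("mass") and b.get("mass") and ppm_difference(a["mass"], b["mass"]) <= ppm_tol:
        return 3
    # Level 4: at least one metabolic pathway in common.
    if set(a.get("pathways", ())) & set(b.get("pathways", ())):
        return 4
    # Level 5: similar names, compared case-insensitively.
    if a.get("name") and b.get("name"):
        ratio = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        if ratio >= name_cutoff:
            return 5
    return None


# Illustrative input: two isomers with different IDs but the same mass.
leucine = {"hmdb": "HMDB0000687", "name": "L-Leucine", "mass": 131.0946}
isoleucine = {"hmdb": "HMDB0000172", "name": "L-Isoleucine", "mass": 131.0946}
print(agreement_level(leucine, isoleucine))  # prints 3
```

Returning the first check that passes means each pair is credited with its strongest evidence. In this toy input, the two isomers have different HMDB IDs and no InChIKey is supplied, so the first check that succeeds is the mass comparison at level 3.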

USE CASE 4

Build a reproducible annotation-consensus step into a larger metabolomics pipeline written in Python.

Tech stack

Python | Pip

Getting it running

Difficulty · moderate | Time to first run · 30 min

The package is an alpha release (v0.1) and is not on PyPI, so it is installed from source with pip. Only the R IPA and ipaPy2 readers are supported today.

MIT license, so you can use, modify, and redistribute the code freely, including in commercial work, as long as you include the copyright and permission notice.

In plain English

COMA stands for Consensus Of Metabolite Annotations. It is a Python tool for researchers who study the small chemical compounds found in living things, a field known as metabolomics. When scientists run samples through an instrument called a mass spectrometer, they get back lists of possible chemical matches, and different software programs often disagree about which match is correct. COMA tries to settle those disagreements in a structured way.

The problem the project addresses is that two annotation tools, given the same raw data, can produce different top guesses for what a chemical signal represents. Researchers today usually deal with this by picking one tool and ignoring the others, or by comparing the outputs by hand in spreadsheets, which is slow and hard to reproduce.

COMA reads the outputs of these tools, lines them up, and reports where they agree. Agreement is judged at five levels of strictness. The strongest level is an exact match on a chemical's database identifier, such as an HMDB or KEGG code. The weaker levels are matching the chemical skeleton, matching the molecular formula within a small mass tolerance, sharing a metabolic pathway, or simply having similar names. Each level produces a confidence score, and the final output is a flat table with one row per pair of guesses, labelled high, medium, or low confidence.

The current release is an alpha, v0.1, with readers for two tools' output formats: R IPA and ipaPy2. The roadmap for v0.2 lists planned readers for SIRIUS, GNPS, and MetFrag, along with a smarter scoring model, a visualisation module, and HTML reports. The package is on GitHub but not yet on PyPI, so it is installed from source with pip. The licence is MIT.

The repository accompanies a research paper by Lita Doolan at City St George's and King's College London, currently under review at Current Analytical Chemistry. The E. coli dataset from that paper is included in the source tree as a worked example.
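The description above says each agreement level yields a confidence score and that pairs end up in a flat table labelled high, medium, or low. The sketch below shows one way that last step could look. The per-level scores (`LEVEL_SCORES`) and the label cut-offs are invented for illustration. The `pairs` input stands in for whatever the level comparison produced. None of the names come from COMA itself.

```python
import pandas as pd

# Hypothetical score per agreement level (1 = exact identifier match).
LEVEL_SCORES = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}


def confidence_label(score: float) -> str:
    """Map a numeric score onto a three-way label (illustrative cut-offs)."""
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


def agreement_table(pairs) -> pd.DataFrame:
    """Build a flat table with one row per pair of annotations.

    `pairs` is an iterable of (feature_id, guess_a, guess_b, level) tuples,
    where level is 1-5, or None when the guesses agree at no level.
    """
    rows = []
    for feature_id, guess_a, guess_b, level in pairs:
        # Pairs with no agreement at any level score 0.0.
        score = LEVEL_SCORES.get(level, 0.0)
        rows.append({
            "feature_id": feature_id,
            "tool_a": guess_a,
            "tool_b": guess_b,
            "level": level,
            "score": score,
            "confidence": confidence_label(score),
        })
    return pd.DataFrame(rows)


table = agreement_table([
    ("F1", "D-Glucose", "D-Glucose", 1),
    ("F2", "L-Leucine", "L-Isoleucine", 3),
    ("F3", "Citrate", "Pyruvate", None),
])
print(table[["feature_id", "level", "score", "confidence"]])
```

Keeping the numeric score alongside the label lets a user pick a stricter or looser cut-off later, for example for a discovery-phase study versus a validation study, without recomputing the levels.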

Copy-paste prompts

Prompt 1
Install coma-metabolomics from source into a fresh Python 3.11 venv and run the bundled E. coli example end to end. Show the exact pip commands.
Prompt 2
Write a small wrapper script that reads two annotation CSVs, one from R IPA and one from ipaPy2, and produces the COMA agreement table as a pandas DataFrame.
Prompt 3
Explain the five COMA agreement levels in plain English and suggest a confidence cutoff for a discovery-phase study versus a validation study.
Prompt 4
Sketch a v0.2 reader plugin for SIRIUS output that maps SIRIUS columns into the COMA internal schema. Keep it framework-free.
Prompt 5
Compare COMA against related metabolomics tools such as MetaboAnnotatoR or RAMClustR in terms of inputs, outputs, and licence.

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.