Analysis updated 2026-05-18
Add emry to a PyTorch or JAX training loop to stream loss and learning rate metrics to a live terminal dashboard during long runs.
Compare two training experiments by overlaying a prior run as a baseline using emry watch --compare or emry web with a baseline run directory.
Set up Emry in sidecar mode on a SLURM cluster so metric logging outlives the training process if it is killed.
Export training history to CSV and analyze it with pandas after the run completes.
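As a rough illustration of the post-run analysis step, here is a minimal sketch that summarizes an exported history using only the Python standard library (pandas would work the same way via `read_csv`). The column names `step`, `loss`, and `lr`, and the sample rows themselves, are assumptions made for illustration; the actual export layout should be checked against the repo.

```python
import csv
import io

# Hypothetical sample standing in for an exported history file.
# Column names and values are assumed, not taken from the tool's real output.
SAMPLE_CSV = """step,loss,lr
0,2.0,0.001
1,1.5,0.001
2,1.0,0.0005
"""


def summarize(csv_file):
    """Return the row count, final loss, and the step with the lowest loss."""
    rows = list(csv.DictReader(csv_file))
    losses = [float(row["loss"]) for row in rows]
    best_index = min(range(len(losses)), key=losses.__getitem__)
    return {
        "steps": len(rows),
        "final_loss": losses[-1],
        "best_step": int(rows[best_index]["step"]),
    }


summary = summarize(io.StringIO(SAMPLE_CSV))
print(summary)
```

For a real run, the `io.StringIO` object would be replaced by an open file handle on the exported CSV.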
| | femboyisp/emry | j0rdiun/cosmic-ext-app-switcher | jangia/jg-lint |
|---|---|---|---|
| Stars | 6 | 6 | 6 |
| Language | Rust | Rust | Rust |
| Setup difficulty | easy | moderate | easy |
| Complexity | 2/5 | 2/5 | 3/5 |
| Audience | data | general | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires Python 3.10+ and pip. The Rust toolchain is needed only to build from source, not for the PyPI install.
Emry is a monitoring tool for machine learning training runs. When you train a model that takes hours or days to complete, you want to watch metrics like loss and learning rate without the monitoring code slowing down the training itself. Emry is designed around this constraint: the core emit() call is meant to take under 10 microseconds and never block the training loop.
The setup is minimal. Your training loop calls emry.run() to start a named run, then calls run.emit() each step with whatever metrics you care about as keyword arguments. Emry handles the rest: it writes an append-only log file you can read with standard tools like jq or pandas, and if you are running in a terminal it brings up a live dashboard automatically. The dashboard shows a loss curve, phase markers, checkpoint annotations, and an optional overlay of a prior run so you can compare current performance against a baseline.
Two dashboards are included. The terminal version (emry watch) runs in any standard terminal. The web version (emry web) serves a local browser dashboard that works fully air-gapped with no CDN dependency. Both have the same feature set.
For cluster environments like SLURM, a sidecar mode lets the monitoring engine outlive the training process if the process crashes or is killed. Emry also automatically samples GPU utilization, memory, and temperature via nvidia-smi when a GPU is present, and can send a Slack or Discord alert if a metric goes NaN or infinity.
The core engine is written in Rust for performance, and the Python API wraps it via the maturin build tool. Installing it is a single pip install with no account or external service required.
This is for machine learning researchers and engineers who run long training jobs and want lightweight, self-hosted observability that is easy to read both live and after the fact.
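The emit-per-step pattern described above can be sketched as follows. Because the exact signatures of emry.run() and run.emit() are not given here, the example uses `StandInRun`, a hypothetical in-memory class that only mimics the described interface (a named run with an `emit()` method taking keyword arguments). It is not the real library, and the loss values are placeholders.

```python
import math


class StandInRun:
    """Hypothetical stand-in for the run object; NOT the real emry API."""

    def __init__(self, name):
        self.name = name
        self.records = []

    def emit(self, **metrics):
        # Keep metrics in memory; the real tool is described as
        # appending to a log file instead.
        self.records.append(dict(metrics))


def train(run, steps=5):
    """Toy loop that emits one record of metrics per step."""
    lr = 0.1
    for step in range(steps):
        loss = 1.0 / (step + 1)  # placeholder for a real loss value
        run.emit(step=step, loss=loss, lr=lr)
        lr *= 0.5  # simple halving schedule, for illustration only
    return run


run = train(StandInRun("demo"))

# Flag any non-finite loss, the condition the alerting feature is said to watch for.
bad_steps = [r["step"] for r in run.records if not math.isfinite(r["loss"])]
print(len(run.records), bad_steps)
```

With the real package installed, the stand-in would presumably be replaced by the object returned from emry.run(); the arguments it accepts should be confirmed in the repo.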
A lightweight, self-hosted monitoring tool for ML training runs: a single emit() call streams metrics to an append-only log and a live terminal or web dashboard with no accounts or external services.
Mainly Rust. The stack also includes Python and maturin.
Free to use for any purpose, including commercial use, with no restrictions beyond keeping the license notice.
Setup difficulty is rated easy, with roughly 5 minutes to a first successful run.
The audience is mainly data practitioners.
Verify against the repo before relying on details.