Analysis updated 2026-05-18
Run a benchmark across all your locally installed Ollama models to find which one is fastest on your hardware for a specific task type.
Use the local Judge feature to have one of your models score the outputs of the others, keeping all evaluation offline.
Export a ZIP of benchmark results to share evidence of model performance with a teammate or include in a project report.
| | goekmenai/local-llm-matrix | a-bissell/unleash-lite | abhiinnovates/whatsapp-hr-assistant |
|---|---|---|---|
| Stars | 1 | 1 | 1 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | hard | hard |
| Complexity | 2/5 | 4/5 | 3/5 |
| Audience | developer | researcher | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires Ollama installed with at least one model already pulled. macOS has guided double-click setup scripts; Linux and Windows require a manual developer start.
Local LLM Matrix is an app for comparing AI language models you have installed locally through Ollama, a popular tool for running AI on your own computer. The problem it addresses is that online benchmarks measure performance on reference hardware, while the same model may run much faster or slower on your own machine, depending on your setup and quantization settings.

The app runs a set of task-oriented prompts against any models you have installed, measures their speed, evaluates their output, and saves the results. An optional local Judge feature lets you pick one of your installed models to score the responses of the others, keeping the entire evaluation process on your own machine. Results are saved as JSON files, and a SQLite history index tracks performance over time so you can see whether one model consistently outperforms another or whether a single result was an outlier.

The interface has seven sections: an Overview showing current evidence and recommendations, Models and Sources separating local models from public reference data, a Test section for setting up benchmark runs, a Results section that explains confidence levels and skipped evaluations, a History section for trend analysis, an Analysis and Export section, and a Help section. A stated design principle is that public benchmark numbers and model cards are shown as context only and are never presented as measurements of your local model. The app does not download models automatically; you choose which models to pull in Ollama separately.

The app supports German and English interfaces, dark and light themes, and responsive layouts. Results can be exported as Markdown, HTML, CSV, JSON, or ZIP. Setup on macOS involves double-clicking two provided scripts; Linux and Windows require a manual developer start. Built with Python and Streamlit. Apache 2.0 license.
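As a rough illustration of the measure-and-track idea described above, the sketch below computes generation speed from an Ollama reply and keeps a small SQLite history. It is not the app's code: the function names, the `runs` table, and its columns are hypothetical. It assumes Ollama is listening on its default local address and relies on the `eval_count` and `eval_duration` fields that Ollama's `/api/generate` endpoint returns for non-streaming requests.

```python
import json
import sqlite3
import statistics
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address


def list_local_models(base_url=OLLAMA_URL):
    """Return the names of locally installed models via Ollama's /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return [m["name"] for m in json.load(resp)["models"]]


def generate(model, prompt, base_url=OLLAMA_URL):
    """Run one non-streaming generation and return the parsed JSON reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def tokens_per_second(reply):
    """eval_count is generated tokens; eval_duration is in nanoseconds."""
    return reply["eval_count"] / (reply["eval_duration"] / 1e9)


def record_run(conn, model, task, tps):
    """Append one measurement to a hypothetical 'runs' history table."""
    conn.execute("CREATE TABLE IF NOT EXISTS runs (model TEXT, task TEXT, tps REAL)")
    conn.execute("INSERT INTO runs VALUES (?, ?, ?)", (model, task, tps))
    conn.commit()


def median_tps(conn, model, task):
    """Median speed across all recorded runs for one model and task."""
    rows = conn.execute(
        "SELECT tps FROM runs WHERE model = ? AND task = ?", (model, task)
    ).fetchall()
    return statistics.median([r[0] for r in rows])
```

Using the median rather than the mean is one simple way to keep a single unusually fast or slow run from dominating the comparison; whether the app itself does this is not stated in the source.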
A local Streamlit app for comparing Ollama-installed AI models on your own hardware, with task-based benchmarks, optional local Judge scoring, history tracking, and multi-format export.
Mainly Python. The stack also includes Streamlit and Ollama.
Apache 2.0: use freely, including commercially, with attribution; no warranty provided.
Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.
Mainly developers.
Verify against the repo before relying on details.