explaingit

goekmenai/local-llm-matrix

Analysis updated 2026-05-18

Stars · 1 | Language · Python | Audience · developer | Complexity · 2/5 | License · Apache 2.0 | Setup · moderate

TLDR

A local Streamlit app for comparing Ollama-installed AI models on your own hardware, with task-based benchmarks, optional local Judge scoring, history tracking, and multi-format export.

Mindmap

mindmap
  root((Local LLM Matrix))
    Core Purpose
      Compare local Ollama models
      Your hardware your results
    App Sections
      Overview recommendations
      Test benchmark plans
      Results confidence
      History trends
    Features
      Local Judge scoring
      Speed snapshots
      Public evidence context
      Multi-format export
    Principles
      Local first
      No auto downloads
      Uncertainty visible

What do people build with it?

USE CASE 1

Run a benchmark across all your locally installed Ollama models to find which one is fastest on your hardware for a specific task type.
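A speed comparison like this can be sketched against Ollama's local HTTP API. This is a minimal illustration, not the app's own benchmark code: it assumes Ollama is listening on its default address and relies on the `eval_count` and `eval_duration` (nanoseconds) fields that `/api/generate` returns. The helper names `generate`, `tokens_per_second`, and `rank_by_speed` are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address (assumed)


def tokens_per_second(result: dict) -> float:
    """Generation speed from the timing fields of an /api/generate reply.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    duration_ns = result.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return result.get("eval_count", 0) / (duration_ns / 1e9)


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming prompt to a locally running Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)


def rank_by_speed(results: dict) -> list:
    """Sort (model, tokens/s) pairs from fastest to slowest."""
    speeds = {name: tokens_per_second(reply) for name, reply in results.items()}
    return sorted(speeds.items(), key=lambda item: item[1], reverse=True)
```

With a server running, one reply per installed model could be collected with `generate(name, "your task prompt")` and passed to `rank_by_speed` as a `{name: reply}` dictionary. A single prompt is a noisy measurement, so repeated runs give a fairer picture.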

USE CASE 2

Use the local Judge feature to have one of your models score the outputs of the others, keeping all evaluation offline.
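The general idea of a model-as-judge step can be sketched as follows. The rubric wording, the 1 to 10 scale, and the function names are assumptions for illustration; the app's actual Judge prompt and scoring format are not documented here and may differ.

```python
import re
from typing import Optional


def build_judge_prompt(task: str, answer: str) -> str:
    """Hypothetical rubric prompt to send to the model chosen as Judge."""
    return (
        "You are grading an answer to a task.\n"
        f"Task: {task}\n"
        f"Answer: {answer}\n"
        "Reply with a single line of the form 'SCORE: <integer 1-10>'."
    )


def parse_score(judge_reply: str) -> Optional[int]:
    """Extract the score, or return None when the reply is unusable.

    Treating an unusable reply as "no score" rather than guessing keeps
    uncertainty visible (an assumption about how one might handle it).
    """
    match = re.search(r"SCORE:\s*(\d{1,2})\b", judge_reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 10 else None
```

The prompt would be sent to the Judge model through Ollama like any other prompt, so no text leaves the machine.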

USE CASE 3

Export a ZIP of benchmark results to share evidence of model performance with a teammate or include in a project report.
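A bundle like this can be built with Python's standard library alone. The sketch below writes the same rows as JSON, CSV, and a Markdown table into one in-memory ZIP; the file names and the row fields are hypothetical and do not reflect the app's real export layout.

```python
import csv
import io
import json
import zipfile


def build_export_zip(rows: list) -> bytes:
    """Bundle the same result rows as JSON, CSV, and Markdown in one ZIP."""
    if not rows:
        raise ValueError("nothing to export")
    columns = list(rows[0].keys())

    # CSV copy of the rows
    csv_buffer = io.StringIO()
    writer = csv.DictWriter(csv_buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)

    # Markdown table copy of the rows
    markdown_lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    markdown_lines += [
        "| " + " | ".join(str(row[c]) for c in columns) + " |" for row in rows
    ]

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("results.json", json.dumps(rows, indent=2))
        zf.writestr("results.csv", csv_buffer.getvalue())
        zf.writestr("results.md", "\n".join(markdown_lines) + "\n")
    return archive.getvalue()
```

Writing the returned bytes to a file such as `benchmark.zip` gives a single attachment that a teammate can open without installing anything.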

What is it built with?

Python, Streamlit, Ollama, SQLite

How does it compare?

| | goekmenai/local-llm-matrix | a-bissell/unleash-lite | abhiinnovates/whatsapp-hr-assistant |
| --- | --- | --- | --- |
| Stars | 1 | 1 | 1 |
| Language | Python | Python | Python |
| Setup difficulty | moderate | hard | hard |
| Complexity | 2/5 | 4/5 | 3/5 |
| Audience | developer | researcher | developer |

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate | Time to first run · 30 min

Requires Ollama to be installed with at least one model already pulled. macOS has guided double-click setup scripts; Linux and Windows require a manual developer start.
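The Ollama prerequisite can be checked before launching the app. The sketch below is not part of the repo: it assumes Ollama's default address and uses the `GET /api/tags` endpoint, which lists locally installed models. The function names are hypothetical.

```python
import json
import shutil
import urllib.error
import urllib.request


def model_names(tags_payload: dict) -> list:
    """Pull model names out of the JSON returned by Ollama's GET /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def check_prerequisites(base_url: str = "http://localhost:11434") -> list:
    """Return a list of problems found; an empty list means none were found."""
    if shutil.which("ollama") is None:
        return ["The 'ollama' command is not on PATH."]
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
            payload = json.load(response)
    except (urllib.error.URLError, OSError):
        return ["The Ollama server is not reachable; is it running?"]
    if not model_names(payload):
        return ["No models are installed; pull one with 'ollama pull <model>'."]
    return []
```

Calling `check_prerequisites()` and printing the result gives a quick pass or fail before spending time on the rest of the setup.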

Apache 2.0: use freely, including commercially, with attribution. No warranty is provided.

In plain English

Local LLM Matrix is an app for comparing AI language models that you have installed locally through Ollama, a popular tool for running AI models on your own computer. The problem it addresses is that online benchmarks measure performance on reference hardware, while the same model may run much faster or slower on your machine, with your setup and quantization settings. The app runs a set of task-oriented prompts against the models you have installed, measures their speed, evaluates their output, and saves the results. An optional local Judge feature lets you pick one of your installed models to score the responses of the others, keeping the entire evaluation on your own machine. Results are saved as JSON files, and a SQLite history index tracks performance over time, so you can see whether a model consistently outperforms another or whether one result was an outlier.

The interface has seven sections: an Overview showing current evidence and recommendations; Models and Sources, which separates local models from public reference data; a Test section for setting up benchmark runs; a Results section that explains confidence levels and skipped evaluations; a History section for trend analysis; an Analysis and Export section; and a Help section. A stated design principle is that public benchmark numbers and model cards are shown as context only and are never presented as measurements of your local model. The app does not download models automatically; you choose which models to pull in Ollama separately.

The app supports German and English interfaces, dark and light themes, and responsive layouts. Results can be exported as Markdown, HTML, CSV, JSON, or ZIP. Setup on macOS involves double-clicking two provided scripts; Linux and Windows require a manual developer start. It is built with Python and Streamlit and released under the Apache 2.0 license.
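The idea of a SQLite history index that separates a consistent trend from a one-off outlier can be illustrated with a small sketch. The table layout, the `record` and `is_outlier` helpers, and the 25% tolerance are assumptions made for this example, not the app's actual schema or logic.

```python
import sqlite3
from statistics import median

# Hypothetical schema: one row per speed measurement of a model.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id INTEGER PRIMARY KEY,
    model TEXT NOT NULL,
    tokens_per_second REAL NOT NULL
)
"""


def record(conn: sqlite3.Connection, model: str, tokens_per_second: float) -> None:
    """Append one measurement to the history."""
    conn.execute(
        "INSERT INTO runs (model, tokens_per_second) VALUES (?, ?)",
        (model, tokens_per_second),
    )


def is_outlier(conn: sqlite3.Connection, model: str, tolerance: float = 0.25) -> bool:
    """True if the latest run differs from the median of earlier runs
    by more than `tolerance` (as a fraction of that median)."""
    speeds = [
        row[0]
        for row in conn.execute(
            "SELECT tokens_per_second FROM runs WHERE model = ? ORDER BY run_id",
            (model,),
        )
    ]
    if len(speeds) < 3:
        return False  # too little history to judge
    baseline = median(speeds[:-1])
    return abs(speeds[-1] - baseline) > tolerance * baseline
```

Comparing against the median of earlier runs rather than the mean keeps one unusually slow run, for example while another program was using the GPU, from shifting the baseline.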

Copy-paste prompts

Prompt 1
I have three Ollama models installed and want to compare them for coding tasks using Local LLM Matrix. Walk me through setting up and running the benchmark.
Prompt 2
Explain how the local Judge scoring works in Local LLM Matrix. Which model should I pick as Judge and how do I interpret the confidence scores?
Prompt 3
I ran a Local LLM Matrix benchmark and the results section shows some evaluations were skipped. What causes a skip and what should I do next?
Prompt 4
Walk me through the developer start command for Local LLM Matrix on Linux and how to run the automated test suite.

Frequently asked questions

What is local-llm-matrix?

A local Streamlit app for comparing Ollama-installed AI models on your own hardware, with task-based benchmarks, optional local Judge scoring, history tracking, and multi-format export.

What language is local-llm-matrix written in?

Mainly Python. The stack also includes Streamlit, Ollama, and SQLite.

What license does local-llm-matrix use?

Apache 2.0: use freely, including commercially, with attribution. No warranty is provided.

How hard is local-llm-matrix to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is local-llm-matrix for?

Mainly developers.

Verify against the repo before relying on details.