ericlbuehler/mistral.rs

★ 7,130RustAudience · developerComplexity · 3/5Setup · easy

Mindmap

mindmap
  root((mistral.rs))
    What it does
      Local AI model runner
      Multi-modal support
      Agentic tool calling
    Tech Stack
      Rust core
      Python package
      Hugging Face models
    Modalities
      Text chat
      Image understanding
      Audio and speech
      Video input
    Features
      Quantization
      Web chat UI
      MCP integration
    Use Cases
      Local AI assistant
      Python AI scripts
      Agent workflows

mindmap root((mistral.rs)) What it does Local AI model runner Multi-modal support Agentic tool calling Tech Stack Rust core Python package Hugging Face models Modalities Text chat Image understanding Audio and speech Video input Features Quantization Web chat UI MCP integration Use Cases Local AI assistant Python AI scripts Agent workflows

Click or tap to explore — scroll the page freely

Things people build with this

USE CASE 1

Run a large language model locally on your laptop with one command and chat with it through a built-in browser UI.

USE CASE 2

Call a locally running AI model from your Python code using the mistral.rs Python package as a programmable API.

USE CASE 3

Build an AI assistant that automatically calls external tools and APIs before returning its final answer.

Tech stack

RustPythonHugging FaceMCP

Getting it running

Difficulty · easy Time to first run · 30min

One-line install on Linux, macOS, or Windows, a GPU accelerates large models but is not strictly required.

In plain English

Mistral.rs is a tool for running AI language models on your own computer. It is built for speed and designed to work with models published on Hugging Face, the main public repository where AI researchers and companies share their models. You point the program at a model name and it handles the rest, detecting the model's format and starting it without requiring any configuration files. The tool supports far more than text conversations. The same engine handles text generation, image understanding, video input, audio, speech-to-text, image generation, and text embeddings. You can chat with a model through a built-in web interface by running a single command, or call the program from your own code using either a Python package or a Rust library. Because large AI models can require significant memory, mistral.rs includes detailed quantization support. Quantization is a technique that reduces model file size and memory usage at some cost to precision. The tool supports many quantization formats and can automatically benchmark your hardware and select the best settings for your specific machine. You can also control quantization settings on a per-layer basis if you need fine-grained control. The project includes what it calls agentic features, meaning the model can do more than generate text. It can call external tools, search the web, connect to external services via a standard protocol called MCP, and loop through multiple tool calls automatically before returning a final answer. These capabilities let you build AI assistants that interact with real systems rather than just producing text. The supported model list is extensive, covering dozens of well-known text, vision, and speech models. Installation is a one-line command on Linux, macOS, or Windows. The project is actively maintained, with recent updates adding new model families and quantization methods.

Copy-paste prompts

Prompt 1

Start mistral.rs with a Mistral 7B model from Hugging Face and open the built-in web chat UI to have a conversation.

Prompt 2

Use mistral.rs automatic quantization to run a large model on my laptop with limited RAM and benchmark the speed.

Prompt 3

Set up mistral.rs with MCP tool support so the model can call my custom REST API endpoints as part of its responses.

Prompt 4

Use the mistral.rs Python package to send an image and a text prompt together to a vision model and print the response.

Prompt 5

Configure per-layer quantization in mistral.rs to balance memory usage and output quality for my specific GPU.

Open on GitHub → Explain another repo

← ericlbuehler on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.