Browse Hugging Face and download GGUF model files with one click
Save per-model templates with thread count, batch size, and context length defaults
Run several models at once on different ports in API-only or chat mode
Switch between llama.cpp builds from a settings panel without recompiling
Pre-built installer works out of the box; source build needs Node.js 18 or newer.
Hexllama is a desktop application that provides a graphical interface for running large language models on your own computer. The underlying engine is llama.cpp, a popular open-source program that loads and runs AI models locally. Using llama.cpp directly normally means working with terminal commands and configuration flags; Hexllama replaces these with buttons, menus, and forms.
The app has a built-in browser for Hugging Face, a website where people share AI model files. From inside Hexllama you can search for a model, look at the files in a repository, and download GGUF model files with one click. A download manager lets you pause, resume, or cancel large downloads, and you can also paste a direct link to a GGUF file.
When a download finishes, the app creates a starter configuration with reasonable defaults for thread count, batch size, and context length, based on the model you downloaded. You can save these configurations as templates and reuse them. Multiple models can run at the same time on different network ports without colliding. Each template can be launched in chat mode, which opens the llama.cpp web interface in your browser, or in API-only mode, which runs the model in the background so other tools can connect to it. There is also a visual editor for command-line flags, so you tick checkboxes and set numbers instead of memorizing arguments.
Hexllama can also manage different builds of the llama.cpp engine itself, check the ggml-org repository for new releases, and switch between versions from a settings panel. Installation is either a pre-built installer from the Releases page, or cloning the repo and running npm install and npm run dev with Node.js 18 or newer. The project is built with Electron, React, TypeScript, and Vite. It collects no telemetry and runs fully locally, though downloads still go through Hugging Face.
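To make the template and multi-port behavior described above more concrete, here is a minimal TypeScript sketch of how a saved template might be turned into llama.cpp server arguments. The `ModelTemplate` shape, the `toServerArgs` and `assignPorts` helpers, the file paths, and the base port are all hypothetical and are not taken from the Hexllama source. The flags themselves (`--model`, `--threads`, `--batch-size`, `--ctx-size`, `--port`) are standard `llama-server` options.

```typescript
// Hypothetical template shape; Hexllama's real schema may differ.
interface ModelTemplate {
  name: string;
  modelPath: string;     // path to a downloaded GGUF file
  threads: number;       // CPU threads used for generation
  batchSize: number;     // logical batch size
  contextLength: number; // context window in tokens
}

// Translate a template into llama-server command-line arguments.
function toServerArgs(t: ModelTemplate, port: number): string[] {
  return [
    "--model", t.modelPath,
    "--threads", String(t.threads),
    "--batch-size", String(t.batchSize),
    "--ctx-size", String(t.contextLength),
    "--port", String(port),
  ];
}

// Give each template its own port by counting up from basePort.
// This only avoids collisions between the listed templates; it does
// not check whether a port is already used by another process.
function assignPorts(templates: ModelTemplate[], basePort = 8080): Map<string, number> {
  const ports = new Map<string, number>();
  templates.forEach((t, i) => ports.set(t.name, basePort + i));
  return ports;
}

// Example values are placeholders, not recommended defaults.
const templates: ModelTemplate[] = [
  { name: "chat-small", modelPath: "models/small.gguf", threads: 8, batchSize: 512, contextLength: 4096 },
  { name: "api-large", modelPath: "models/large.gguf", threads: 8, batchSize: 512, contextLength: 8192 },
];

const ports = assignPorts(templates);
for (const t of templates) {
  const port = ports.get(t.name)!;
  console.log(["llama-server", ...toServerArgs(t, port)].join(" "));
}
```

Keeping the template as plain data and deriving the argument list from it is one way a visual flag editor could work: the form edits the object, and the command line is regenerated on launch.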
The README also lists a roadmap covering a native chat UI, speculative decoding, multi-language support, and future backends like MLX and vLLM.
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.