gemma.cpp is a C++ program that runs Google's Gemma family of AI language models directly on a computer's processor, with no graphics card or cloud service required. It is designed to be small and easy to understand, with a core implementation of around two thousand lines of code. The project targets researchers and developers who want to study or modify how a language model works at a low level. Where most AI toolkits hide the computation behind layers of abstraction, gemma.cpp keeps things direct, so that a reader can follow the code and see exactly what happens when the model generates text. It is not the recommended path for shipping a product; for that, Google points people toward standard Python-based tools.

Supported models include Gemma 2, Gemma 3, and PaliGemma 2, covering sizes from 2 billion to 27 billion parameters. The program runs on Linux, Windows, and macOS, and works on any modern CPU. It achieves reasonable speed through SIMD (single instruction, multiple data) instructions, which let the processor operate on several numbers at once, and it adapts automatically to whatever hardware it finds.

To use it, you download model weights from Kaggle (a data science platform), build the project with CMake (a common build tool), and run the resulting executable with a path to the weights file. A basic command-line interface lets you type prompts and read responses. There are also Python bindings, so the engine can be called from Python code.

The repository includes support for training as well as running models, which is less common in tools of this type. Contributions from the community are welcome, with active development happening on a separate branch from the stable release.
Verify against the repo before relying on details.