maximecb/bebelm

★ 19 | Rust | Audience · developer | Complexity · 2/5 | Setup · easy

Mindmap

mindmap
  root((BebeLM))
    What it does
      CPU-only AI inference
      Chat and text generation
      Streaming token output
    Tech stack
      Rust
      LFM2.5-8B-A1B model
      Cargo package manager
    Use cases
      Offline AI chat
      Embed AI in Rust apps
      No GPU required
    Audience
      Developers
      Privacy-focused users
      Rust programmers
    Hardware support
      Apple M5
      AMD Ryzen
      Raspberry Pi, unverified

Things people build with this

USE CASE 1

Chat with an AI assistant on your laptop without an internet connection or graphics card.

USE CASE 2

Add AI text generation to your own Rust application by importing BebeLM as a library.

USE CASE 3

Generate one-shot text completions from the command line for quick AI-assisted writing.

USE CASE 4

Run an AI model privately on low-power hardware like a Raspberry Pi.

Tech stack

Rust | LFM2.5-8B-A1B | Cargo

Getting it running

Difficulty · easy | Time to first run · 30 min

Install via Cargo. The only prerequisite is the Rust toolchain; no extra system libraries are needed. Download the single model weights file (roughly 5.2 GB), then run the CLI against it.

The license type is not mentioned in this explanation.

In plain English

BebeLM is a program that runs an AI language model entirely on your CPU, written in Rust. Most AI tools that run large language models require a dedicated graphics card (GPU) with several gigabytes of video memory. BebeLM is built around a model called LFM2.5-8B-A1B, which has a design that keeps the number of calculations per generated word low enough that a regular desktop or laptop CPU can produce responses at a pace that feels usable in real time. The model has 8 billion parameters in total but only activates about 1 billion of them per step, which is what makes CPU-only inference feasible.

You download a single file of model weights, roughly 5.2 gigabytes, and then run the tool against that file. The project has very few code dependencies and requires no extra system libraries beyond the Rust toolchain itself.

There are two ways to use it. The command-line interface gives you a chat mode for back-and-forth conversation and a generate mode for one-shot text completions. The model can show its reasoning process as a separate block before giving its final answer, and you can cap how long that reasoning block runs if you want shorter responses. You can also install the binary directly via Cargo, the Rust package manager, without cloning the repository.

Beyond the command-line tool, BebeLM is structured as a Rust library that other programs can import. The API lets you load the model once and run multiple conversations from the same loaded weights, pass a callback function to receive tokens as they are generated rather than waiting for the full response, and control sampling settings like temperature. The library handles all the low-level details of the model format and token handling.

The project has been tested on Apple M5, AMD Ryzen, and AMD Threadripper processors. The README notes it should also work on Intel CPUs and Raspberry Pi 4 and 5, though those have not been verified.
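The streaming callback and the temperature setting described above can be illustrated with a small, self-contained sketch. It does not use BebeLM's actual API, which this explanation does not show. Every name here (`generate`, `next_logits`, `on_token`, `softmax_with_temperature`, `sample_index`, `XorShift`) is hypothetical, and the fixed logits stand in for a real model forward pass.

```rust
/// Convert raw logits into probabilities, scaled by `temperature`.
/// Lower temperatures sharpen the distribution; higher ones flatten it.
fn softmax_with_temperature(logits: &[f32], temperature: f32) -> Vec<f32> {
    assert!(temperature > 0.0, "temperature must be positive");
    // Subtract the maximum logit before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|&l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Pick an index from `probs` given a uniform draw `u` in [0, 1).
fn sample_index(probs: &[f32], u: f32) -> usize {
    let mut cumulative = 0.0;
    for (i, &p) in probs.iter().enumerate() {
        cumulative += p;
        if u < cumulative {
            return i;
        }
    }
    probs.len() - 1 // guard against floating-point rounding
}

/// Tiny xorshift generator so the sketch needs no external crates.
/// The seed must be nonzero.
struct XorShift(u64);

impl XorShift {
    fn next_f32(&mut self) -> f32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Keep the top 24 bits so the result is strictly below 1.0.
        (self.0 >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// Hypothetical generation loop. `next_logits` is an assumption standing in
/// for the model; `on_token` is called once per sampled token.
fn generate<F, C>(
    vocab: &[&str],
    steps: usize,
    temperature: f32,
    seed: u64,
    mut next_logits: F,
    mut on_token: C,
) where
    F: FnMut(usize) -> Vec<f32>,
    C: FnMut(&str),
{
    let mut rng = XorShift(seed);
    for step in 0..steps {
        let probs = softmax_with_temperature(&next_logits(step), temperature);
        let idx = sample_index(&probs, rng.next_f32());
        on_token(vocab[idx]); // stream the token to the caller immediately
    }
}

fn main() {
    let vocab = ["hello", " world", "!"];
    let mut out = String::new();
    generate(&vocab, 5, 0.7, 42, |_step| vec![2.0, 1.0, 0.1], |tok| {
        print!("{tok}"); // show each token as soon as it is sampled
        out.push_str(tok);
    });
    println!("\n({} bytes streamed)", out.len());
}
```

The callback receives each token as soon as it is sampled, so the caller can print or forward it without waiting for the full response. Lowering `temperature` toward zero concentrates probability on the highest-scoring token, while raising it spreads probability across alternatives.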

Copy-paste prompts

Prompt 1

I'm using the BebeLM Rust library. Show me how to load the model once and run two separate conversations from the same loaded weights, receiving tokens via a callback as they stream in.

Prompt 2

Using BebeLM's CLI, how do I start a chat session and set a maximum length for the reasoning block so I get shorter responses?

Prompt 3

I want to install BebeLM via Cargo without cloning the repo. What command do I run and what file do I need to download to get started?

Prompt 4

Show me how to control the temperature sampling setting in BebeLM's Rust API to make the AI responses more creative or more focused.

Verify against the repo before relying on details.