explaingit

getumbrel/llama-gpt

10,965TypeScriptAudience · generalComplexity · 3/5Setup · moderate

TLDR

Self-hosted ChatGPT-like chatbot powered by Llama 2 that runs entirely on your own machine, no account, no cloud, no data leaving your device.

Mindmap

mindmap
  root((LlamaGPT))
    What it does
      Private AI chat
      No data sent out
      ChatGPT-like UI
    Models
      Llama 2
      Code Llama
      Multiple sizes
    Install options
      Umbrel one-click
      Docker
      Mac shell script
      Kubernetes
    Audience
      Privacy-conscious users
      Home lab owners
Click or tap to explore — scroll the page freely

Code map

Detail Auto

An interactive map of this repo's files and how they connect — its source is parsed live in your browser. Click Visualize to build it.

filefunction / class

Things people build with this

USE CASE 1

Run a private AI chat assistant on your home server or Mac so sensitive conversations never leave your machine.

USE CASE 2

Connect existing tools built for the OpenAI API to a locally running Llama 2 model using LlamaGPT's compatible API.

USE CASE 3

Install a conversational AI assistant on a Raspberry Pi or home lab server through the Umbrel one-click platform.

Tech stack

TypeScriptDockerLlama 2KubernetesUmbrel

Getting it running

Difficulty · moderate Time to first run · 30min

Requires Docker or the Umbrel platform, also needs sufficient RAM and disk space, the smallest model needs 6GB RAM and a 3.79GB download.

License not specified in the explanation.

In plain English

LlamaGPT is a self-hosted chatbot you can run on your own computer or home server. It works like a conversational AI assistant, but unlike cloud services, all the processing happens on your device and no data leaves your machine. It is powered by Llama 2, an open source language model from Meta, and also supports Code Llama models for programming-related questions. The interface looks similar to ChatGPT. You type a message, the model generates a response, and you continue the conversation from there. Because everything runs locally, there is no account to create and no data sent to external servers. The tradeoff is that your hardware does the work, so slower machines produce text more slowly. A Raspberry Pi with 8GB of RAM generates roughly 0.9 words per second with the smallest model, while an M1 Max MacBook Pro generates about 54 words per second. You can choose from several model sizes. Larger models generally give better answers but require more memory and disk space. The smallest option needs about 6GB of RAM and a 3.79GB download. The largest needs 41GB of RAM and a 38.87GB download. The README includes a full benchmark table showing generation speeds across different hardware. Installation options cover a few scenarios: one-click install through the Umbrel home server platform, a shell script for M1 and M2 Macs, a Docker-based setup for any other machine, and a Kubernetes deployment for more advanced infrastructure. The project also exposes an API compatible with the OpenAI format, meaning tools built for that API can connect to it instead. The project was built by Umbrel, a company that makes home server software. It is aimed at people who want a private AI assistant without relying on hosted services.

Copy-paste prompts

Prompt 1
Help me install LlamaGPT on my M1 Mac using the shell script method and pick the right model size for 16GB of RAM.
Prompt 2
Show me how to connect my Python script that uses the OpenAI SDK to LlamaGPT's local API instead.
Prompt 3
I want to run LlamaGPT on a Linux server with Docker, walk me through the full setup including choosing a model and setting memory limits.
Prompt 4
Compare the LlamaGPT model sizes: which one gives the best balance of response quality and speed on a machine with 16GB RAM?
Open on GitHub → Explain another repo

← getumbrel on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.