karpathy/llama2.c

★ 19,514 | C | Audience · developer | Complexity · 2/5 | Stale | License | Setup · easy

Mindmap

mindmap
  root((repo))
    What it does
      Run AI models in C
      Generate text locally
      No dependencies needed
    How to use
      Download TinyLlama
      Export model weights
      Compile and run
    Tech stack
      C inference
      Python training
      PyTorch
    Use cases
      Learn AI internals
      Experiment locally
      Educational projects
    Performance
      110 tokens per second
      Runs on laptops


Things people build with this

USE CASE 1

Run a small AI language model on your laptop without Python or heavy frameworks installed.

USE CASE 2

Learn how language model inference works by reading and modifying a single readable C file.

USE CASE 3

Train and experiment with tiny versions of Llama 2 for educational projects or prototyping.

Tech stack

C · Python · PyTorch

Getting it running

Difficulty · easy | Time to first run · 5 min

MIT license: use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

llama2.c is a minimalist project that lets you run Llama 2, Meta's large language model, using a single file of plain C code with no external dependencies. It makes language models approachable for learning and experimentation: instead of a huge, complex codebase, you get one readable 700-line file that handles inference (the "run the AI" part), plus PyTorch code for training smaller versions from scratch.

To use it, you either download one of the pre-trained "TinyLlamas" (small models trained on short stories, ranging from 15M to 110M parameters) or export Meta's official Llama 2 weights into the project's format. You then compile and run the C file, which reads the model and generates text. It runs surprisingly fast: around 110 tokens per second on an M1 MacBook Air for the small models. Give it a text prompt and it will continue the story or answer in kind.

You'd use this if you want to understand how language models work at a low level, run a tiny model locally without Python or heavy frameworks, or experiment with text generation for educational purposes. The tech stack is C for inference and Python with PyTorch for training.

Copy-paste prompts

Prompt 1

Show me how to download a TinyLlama model and run it with llama2.c on my machine.

Prompt 2

Explain the key parts of the inference loop in llama2.c. How does it generate the next token?

Prompt 3

How do I export Meta's Llama 2 weights into the format llama2.c expects?

Prompt 4

Walk me through training a small custom language model with the PyTorch code in llama2.c.


Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.