explaingit

exo-explore/exo

📈 Trending44,771PythonAudience · developerComplexity · 4/5ActiveLicenseSetup · hard

TLDR

Run large AI models locally by pooling computing power across multiple devices on your network, with no manual setup needed.

Mindmap

mindmap
  root((exo))
    What it does
      Splits models across devices
      Runs locally, no cloud
      Auto-discovers peers
    How it works
      Tensor parallelism
      Network communication
      RDMA for Apple Silicon
    Use cases
      Private AI inference
      Cost-effective scaling
      Local experimentation
    Tech stack
      Python
      MLX framework
      OpenAI API compatible
    Supported hardware
      Apple Silicon Macs
      Linux with GPUs
      Mixed clusters

Things people build with this

USE CASE 1

Run 70B+ parameter models on a cluster of personal devices without cloud costs or data privacy concerns.

USE CASE 2

Combine multiple Apple Silicon Macs via Thunderbolt for high-speed local AI inference.

USE CASE 3

Use existing OpenAI or Ollama client tools with your own hardware-based model cluster.

Tech stack

PythonMLXRDMATensor parallelism

Getting it running

Difficulty · hard Time to first run · 1day+

Requires network configuration, RDMA setup across multiple devices, and coordinating distributed tensor parallelism infrastructure.

Use freely for any purpose including commercial. Keep the notice and disclose changes to the patent grant.

In plain English

exo is a tool that lets you run large AI language models locally by pooling the computing resources of multiple devices you already own, turning a cluster of laptops, desktops, or servers into a single cooperative AI inference machine. The problem it solves is that the most capable AI models (like 70-billion or 600-billion parameter models) are too large to fit in the memory of a single consumer device. Cloud services can run them, but that costs money and sends your data to a remote server. exo lets you combine the memory and processing power of several personal devices to run these large models entirely on your own hardware. The software automatically discovers other devices on your network that are also running exo, no manual configuration is needed. When you send a prompt, exo splits (or "shards") the model across all available devices using a technique called tensor parallelism, where different parts of the model's computation happen simultaneously on different machines. The devices communicate the intermediate results of their computations with each other over the network. For Apple Silicon Macs connected via Thunderbolt cables, exo supports RDMA (Remote Direct Memory Access), a high-speed direct-memory transfer technique that dramatically reduces communication latency between devices. The API it exposes is compatible with OpenAI, Claude, and Ollama client formats, meaning you can use existing tools and applications with it without modification. You would use exo if you have multiple Apple Silicon Macs, Linux machines with GPUs, or any combination thereof and want to run powerful AI models locally for privacy, cost, or experimentation reasons. It is written in Python, uses Apple's MLX framework as the inference backend on Apple Silicon, and is installed by cloning the repository and running with the uv Python project manager.

Copy-paste prompts

Prompt 1
How do I set up exo to run a 70-billion parameter model across my two Apple Silicon Macs connected via Thunderbolt?
Prompt 2
Show me how to configure exo to auto-discover Linux GPU machines on my home network and pool them for inference.
Prompt 3
What's the fastest way to get exo running with the uv package manager and start serving OpenAI-compatible API requests?
Open on GitHub → Explain another repo

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.