exo-explore/exo

Analysis updated 2026-05-18

★ 44,382PythonAudience · developerComplexity · 4/5LicenseSetup · hard

Mindmap

mindmap
  root((exo))
    What it does
      Splits models across devices
      Runs locally, no cloud
      Auto-discovers peers
    How it works
      Tensor parallelism
      Network communication
      RDMA for Apple Silicon
    Use cases
      Private AI inference
      Cost-effective scaling
      Local experimentation
    Tech stack
      Python
      MLX framework
      OpenAI API compatible
    Supported hardware
      Apple Silicon Macs
      Linux with GPUs
      Mixed clusters

mindmap root((exo)) What it does Splits models across devices Runs locally, no cloud Auto-discovers peers How it works Tensor parallelism Network communication RDMA for Apple Silicon Use cases Private AI inference Cost-effective scaling Local experimentation Tech stack Python MLX framework OpenAI API compatible Supported hardware Apple Silicon Macs Linux with GPUs Mixed clusters

Click or tap to explore — scroll the page freely

What do people build with it?

USE CASE 1

Run 70B+ parameter models on a cluster of personal devices without cloud costs or data privacy concerns.

USE CASE 2

Combine multiple Apple Silicon Macs via Thunderbolt for high-speed local AI inference.

USE CASE 3

Use existing OpenAI or Ollama client tools with your own hardware-based model cluster.

What is it built with?

PythonMLXRDMATensor parallelism

How does it compare?

	exo-explore/exo	aider-ai/aider	simplifyjobs/summer2026-internships
Stars	44,382	44,411	44,458
Language	Python	Python	Python
Setup difficulty	hard	easy	easy
Complexity	4/5	3/5	1/5
Audience	developer	developer	general

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard Time to first run · 1day+

Requires network configuration, RDMA setup across multiple devices, and coordinating distributed tensor parallelism infrastructure.

Use freely for any purpose including commercial. Keep the notice and disclose changes to the patent grant.

In plain English

exo is a tool that lets you run large AI language models locally by pooling the computing resources of multiple devices you already own, turning a cluster of laptops, desktops, or servers into a single cooperative AI inference machine. The problem it solves is that the most capable AI models (like 70-billion or 600-billion parameter models) are too large to fit in the memory of a single consumer device. Cloud services can run them, but that costs money and sends your data to a remote server. exo lets you combine the memory and processing power of several personal devices to run these large models entirely on your own hardware. The software automatically discovers other devices on your network that are also running exo, no manual configuration is needed. When you send a prompt, exo splits (or "shards") the model across all available devices using a technique called tensor parallelism, where different parts of the model's computation happen simultaneously on different machines. The devices communicate the intermediate results of their computations with each other over the network. For Apple Silicon Macs connected via Thunderbolt cables, exo supports RDMA (Remote Direct Memory Access), a high-speed direct-memory transfer technique that dramatically reduces communication latency between devices. The API it exposes is compatible with OpenAI, Claude, and Ollama client formats, meaning you can use existing tools and applications with it without modification. You would use exo if you have multiple Apple Silicon Macs, Linux machines with GPUs, or any combination thereof and want to run powerful AI models locally for privacy, cost, or experimentation reasons. It is written in Python, uses Apple's MLX framework as the inference backend on Apple Silicon, and is installed by cloning the repository and running with the uv Python project manager.

Copy-paste prompts

Prompt 1

How do I set up exo to run a 70-billion parameter model across my two Apple Silicon Macs connected via Thunderbolt?

Prompt 2

Show me how to configure exo to auto-discover Linux GPU machines on my home network and pool them for inference.

Prompt 3

What's the fastest way to get exo running with the uv package manager and start serving OpenAI-compatible API requests?

Frequently asked questions

What is exo?

Run large AI models locally by pooling computing power across multiple devices on your network, with no manual setup needed.

What language is exo written in?

Mainly Python. The stack also includes Python, MLX, RDMA.

What license does exo use?

Use freely for any purpose including commercial. Keep the notice and disclose changes to the patent grant.

How hard is exo to set up?

Setup difficulty is rated hard, with roughly 1day+ to a first successful run.

Who is exo for?

Mainly developer.

Open on GitHub → Explain another repo

This repo across BitVibe Labs

Scan in gitsafehub Deploy in gitdeployhub exo-explore on gitmyhub

Verify against the repo before relying on details.