explaingit

yichuan-w/leann

10,995 | Python | Audience · developer | Complexity · 3/5 | License | Setup · moderate

TLDR

A vector database that runs entirely on your laptop, letting you search through large personal data collections using AI without sending anything to the cloud.

Mindmap

mindmap
  root((LEANN))
    What it does
      Local AI search
      97% less storage
      No cloud required
    Data sources
      PDFs and files
      Apple Mail
      Chat history
      Browser history
    Tech stack
      Python
      DiskANN
      HNSW
    Setup
      macOS and Linux
      WSL on Windows
      PyPI install

Things people build with this

USE CASE 1

Build a local AI assistant that searches your PDFs, emails, and chat history without an internet connection.

USE CASE 2

Add semantic search to Claude Code so it can find conceptually related code and documents in your projects.

USE CASE 3

Search through 60 million text chunks from personal data using only 6 gigabytes of storage instead of 200.

USE CASE 4

Index your iMessage history, WeChat chats, and browser history for private local AI-powered search.
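
The storage figures quoted in use case 3 can be sanity-checked with a little arithmetic. The sketch below assumes decimal gigabytes (10^9 bytes). The 768-value float32 embedding is a hypothetical size chosen for illustration; this summary does not state the real embedding dimension.

```python
# Figures quoted in the summary.
chunks = 60_000_000
leann_bytes = 6 * 10**9        # about 6 GB, assuming decimal gigabytes
baseline_bytes = 200 * 10**9   # about 200 GB for a conventional index

reduction = 1 - leann_bytes / baseline_bytes
leann_per_chunk = leann_bytes // chunks
baseline_per_chunk = baseline_bytes // chunks

# Hypothetical embedding size, not taken from the source: 768 float32 values.
assumed_embedding_bytes = 768 * 4

print(f"storage reduction: {reduction:.0%}")
print(f"LEANN budget per chunk: {leann_per_chunk} bytes")
print(f"baseline budget per chunk: {baseline_per_chunk} bytes")
print(f"one assumed embedding: {assumed_embedding_bytes} bytes")
```

Under these assumptions the 200 GB figure works out to roughly one stored embedding per chunk, while the 6 GB figure leaves about 100 bytes per chunk. That is too small to hold an embedding of the assumed size, which fits the description of an index that keeps only graph links.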

Tech stack

Python, DiskANN, HNSW, PyPI, MCP

Getting it running

Difficulty · moderate | Time to first run · 30 min

Building from source requires additional system libraries for the DiskANN and HNSW graph index backends.

MIT license: free to use for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

LEANN is a vector database designed to run entirely on a personal laptop. The core idea is to let anyone set up an AI assistant that can search large personal data collections without sending that data to any cloud service. The project describes this approach as RAG (Retrieval-Augmented Generation): before an AI answers a question, it first finds relevant documents in a local index and uses them to inform the response.

The main technical claim is a 97% reduction in storage compared to conventional vector databases, without losing accuracy. Traditional approaches store a numerical representation (called an embedding) for every piece of text in the index. LEANN instead stores only a graph of relationships between documents and recomputes the embeddings on demand during a search. This lets a collection of 60 million text chunks fit in about 6 gigabytes rather than 200 gigabytes.

The supported data sources are broad. The README walks through specific setups for searching your local file system (PDFs, text files, Markdown), Apple Mail, browser history, iMessage, WeChat chat history, ChatGPT and Claude conversation exports, Slack messages, and Twitter bookmarks. There is also an integration with Claude Code through MCP (Model Context Protocol), which adds semantic search (finding conceptually related results) on top of the basic keyword search Claude Code offers by default.

Installation is via PyPI using the uv package manager. The project runs on macOS (both ARM and Intel), Linux (Ubuntu, Arch, RHEL-based), and Windows through WSL. Building from source requires additional system libraries for the graph index backends (DiskANN and HNSW). The project collects zero telemetry and is licensed under MIT. The full README is longer than what was shown.
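
The storage idea described above (keep the graph, recompute embeddings only for the nodes a query visits) can be illustrated with a small sketch. This is not LEANN's API or its actual algorithm. The `toy_embed` function is a hypothetical stand-in for a real embedding model, the neighbor graph is written by hand, and `search` is a generic best-first graph search used only to show where on-demand recomputation fits.

```python
import heapq
import math


def toy_embed(text, dim=16):
    """Hypothetical stand-in for an embedding model: hashed character trigrams."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - 2):
        # Deterministic bucket (Python's built-in hash() is salted per process).
        bucket = sum(ord(c) for c in padded[i:i + 3]) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def search(query, chunks, graph, entry=0, top_k=2, beam=4):
    """Best-first search over a neighbor graph with no stored embeddings."""
    q = toy_embed(query)
    computed = {}  # embeddings recomputed for this query only, then discarded

    def score(node):
        if node not in computed:
            computed[node] = toy_embed(chunks[node])
        # Vectors are normalized, so the dot product is the cosine similarity.
        return sum(a * b for a, b in zip(q, computed[node]))

    visited = {entry}
    frontier = [(-score(entry), entry)]  # max-heap via negated scores
    best = []                            # min-heap holding the top `beam` nodes
    while frontier:
        neg_score, node = heapq.heappop(frontier)
        if len(best) >= beam and -neg_score < best[0][0]:
            break  # the closest unexplored node cannot improve the results
        heapq.heappush(best, (-neg_score, node))
        if len(best) > beam:
            heapq.heappop(best)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                heapq.heappush(frontier, (-score(neighbor), neighbor))
    ranked = sorted(best, reverse=True)[:top_k]
    return [(chunks[node], s) for s, node in ranked], len(computed)


# Only the text and the graph edges are kept; no embedding is stored.
chunks = [
    "quarterly budget spreadsheet for the finance team",
    "hiking trail notes from the weekend trip",
    "meeting notes about the vector index design",
    "recipe for lemon pasta with garlic",
    "email thread about laptop storage limits",
]
graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: [2, 3]}

results, n_embedded = search(
    "meeting notes about the vector index design", chunks, graph
)
for text, s in results:
    print(f"{s:.3f}  {text}")
print(f"embeddings computed: {n_embedded} of {len(chunks)}")
```

On a five-node toy graph the search touches most or all nodes, so nothing is saved here. The trade-off only matters at scale, where a query visits a small fraction of the graph and pays for embedding computation at search time instead of paying for embedding storage.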

Copy-paste prompts

Prompt 1
I installed LEANN on my Mac. Help me set up a local RAG assistant that indexes my PDF files and answers questions about them without using any cloud services.
Prompt 2
Show me how to connect LEANN to my Apple Mail archive so I can search my emails using natural language queries.
Prompt 3
How do I integrate LEANN as an MCP server with Claude Code to enable semantic search across my project files?
Prompt 4
I want to index my ChatGPT conversation exports and iMessage history with LEANN on Linux. Walk me through the setup steps.
Prompt 5
Explain how LEANN achieves 97% storage reduction compared to a standard vector database and when I might hit its accuracy limits.

Verify against the repo before relying on details.