erikbern/ann-benchmarks

★ 5,664 Python Audience · researcher Complexity · 3/5 Setup · moderate

Mindmap

mindmap
  root((ANN Benchmarks))
    What it measures
      Queries per second
      Recall accuracy
      Speed vs accuracy tradeoff
    Libraries covered
      FAISS
      Annoy
      pgvector
      Qdrant
      30 plus others
    Datasets
      Image similarity
      Text embeddings
    How it works
      Docker isolation
      Python scripts
      Comparison charts

Things people build with this

USE CASE 1

Compare the speed vs accuracy tradeoff of FAISS, Qdrant, and Annoy on your specific dataset before committing to a vector search library

USE CASE 2

Run reproducible performance benchmarks on your own machine using pre-built datasets for image similarity or text embedding retrieval

USE CASE 3

Add a new similarity search library to the benchmark suite by writing a Docker container config and harness adapter
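As a rough illustration of what a harness adapter for Use Case 3 might look like, here is a minimal sketch in Python. The class name `BruteForceAdapter` and the `fit` / `query` method names are assumptions chosen for illustration, and the "library" being wrapped is just exact search in NumPy. Check the repository for the interface the harness actually expects before writing a real adapter.

```python
import numpy as np


class BruteForceAdapter:
    """Hypothetical adapter: build an index once, then answer k-NN queries.

    The fit/query shape is an assumption for illustration, not the
    project's confirmed interface.
    """

    def __init__(self, metric="euclidean"):
        if metric not in ("euclidean", "angular"):
            raise ValueError(f"unsupported metric: {metric}")
        self.metric = metric
        self._data = None

    def fit(self, X):
        # "Index building" here is just storing (and optionally normalizing) the data.
        X = np.asarray(X, dtype=np.float32)
        if self.metric == "angular":
            X = X / np.linalg.norm(X, axis=1, keepdims=True)
        self._data = X

    def query(self, q, n):
        # Return the ids of the n closest stored vectors, nearest first.
        q = np.asarray(q, dtype=np.float32)
        if self.metric == "angular":
            q = q / np.linalg.norm(q)
            scores = -(self._data @ q)  # larger cosine similarity means closer
        else:
            scores = ((self._data - q) ** 2).sum(axis=1)
        return np.argsort(scores)[:n].tolist()


# Usage: index a few random vectors, then look up one of them.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(100, 8))
adapter = BruteForceAdapter(metric="euclidean")
adapter.fit(vectors)
neighbors = adapter.query(vectors[7], n=5)
```

A real adapter would replace the NumPy calls with calls into the library under test, and the accompanying Docker container config would install that library.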

Tech stack

Python, Docker, FAISS, Annoy, pgvector

Getting it running

Difficulty · moderate Time to first run · 1h+

Each algorithm runs in an isolated Docker container, so you need Docker installed and enough disk space for the pre-built datasets.

In plain English

This repository is a benchmarking project that measures the performance of many different tools for fast similarity search across large datasets. The core problem: given a large collection of items (such as images, songs, or text passages, each encoded as a list of numbers), how quickly and accurately can you find the items most similar to a given query? This type of search is called nearest neighbor search, and the approximate version accepts a small loss of accuracy in exchange for much faster results.

The project covers more than 30 different search libraries. The list includes FAISS (built by Facebook Research), Annoy (built by Spotify), pgvector (a PostgreSQL extension), Qdrant, Milvus, Elasticsearch, RediSearch, and many others. Each library runs inside an isolated container so that results are fair and reproducible across different machines. Pre-built datasets are provided in a standard file format (HDF5), covering tasks like image similarity lookup and text embedding retrieval.

The benchmark measures two things simultaneously: how fast the search runs (queries per second) and how accurate the results are (recall, meaning the fraction of the true nearest neighbors that the algorithm actually found). These two factors trade off against each other: a library might return results much faster but miss some of the correct matches. The project plots both metrics together on charts for each library, so users can compare the full performance curve rather than a single headline number.

Running a benchmark requires installing the project, selecting a dataset and algorithm, and executing the provided Python scripts. Docker handles all per-algorithm environment setup automatically. Results are saved and can then be visualized as comparison charts.

This is purely a research and evaluation tool, not a production search library itself. It exists to help developers choose the right similarity search tool for their specific situation, whether that means optimizing for speed, accuracy, memory usage, or the scale of the dataset.
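To make the two metrics concrete, here is a minimal sketch in Python that computes recall and queries per second for a toy "approximate" search. Everything in it is an assumption for illustration: the data is random, the approximate index simply ignores every other point, and recall is computed as plain id overlap with exact brute-force results. The project's own recall definition and timing method may differ.

```python
import time
import numpy as np


def brute_force_neighbors(data, queries, k):
    """Exact k nearest neighbors by Euclidean distance (the ground truth)."""
    # Squared distances via ||q||^2 - 2 q.d + ||d||^2, one row per query.
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ data.T
        + (data ** 2).sum(axis=1)
    )
    return np.argsort(d2, axis=1)[:, :k]


def recall_at_k(found, truth):
    """Fraction of true neighbor ids that appear in the returned ids."""
    hits = sum(len(set(f) & set(t)) for f, t in zip(found, truth))
    return hits / truth.size


def queries_per_second(search_fn, queries):
    """Time search_fn over all queries; return (QPS, list of results)."""
    start = time.perf_counter()
    results = [search_fn(q) for q in queries]
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed, results


rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 16))
queries = rng.normal(size=(50, 16))
k = 10
truth = brute_force_neighbors(data, queries, k)

# Toy "approximate index": search only every other point.
# Less work per query, but true neighbors outside the subset are missed.
subset = np.arange(0, len(data), 2)


def approx_search(q):
    d2 = ((data[subset] - q) ** 2).sum(axis=1)
    return subset[np.argsort(d2)[:k]]


qps, found = queries_per_second(approx_search, queries)
recall = recall_at_k(np.array(found), truth)
print(f"recall@{k} = {recall:.2f}, QPS = {qps:.0f}")
```

Each (recall, QPS) pair like this is one point on a chart. Sweeping a library's tuning parameters produces the curve of points that the benchmark compares across libraries.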

Copy-paste prompts

Prompt 1

How do I run ann-benchmarks to compare FAISS and Annoy on the SIFT dataset and generate a recall vs queries-per-second chart?

Prompt 2

I want to add a new vector search library to ann-benchmarks. Walk me through creating the Docker container config and the Python algorithm wrapper.

Prompt 3

Explain the recall metric used in ann-benchmarks and what the speed vs accuracy tradeoff means when choosing a nearest neighbor library

Prompt 4

Which libraries in ann-benchmarks tend to give the best recall at high query throughput for 100-dimensional float vectors, and what do the benchmark charts show?

Prompt 5

How does ann-benchmarks use Docker to isolate each algorithm so benchmark results are reproducible across different machines?

Verify against the repo before relying on details.