explaingit

quickwit-oss/tantivy

Analysis updated 2026-06-24

15,180 stars · Rust · Audience: developer · Complexity: 4/5 · License: MIT · Setup: moderate

TLDR

Tantivy is a Rust full-text search library inspired by Apache Lucene. You embed it in your own program to index and search large amounts of text with BM25 ranking.

Mindmap

mindmap
  root((tantivy))
    Inputs
      Text documents
      Numeric fields
      Facets
    Outputs
      Search results
      Aggregations
      Histograms
    Use Cases
      Embed search in app
      Build CLI search tool
      Index large corpora
    Tech Stack
      Rust
      BM25
      LZ4
      Zstd


What do people build with it?

USE CASE 1

Embed a full-text search engine inside a Rust application without running a separate server.

USE CASE 2

Build a fast CLI tool that searches a local corpus with sub-10ms startup.

USE CASE 3

Index and query structured data with text, numeric, date, and facet fields.

USE CASE 4

Run faceted search and aggregations like histograms and stats over an indexed dataset.
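To make use case 4 concrete, here is a toy, std-only sketch of what a fixed-interval histogram and a stats aggregation compute over one numeric field. It does not use Tantivy's aggregation collector. The function names, the `interval`/`offset` parameters, and the sample prices are assumptions chosen for illustration.

```rust
// Toy illustration of histogram bucketing and summary stats over a numeric field.
// NOT Tantivy's aggregation API; names and parameters here are hypothetical.
use std::collections::BTreeMap;

/// Summary statistics over a set of values.
#[derive(Debug)]
struct Stats {
    count: usize,
    min: f64,
    max: f64,
    sum: f64,
    avg: f64,
}

/// Put each value in bucket floor((v - offset) / interval) and return
/// (bucket lower bound, count) pairs in ascending order. Empty buckets are skipped.
fn histogram(values: &[f64], interval: f64, offset: f64) -> Vec<(f64, usize)> {
    let mut counts: BTreeMap<i64, usize> = BTreeMap::new();
    for &v in values {
        let idx = ((v - offset) / interval).floor() as i64;
        *counts.entry(idx).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|(idx, n)| (idx as f64 * interval + offset, n))
        .collect()
}

/// Count, min, max, sum, and average; None for an empty input.
fn stats(values: &[f64]) -> Option<Stats> {
    if values.is_empty() {
        return None;
    }
    let sum: f64 = values.iter().sum();
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    Some(Stats { count: values.len(), min, max, sum, avg: sum / values.len() as f64 })
}

fn main() {
    // Hypothetical values read from an indexed "price" field.
    let prices = [3.0, 7.5, 12.0, 14.9, 21.0];
    let interval = 10.0;
    for (lower, n) in histogram(&prices, interval, 0.0) {
        println!("[{lower}, {}): {n}", lower + interval);
    }
    println!("{:?}", stats(&prices));
}
```

Keying buckets by an integer index sidesteps the fact that `f64` cannot be a `BTreeMap` key; the lower bound is recomputed from the index when results are returned.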

What is it built with?

Rust · BM25 · LZ4 · Zstd

How does it compare?

Repo | quickwit-oss/tantivy | benfred/py-spy | canner/wrenai
Stars | 15,180 | 15,178 | 15,194
Language | Rust | Rust | Rust
Setup difficulty | moderate | easy | moderate
Complexity | 4/5 | 2/5 | 4/5
Audience | developer | developer | data

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty: moderate · Time to first run: about 30 minutes

Requires a Rust toolchain and writing code against the crate API; there is no ready-to-run server.

MIT licensed: use freely in personal and commercial projects as long as the copyright notice is kept.

In plain English

Tantivy is a full-text search engine library written in Rust. It is not a ready-to-run search server like Elasticsearch or Apache Solr. Instead, it is a crate (a Rust library package) that a developer adds to their own program so that the program can index and search large amounts of text. The README describes it as closer in spirit to Apache Lucene, the older Java library on which Elasticsearch and Solr are built.

Searches are ranked with BM25, the same scoring formula Lucene uses. Queries can be written in a natural syntax such as (michael AND jackson) OR "king of pop", and phrase searches are supported. Indexing is multithreaded and incremental, so new documents can be added without rebuilding the whole index. According to the README, indexing the full English Wikipedia takes under three minutes on the author's desktop, and startup takes under 10 milliseconds, which the README notes is useful for command-line tools.

Tantivy supports many field types: text, integers, floats, dates, IP addresses, booleans, and hierarchical facets. It can store documents compressed with LZ4 or Zstd, run range queries and faceted search, and summarize results with an aggregation collector that produces histograms, range buckets, averages, and stats. Tokenizers, the components that split text into searchable words, are configurable: stemming is available for 17 Latin-script languages, and third-party add-ons cover Chinese, Japanese, and Korean.

The README is explicit about what Tantivy does not do. Distributed search across many machines is out of scope; for that, the same team points readers to Quickwit, a separate project built on top of Tantivy. Data inside an index is immutable, so editing a document means deleting it and indexing the new version. New documents become searchable only after a commit call on the index writer, and existing readers must be reloaded to see the change.
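For readers unfamiliar with BM25, here is a minimal sketch of the textbook formula for one query term in one document; a document's score for a multi-term query is the sum over its matching terms. It uses the commonly cited defaults k1 = 1.2 and b = 0.75 and the "1 +" form of IDF found in Lucene's BM25; treat the exact constants Tantivy uses as an assumption to verify against its source. The function names are hypothetical.

```rust
// Textbook BM25 for a single term; a sketch, not Tantivy's implementation.

/// Inverse document frequency in the "1 +" form, which never goes negative.
fn idf(num_docs: u64, doc_freq: u64) -> f64 {
    let n = num_docs as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one term to one document.
/// tf: term count in the doc; doc_len: tokens in the doc; avg_doc_len: corpus average.
fn bm25_term(tf: u32, doc_len: u32, avg_doc_len: f64, num_docs: u64, doc_freq: u64) -> f64 {
    const K1: f64 = 1.2; // term-frequency saturation (assumed default)
    const B: f64 = 0.75; // strength of document-length normalization (assumed default)
    let tf = tf as f64;
    let length_norm = K1 * (1.0 - B + B * doc_len as f64 / avg_doc_len);
    idf(num_docs, doc_freq) * tf * (K1 + 1.0) / (tf + length_norm)
}

fn main() {
    // 1,000 docs; the term appears in 10 of them and 3 times in an average-length doc.
    let score = bm25_term(3, 100, 100.0, 1_000, 10);
    println!("{score:.4}");
}
```

The two constants explain BM25's behavior: repeated occurrences of a term add less and less (the score approaches idf × (k1 + 1)), rare terms weigh more than common ones, and long documents are penalized relative to the corpus average.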
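To make the query syntax and phrase search concrete, here is a toy positional inverted index in plain Rust that evaluates (michael AND jackson) OR "king of pop" with set intersection, set union, and a consecutive-position check for the phrase. It illustrates the general technique only; it is not Tantivy's QueryParser or index format. `ToyIndex`, `tokenize`, and the sample documents are hypothetical, and the tokenizer just lowercases and splits on non-alphanumeric characters (no stemming).

```rust
// Toy positional inverted index with AND / OR / phrase evaluation.
// Hypothetical types and functions; not Tantivy's API.
use std::collections::{BTreeSet, HashMap};

/// Lowercase and split on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// term -> (doc id -> positions of that term in the doc)
#[derive(Default)]
struct ToyIndex {
    postings: HashMap<String, HashMap<usize, Vec<usize>>>,
}

impl ToyIndex {
    fn add(&mut self, doc: usize, text: &str) {
        for (pos, term) in tokenize(text).into_iter().enumerate() {
            self.postings.entry(term).or_default().entry(doc).or_default().push(pos);
        }
    }

    /// Documents containing `term`.
    fn term(&self, term: &str) -> BTreeSet<usize> {
        self.postings
            .get(term)
            .map(|docs| docs.keys().copied().collect())
            .unwrap_or_default()
    }

    /// Documents where the phrase's terms appear at consecutive positions.
    fn phrase(&self, phrase: &str) -> BTreeSet<usize> {
        let terms = tokenize(phrase);
        let Some(first) = terms.first() else { return BTreeSet::new() };
        let mut hits = BTreeSet::new();
        for doc in self.term(first) {
            let starts = &self.postings[first][&doc];
            // The phrase matches if, for some start, term i sits at start + i.
            let found = starts.iter().any(|&start| {
                terms.iter().enumerate().skip(1).all(|(i, t)| {
                    self.postings
                        .get(t)
                        .and_then(|docs| docs.get(&doc))
                        .map_or(false, |ps| ps.contains(&(start + i)))
                })
            });
            if found {
                hits.insert(doc);
            }
        }
        hits
    }
}

fn main() {
    let mut index = ToyIndex::default();
    index.add(0, "Michael Jackson released Thriller");
    index.add(1, "Michael Jordan played basketball");
    index.add(2, "Some call him the King of Pop");
    index.add(3, "Pop music: the king of the charts");

    // (michael AND jackson) OR "king of pop"
    let both: BTreeSet<usize> =
        index.term("michael").intersection(&index.term("jackson")).copied().collect();
    let result: BTreeSet<usize> = both.union(&index.phrase("king of pop")).copied().collect();
    println!("{result:?}"); // prints {0, 2}
}
```

Document 3 contains all three phrase words but not in order, which is exactly the case that storing positions (rather than only doc ids) lets a phrase query reject.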
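The visibility rules in the last paragraph (buffered writes, commit, reader reload, update as delete plus add) can be modeled in a few lines of std-only Rust. This is a toy model under stated assumptions: `Writer`, `Reader`, `open`, and the whole-map snapshot are hypothetical and only mimic the behavior the README describes; they are not Tantivy's `IndexWriter` or `IndexReader`.

```rust
// Toy model of commit / reload visibility; hypothetical names, not Tantivy's API.
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// An immutable committed view of the index: doc id -> text.
type Snapshot = Arc<BTreeMap<u64, String>>;

/// Buffered operations, applied in order at commit time.
enum Op {
    Add(u64, String),
    Delete(u64),
}

struct Writer {
    committed: Arc<RwLock<Snapshot>>,
    ops: Vec<Op>,
}

impl Writer {
    fn add(&mut self, id: u64, text: &str) {
        self.ops.push(Op::Add(id, text.to_string()));
    }
    fn delete(&mut self, id: u64) {
        self.ops.push(Op::Delete(id));
    }
    /// Documents are never edited in place: an update is a delete plus an add.
    fn update(&mut self, id: u64, text: &str) {
        self.delete(id);
        self.add(id, text);
    }
    /// Build a new snapshot from the last one plus pending ops, then publish it.
    /// Copying the whole map is only for simplicity.
    fn commit(&mut self) {
        let mut next = (**self.committed.read().unwrap()).clone();
        for op in self.ops.drain(..) {
            match op {
                Op::Add(id, text) => { next.insert(id, text); }
                Op::Delete(id) => { next.remove(&id); }
            }
        }
        *self.committed.write().unwrap() = Arc::new(next);
    }
}

struct Reader {
    source: Arc<RwLock<Snapshot>>,
    current: Snapshot, // the snapshot this reader keeps using until reload()
}

impl Reader {
    /// Ids of documents in this reader's snapshot whose text contains `word`.
    fn search(&self, word: &str) -> Vec<u64> {
        self.current.iter().filter(|(_, text)| text.contains(word)).map(|(id, _)| *id).collect()
    }
    /// Switch to the most recently committed snapshot.
    fn reload(&mut self) {
        self.current = self.source.read().unwrap().clone();
    }
}

/// Create a writer and a reader sharing the same committed state.
fn open() -> (Writer, Reader) {
    let shared: Arc<RwLock<Snapshot>> = Arc::new(RwLock::new(Arc::new(BTreeMap::new())));
    let current = shared.read().unwrap().clone();
    (Writer { committed: Arc::clone(&shared), ops: Vec::new() }, Reader { source: shared, current })
}

fn main() {
    let (mut writer, mut reader) = open();
    writer.add(1, "tantivy is a search library");
    writer.commit();
    println!("{:?}", reader.search("search")); // prints [] (reader not reloaded yet)
    reader.reload();
    println!("{:?}", reader.search("search")); // prints [1]
    writer.update(1, "tantivy is a rust crate");
    writer.commit();
    reader.reload();
    println!("{:?} {:?}", reader.search("search"), reader.search("crate")); // prints [] [1]
}
```

Applying buffered operations in order matters: a later delete must remove an earlier add, which is why the sketch keeps one ordered log rather than separate add and delete lists.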
Bindings exist for Python and Ruby, and the README lists projects that use Tantivy, including a Matrix chat message indexer and a typo-tolerant search engine with a REST API. Companies named as users include Etsy and ParadeDB.

Copy-paste prompts

Prompt 1
Show me a minimal Rust program that builds a Tantivy index from a folder of text files and runs BM25 queries against it.
Prompt 2
Add multilingual tokenization with stemming to a Tantivy index and demonstrate it on French and German text.
Prompt 3
Use Tantivy's aggregation collector to produce a histogram and range buckets over a date field in my index.
Prompt 4
Wire the Python bindings of Tantivy into a Flask app so I can POST documents and GET search results.
Prompt 5
Migrate a small Elasticsearch use case to Tantivy embedded in my Rust service, keeping phrase queries and facets.

Frequently asked questions

What is tantivy?

Tantivy is a Rust full-text search library inspired by Apache Lucene. You embed it in your own program to index and search large amounts of text with BM25 ranking.

What language is tantivy written in?

Mainly Rust. The listed stack also includes BM25 ranking and LZ4 and Zstd compression.

What license does tantivy use?

MIT licensed: use freely in personal and commercial projects as long as the copyright notice is kept.

How hard is tantivy to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is tantivy for?

Mainly developers who want to embed search in their own programs.



Verify against the repo before relying on details.