explaingit

erikgrinaker/toydb

7,223 · Rust · Audience · developer · Complexity · 4/5 · Setup · moderate

TLDR

toyDB is a distributed SQL database built from scratch in Rust as a learning project, showing how Raft consensus, snapshot-isolation transactions, and a SQL query engine fit together in clean, readable code.

Mindmap

mindmap
  root((toyDB))
    Consensus layer
      Raft protocol
      Multi-node cluster
      Fault tolerance
    Transactions
      Snapshot isolation
      Concurrent reads
      Time-travel queries
    SQL engine
      Joins and aggregates
      Query planner
      CLI client
    Learning tools
      Architecture guide
      Golden test scripts
      One-command cluster

Things people build with this

USE CASE 1

Learn how distributed consensus and SQL transactions work by reading a well-commented Rust codebase built by a former CockroachDB engineer.

USE CASE 2

Spin up a 5-node local cluster with one shell script and experiment with SQL queries, time-travel reads, and transaction failures.

USE CASE 3

Use the architecture guide and golden test suite as a study template for database internals courses or self-study.

Tech stack

Rust · SQL · Raft

Getting it running

Difficulty · moderate · Time to first run · 30 min

Requires a Rust toolchain. Write throughput is intentionally slow, and the database is not suitable for production use.

In plain English

toyDB is a distributed SQL database built from scratch in Rust as an educational project. The author originally wrote it in 2020 to understand how databases work internally, then rewrote it after spending years building production databases at CockroachDB and Neon. The goal is to show how the core concepts behind distributed SQL databases fit together, with an emphasis on being readable and correct rather than fast or scalable.

The database runs as a cluster of nodes that coordinate using a protocol called Raft, which ensures that all nodes agree on the same data even when some are unavailable. Transactions are supported with a property called snapshot isolation: each transaction sees a consistent view of the data as it existed when the transaction started, without blocking other concurrent transactions. Two storage backends are included: one that persists data to disk and one that keeps everything in memory for testing.

On top of the storage layer sits a SQL engine that supports standard features including joins, aggregates, and transactions. A query planner optimizes how queries are executed. The database also supports time-travel queries, which let users read historical versions of data as of a specific past point in time.

Setting up a local five-node cluster takes a single shell script. A command-line client then connects to any node and accepts SQL commands. The repository includes an architecture guide that walks through the codebase concept by concept, a SQL reference, and worked examples. Tests use a golden script format that records expected output and later checks that behavior stays the same.

Performance is not a goal. Write throughput in particular is slow because of how disk syncing is handled. The project is explicit about this: the complexity required for production-grade performance would make the code harder to learn from, which would defeat the purpose.
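The snapshot isolation and time-travel reads described above can be pictured with a small multi-version key-value store. What follows is a minimal sketch under stated assumptions, not toyDB's actual storage API: the `VersionedStore` type and its `commit`, `snapshot`, and `read_as_of` methods are hypothetical names invented for illustration. The sketch also leaves out concurrent writers, write-conflict detection, and uncommitted versions, all of which a real snapshot-isolation implementation has to handle.

```rust
use std::collections::BTreeMap;

/// A toy multi-version store. Hypothetical type for illustration only;
/// it is not toyDB's storage engine or transaction API.
#[derive(Default)]
struct VersionedStore {
    /// (key, commit version) -> value. `None` marks a deletion.
    data: BTreeMap<(String, u64), Option<String>>,
    /// Version assigned to the most recent commit (0 = nothing committed).
    latest: u64,
}

impl VersionedStore {
    /// Commits a single write and returns the version it was assigned.
    /// Old versions are never overwritten, so history stays readable.
    fn commit(&mut self, key: &str, value: Option<&str>) -> u64 {
        self.latest += 1;
        self.data
            .insert((key.to_string(), self.latest), value.map(String::from));
        self.latest
    }

    /// Returns the version a new reader would be pinned to. Reads made
    /// with this version ignore every commit that happens afterwards.
    fn snapshot(&self) -> u64 {
        self.latest
    }

    /// Reads the newest value of `key` committed at or before `version`.
    /// Passing an old version gives a time-travel read.
    fn read_as_of(&self, key: &str, version: u64) -> Option<&str> {
        let start = (key.to_string(), 0);
        let end = (key.to_string(), version);
        self.data
            .range(start..=end)
            .next_back()
            .and_then(|(_, value)| value.as_deref())
    }
}
```

For instance, committing `a = "1"`, taking a snapshot, and then committing `a = "2"` leaves a read at the snapshot version returning `"1"`, while a read at the latest version returns `"2"`. Keeping every version under a `(key, version)` composite key is one simple way to get both behaviors from the same lookup; it is a design choice made for this sketch.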

Copy-paste prompts

Prompt 1
Using toyDB, start a 5-node local cluster and run a SQL query that demonstrates snapshot isolation. Show me a transaction that reads data while another transaction is mid-write.
Prompt 2
Explain how toyDB implements the Raft consensus protocol. Walk me through the key Rust modules and what each one is responsible for.
Prompt 3
Show me a time-travel query in toyDB that reads historical data from a specific past point in time, and explain how the storage layer stores past versions.
Prompt 4
I want to understand how toyDB's query planner optimizes a JOIN query. Walk me through the relevant source files.

Verify against the repo before relying on details.