explaingit

pingcap/awesome-database-learning

10,830 | Audience · developer | Complexity · 1/5 | Setup · easy

TLDR

A curated reading list of research papers, university courses, and blog posts covering database internals, from query optimizers and storage engines to distributed consensus and transaction handling.

Mindmap

mindmap
  root((repo))
    What It Does
      Reading list
      Database internals
      Academic papers
    Topics
      Query optimization
      Storage engines
      Transactions
      Distributed consensus
    Resources
      Research papers
      University courses
      Blog posts
    Audience
      Backend engineers
      Database builders


Things people build with this

USE CASE 1

Follow a structured path from knowing SQL to understanding what the database actually does when your query runs.

USE CASE 2

Find the original papers on B-trees, LSM trees, or log-structured storage before implementing your own storage engine.

USE CASE 3

Study Volcano and Cascades optimizer frameworks to understand how production databases turn SQL into execution plans.

USE CASE 4

Research distributed consensus algorithms like Raft or Paxos from primary academic sources before building a distributed system.

Getting it running

Difficulty · easy | Time to first run · 5min
No license information was found in the repository description.

In plain English

This repository is a reading list for people who want to understand how databases work on the inside. It was put together by PingCAP, the company behind the TiDB database, and is aimed at engineers who want to move beyond using databases to understanding how they are built.

The list is organized by topic rather than by difficulty. Each section covers a specific component of a database system, such as query optimization, transaction handling, storage engines, data replication, or consensus algorithms. Within each topic, you will find links to research papers, university course materials, blog posts, and recorded talks. The papers include many classic publications from database conferences going back to the 1970s, alongside more recent work on distributed systems.

Some sections are highly technical. Query optimization, for example, covers papers on optimizer frameworks like Volcano and Cascades, the architectures that production databases use to turn a SQL query into an efficient execution plan. The storage section covers data structures like B-trees and log-structured merge trees, which control how data is physically written to disk. There are also sections on concurrency control, network protocols, benchmarking, and formal verification using a specification language called TLA+.

There is no code to run in this repository; it is purely a reference collection. A number of links are in Chinese, reflecting the original audience, but many of the papers and course materials are in English. For someone who wants to go from knowing how to write SQL queries to understanding what happens underneath when those queries run, this list provides a structured path through the academic and engineering literature on the subject.
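To make the storage topic concrete, here is a toy sketch of the log-structured merge tree idea mentioned above: writes go to an in-memory table, full tables are flushed as immutable sorted runs, and reads check the newest data first. It is not taken from the repository (which contains no code), and the class name `TinyLSM`, the `memtable_limit` parameter, and the in-memory lists standing in for on-disk files are all assumptions made for illustration.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: one in-memory memtable plus immutable sorted runs.

    Hypothetical illustration only; real engines write runs to disk,
    keep a write-ahead log, and handle deletes with tombstones.
    """

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []  # oldest run first; each run is a sorted list of (key, value)

    def put(self, key, value):
        # Writes only touch the memtable, which is what keeps them cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, letting newer values overwrite older ones.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

In this sketch, a read may have to look through several runs, which is why compaction exists: it trades background merge work for fewer places to search.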

Copy-paste prompts

Prompt 1
I want to understand how a SQL query optimizer works from first principles. Based on the papers in awesome-database-learning, what should I read first and in what order?
Prompt 2
Explain to me how an LSM tree works and why databases like RocksDB and LevelDB use it instead of a B-tree for write-heavy workloads.
Prompt 3
I'm building a simple key-value store in Go. Walk me through the key design decisions for storage, indexing, and crash recovery based on classic database research.
Prompt 4
What is the difference between optimistic and pessimistic concurrency control in databases? Give me concrete examples of when each is the right choice.

Verify against the repo before relying on details.