explaingit

citusdata/citus

12,483 stars · Language: C · Audience: ops/devops · Complexity: 4/5 · Setup: moderate

TLDR

Citus is a PostgreSQL extension that spreads your database across multiple machines so it can handle far more data and traffic than a single server allows, while staying fully compatible with standard PostgreSQL tools and queries.

Mindmap

mindmap
  root((Citus))
    What it does
      Distributes PostgreSQL
      Parallel query execution
      Columnar storage
    Use Cases
      Multi-tenant apps
      Analytics workloads
      IoT data ingestion
    Tech Stack
      C
      PostgreSQL extension
      Docker
    Deployment
      Local Docker
      Ubuntu, Debian, Red Hat packages
      Azure Cosmos DB for PostgreSQL

Things people build with this

USE CASE 1

Scale a multi-tenant SaaS database to handle thousands of customers without switching away from PostgreSQL.

USE CASE 2

Run analytics queries on large datasets by distributing computation across multiple machines in parallel.

USE CASE 3

Handle high-speed time-series or IoT data ingestion that a single PostgreSQL server cannot keep up with.

USE CASE 4

Use columnar storage to compress wide tables and speed up queries that only read a few columns.
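Use case 4's columnar idea is easiest to see in a toy model. In Citus itself, columnar is a table access method (`CREATE TABLE ... USING columnar`). The Python below is not Citus code. It only models the layout idea, and it makes several assumptions: the names `to_columnar` and `read_column` are hypothetical, and `zlib` stands in for Citus's own compression options. Each column is stored as a separate compressed blob, so summing one column never decompresses the others. Low-cardinality columns such as `status` compress especially well because their values repeat.

```python
import json
import zlib

# Hypothetical wide table: 1,000 rows with four columns.
rows = [
    {"id": i, "country": "DE" if i % 3 else "US", "status": "active", "amount": i % 50}
    for i in range(1000)
]


def to_columnar(rows):
    """Store each column as its own compressed blob instead of storing whole rows."""
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def read_column(store, name):
    """A query touching one column decompresses only that column's blob."""
    return json.loads(zlib.decompress(store[name]).decode("utf-8"))


store = to_columnar(rows)
amounts = read_column(store, "amount")
print(sum(amounts))  # 24500
# Compressed bytes read for this query vs. the whole compressed table:
print(len(store["amount"]), sum(len(blob) for blob in store.values()))
```

A row-oriented layout would have to decompress every row, including all four fields, to answer the same `SUM(amount)`.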

Tech stack

C · PostgreSQL · Docker

Getting it running

Difficulty · moderate · Time to first run · 30 min

Requires a running PostgreSQL instance. Production use needs multiple machines to host the worker nodes.

Fully open source under the GNU AGPL v3.0 license. Commercial use is allowed, subject to the AGPL's copyleft terms.

In plain English

Citus is an extension for PostgreSQL, one of the most widely used relational databases. PostgreSQL normally stores and queries data on a single computer. Citus extends it to spread that work across multiple computers, called a cluster. This lets it handle much more data and higher traffic than a single database server could manage on its own.

The extension adds the concept of distributed tables. Instead of storing all the rows of a table on one machine, Citus splits them into smaller pieces called shards and places those shards across the nodes in the cluster. When a query arrives at the coordinator node, Citus routes it to the relevant shards, runs the parts in parallel across machines, and combines the results. To any application connecting to it, the database still looks and behaves like ordinary PostgreSQL.

Citus is particularly suited to three types of workloads:

- multi-tenant applications, where data for many separate customers lives in one database;
- analytical queries that scan large volumes of data;
- real-time data ingestion, such as time-series or IoT scenarios.

It also includes a columnar storage option that compresses data and speeds up queries that read only some of a table's columns.

There are three ways to run Citus:

- on a single machine using Docker, for testing;
- installed as a package on Ubuntu, Debian, or Red Hat systems;
- through Azure Cosmos DB for PostgreSQL, Microsoft's hosted version.

You add worker nodes to an existing cluster and rebalance data across them using SQL commands. The project is fully open source and is packaged as a standard PostgreSQL extension. As a result, it works with standard PostgreSQL tools and keeps pace with new PostgreSQL releases.
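The idea of splitting rows into shards can be modeled in a few lines of Python. This is a toy sketch, not Citus code, and it makes several assumptions:

- the distribution column is a tenant name;
- `zlib.crc32` stands in for PostgreSQL's own hash functions;
- `make_shards`, `route`, and `Shard` are hypothetical names.

The sketch splits a signed 32-bit hash space into equal contiguous ranges, one per shard, and places the shards round-robin on workers. It then shows that a lookup on the distribution column lands on exactly one shard.

```python
import zlib
from dataclasses import dataclass, field

# Signed 32-bit hash space, split into one contiguous range per shard.
HASH_MIN, HASH_MAX = -2**31, 2**31 - 1


def hash_value(key: str) -> int:
    """Map a distribution-column value into the signed 32-bit hash space.
    Stand-in only: Citus uses PostgreSQL's own hash functions, not crc32."""
    h = zlib.crc32(key.encode("utf-8"))  # unsigned, 0 .. 2**32 - 1
    return h - 2**32 if h >= 2**31 else h


@dataclass
class Shard:
    shard_id: int
    min_hash: int
    max_hash: int
    node: str
    rows: list = field(default_factory=list)


def make_shards(shard_count: int, nodes: list) -> list:
    """Give each shard an equal slice of the hash space; place shards round-robin."""
    width = (HASH_MAX - HASH_MIN + 1) // shard_count
    shards = []
    for i in range(shard_count):
        lo = HASH_MIN + i * width
        # The last shard absorbs any remainder so the ranges cover everything.
        hi = HASH_MAX if i == shard_count - 1 else lo + width - 1
        shards.append(Shard(i, lo, hi, nodes[i % len(nodes)]))
    return shards


def route(shards: list, key: str) -> Shard:
    """Find the single shard whose hash range contains this key."""
    h = hash_value(key)
    for shard in shards:
        if shard.min_hash <= h <= shard.max_hash:
            return shard
    raise ValueError(f"no shard covers hash {h}")


shards = make_shards(8, ["worker-1", "worker-2", "worker-3"])
for tenant, order_id in [("acme", 1), ("globex", 2), ("acme", 3)]:
    route(shards, tenant).rows.append((tenant, order_id))

# A query filtered on the distribution column needs only one shard.
acme = route(shards, "acme")
print(acme.shard_id, acme.node, [r for r in acme.rows if r[0] == "acme"])
```

Every row with the same key hashes to the same shard, so all of one tenant's data ends up together. This is why a multi-tenant query filtered by tenant can be sent to a single node.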
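The "run the parts in parallel, then combine the results" step is often called scatter-gather. This minimal sketch makes two assumptions: in-memory lists stand in for shards, and a thread pool stands in for worker nodes. The names are hypothetical. Each shard returns a partial SUM and COUNT, and the coordinator merges them. Averages show why partials must be mergeable: averaging per-shard averages gives the wrong answer when shards hold different numbers of rows.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards": each holds (tenant, amount) rows.
SHARDS = {
    "shard-0": [("acme", 120.0), ("initech", 40.0)],
    "shard-1": [("globex", 75.5), ("acme", 30.0)],
    "shard-2": [("umbrella", 10.0)],
}


def partial_aggregate(rows):
    """Worker-side step: partial SUM and COUNT for one shard."""
    return sum(amount for _, amount in rows), len(rows)


def distributed_avg(shards):
    """Coordinator-side step: fan out to all shards concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_aggregate, shards.values()))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # Merge SUM and COUNT, not per-shard averages.
    return total / count if count else None


print(distributed_avg(SHARDS))  # 55.1
```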
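In Citus, adding a node and rebalancing are done through SQL functions such as `citus_add_node` and `rebalance_table_shards`. The bookkeeping behind a rebalance can be sketched in Python under simplifying assumptions:

- shards are equal-sized;
- balance is measured by shard count alone;
- `rebalance` is a hypothetical helper.

Citus's own rebalancer supports other strategies and actually moves shard data between nodes, which this sketch does not model.

```python
from collections import Counter


def rebalance(placement, nodes):
    """Greedy count-based rebalance (hypothetical, not Citus's algorithm):
    repeatedly move the lowest-numbered shard from the busiest node to the
    idlest node until shard counts differ by at most one.
    Mutates `placement` (shard_id -> node) and returns the moves made."""
    moves = []
    while True:
        load = Counter({node: 0 for node in nodes})
        load.update(placement.values())
        busiest = max(nodes, key=lambda n: load[n])
        idlest = min(nodes, key=lambda n: load[n])
        if load[busiest] - load[idlest] <= 1:
            return moves
        shard_id = min(s for s, n in placement.items() if n == busiest)
        placement[shard_id] = idlest
        moves.append((shard_id, busiest, idlest))


nodes = ["worker-1", "worker-2"]
placement = {shard: nodes[shard % 2] for shard in range(8)}  # 4 shards each
nodes.append("worker-3")  # the newly added worker starts with no shards
print(rebalance(placement, nodes))
# [(0, 'worker-1', 'worker-3'), (1, 'worker-2', 'worker-3')]
```

Each move strictly narrows the gap between the busiest and idlest nodes, so the loop always terminates. Here it leaves 3, 3, and 2 shards on the three workers.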

Copy-paste prompts

Prompt 1
I have a PostgreSQL multi-tenant app hitting performance limits. Walk me through setting up Citus to shard the tenants table across 3 worker nodes.
Prompt 2
Show me the SQL commands to add a new worker node to an existing Citus cluster and rebalance the shards evenly.
Prompt 3
I want to test Citus locally before deploying. Give me a docker-compose file for a Citus coordinator plus 2 worker nodes.
Prompt 4
My analytics queries scan millions of rows. Show me how to enable columnar storage for a table in Citus and what query speed improvements to expect.

Verify against the repo before relying on details.