explaingit

trinodb/trino

12,812 · Java · Audience: data · Complexity: 5/5 · License: Apache 2.0 · Setup: hard

TLDR

Trino is an open-source SQL engine that lets you query massive amounts of data across multiple databases, data lakes, and cloud storage at once, without moving the data first.

Mindmap

mindmap
  root((Trino))
    What it does
      Distributed SQL queries
      Multi-source federation
      Petabyte scale
    Architecture
      Coordinator node
      Worker nodes
      Connectors
    Data sources
      Data lakes
      Relational databases
      Cloud storage
    Setup
      Java 25 required
      Maven build
      Docker needed


Things people build with this

USE CASE 1

Run a single SQL query that joins data from a PostgreSQL database and an S3 data lake without copying any data.

USE CASE 2

Analyze petabytes of event logs stored in a cloud data warehouse using standard SQL tools your team already knows.

USE CASE 3

Replace slow data export pipelines by querying production and analytics databases simultaneously in real time.

USE CASE 4

Build a self-service analytics layer where data analysts query multiple data sources through one standard SQL interface.
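
The cross-source use cases above depend on two things: one catalog file per data source, and fully qualified `catalog.schema.table` names in the query. The sketch below renders both as plain strings so the shape is easy to see. It is a minimal illustration, not a tested deployment: the catalog names (`pg`, `lake`), hosts, credentials, schemas, and tables are all hypothetical, and object-storage credentials for the lake catalog are left out because they vary by setup and Trino version. Check property names against the connector documentation before using them.

```python
# Illustrative only: catalog names, hosts, credentials, schemas, and tables
# below are hypothetical placeholders, not values from the Trino repository.

def render_properties(settings: dict[str, str]) -> str:
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"

# Assumed contents of etc/catalog/pg.properties (PostgreSQL connector).
pg_catalog = render_properties({
    "connector.name": "postgresql",
    "connection-url": "jdbc:postgresql://db.example.internal:5432/app",
    "connection-user": "trino_reader",
    "connection-password": "change-me",
})

# Assumed contents of etc/catalog/lake.properties (Hive connector over object
# storage). Storage credentials are deliberately omitted from this sketch.
lake_catalog = render_properties({
    "connector.name": "hive",
    "hive.metastore.uri": "thrift://metastore.example.internal:9083",
})

def qualified(catalog: str, schema: str, table: str) -> str:
    """Build the catalog.schema.table name used to address a table."""
    return f"{catalog}.{schema}.{table}"

# One SQL statement that joins a table from each catalog.
federated_query = f"""
SELECT u.country, count(*) AS events
FROM {qualified('pg', 'public', 'users')} AS u
JOIN {qualified('lake', 'web', 'events')} AS e
  ON e.user_id = u.id
GROUP BY u.country
ORDER BY events DESC
""".strip()

print(pg_catalog)
print(lake_catalog)
print(federated_query)
```

The catalog file name (without `.properties`) becomes the catalog name in SQL, which is why the query can refer to `pg.public.users` and `lake.web.events` side by side.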

Tech stack

Java · Maven · Docker

Getting it running

Difficulty: hard · Time to first run: 1 day+

Building from source requires Java 25, Docker, and Maven; running a cluster requires multiple machines or containers.

Apache 2.0: use it freely for any purpose, including commercial use, and modify and distribute it as long as you keep the copyright and license notice.

In plain English

Trino is a query engine that lets you run SQL queries across large amounts of data stored in many different places at once. Companies typically collect data in data warehouses, cloud storage buckets, relational databases, and other systems. Rather than moving all that data into one place first, Trino connects to those sources simultaneously and runs a single query that pulls results from all of them, returning answers quickly even when the underlying data is enormous.

The name reflects the project's history: it was originally created at Facebook as Presto, later became known as PrestoSQL, and was then renamed Trino by the community. Today it is maintained by an independent open-source community and is widely used by data analytics teams at companies that need to query petabytes of data across distributed infrastructure.

Trino is written in Java and runs as a cluster of machines working together. One node coordinates the query plan while worker nodes execute pieces of it in parallel. Users connect to the cluster using standard SQL, so anyone who knows how to write a database query can use it. It also ships with a command-line client for running queries interactively.

The project supports connections to many data sources through a plugin system. Common sources include data lakes in formats like Delta Lake, traditional relational databases, and object storage services. Each connection type is handled by a connector, and the codebase includes built-in connectors for several popular systems.

Building Trino from source requires Java 25 and Docker, and the build is managed through Maven, a standard Java build tool. The repository's README is primarily a guide for developers who want to run or modify the engine locally rather than an introduction for end users. End-user documentation lives on a separate site.
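
The coordinator and worker split described above can be pictured with a toy sketch. It is not Trino's implementation: the data, the function names, and the use of threads in place of separate machines are all assumptions made for illustration. In the real engine the coordinator mainly plans and schedules work while workers also handle the later stages; here the final merge is folded into the coordinator function for brevity.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for one "split" (chunk) of a
# larger table of (country, clicks) rows.
SPLITS = [
    [("US", 3), ("DE", 1)],
    [("US", 2), ("FR", 5)],
    [("DE", 4), ("US", 1)],
]

def worker_partial_sum(split):
    """Worker step: aggregate one split independently of the others."""
    partial = Counter()
    for country, clicks in split:
        partial[country] += clicks
    return partial

def coordinator_run(splits, workers=3):
    """Coordinator step: hand out splits in parallel, then merge partials."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(worker_partial_sum, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)

# Rough equivalent of: SELECT country, sum(clicks) FROM t GROUP BY country
result = coordinator_run(SPLITS)
print(result)
```

Because each split is summed on its own and the partial sums are merged afterwards, adding more workers lets more splits be processed at once without changing the final answer.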

Copy-paste prompts

Prompt 1
I'm setting up Trino to query both a PostgreSQL database and files in S3. Write me a Trino catalog config for each connector and a sample cross-source SQL query that joins a users table from Postgres with event data from S3.
Prompt 2
Help me write a Trino SQL query that aggregates 90 days of clickstream data from a Delta Lake table, partitioned by date, filtering to users in the US and grouping by product category.
Prompt 3
Explain how Trino distributes a SQL query across worker nodes and how I should size my cluster for queries over 10TB of data.
Prompt 4
Write a Trino connector configuration for querying a MySQL database and show me how to explore its schema using the information_schema tables.

Verify against the repo before relying on details.