explaingit

prestodb/presto

Analysis updated 2026-06-24

16,702 stars · Java · Audience · data · Complexity · 5/5 · Setup · hard

TLDR

A distributed SQL query engine that runs analytical SQL across petabyte-scale data lakes by splitting work across many worker servers in parallel.

Mindmap

mindmap
  root((presto))
    Inputs
      SQL query
      Hive metastore
      S3 or HDFS data
    Outputs
      Query results
      Distributed execution plan
    Use Cases
      Query data lakes
      Federated SQL across sources
      Ad hoc analytics
    Tech Stack
      Java
      Hive
      Hadoop
    Architecture
      Coordinator
      Workers
      Connectors

What do people build with it?

USE CASE 1

Run SQL across a Hive or S3-backed data lake without moving the data

USE CASE 2

Join data from multiple sources like MySQL and Kafka in one query

USE CASE 3

Power an interactive BI dashboard over petabyte-scale tables

USE CASE 4

Stand up a distributed query cluster for a data engineering team
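To make the federated use case concrete, here is a minimal sketch of what a cross-source join can look like. Presto addresses tables as catalog.schema.table, so one query can name tables from different connectors. The catalog names (mysql, kafka), the schemas, the tables, and the columns below are all hypothetical; they depend entirely on how your catalogs are configured.

```python
def qualified(catalog, schema, table):
    """Build a fully qualified table name in catalog.schema.table form."""
    return f"{catalog}.{schema}.{table}"


def build_federated_join():
    """Return SQL that joins a MySQL table to a Kafka topic in one query.

    All catalog, schema, table, and column names are illustrative assumptions.
    """
    customers = qualified("mysql", "crm", "customers")  # hypothetical MySQL table
    clicks = qualified("kafka", "default", "clicks")    # hypothetical Kafka topic
    return (
        "SELECT c.customer_id, c.name, count(*) AS click_count\n"
        f"FROM {clicks} AS k\n"
        f"JOIN {customers} AS c ON k.customer_id = c.customer_id\n"
        "GROUP BY c.customer_id, c.name"
    )


if __name__ == "__main__":
    print(build_federated_join())
```

The resulting text is ordinary SQL that would be submitted through whatever Presto client you already use; the only federated part is that the two table names start with different catalogs.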

What is it built with?

Java, SQL, Hive, Hadoop

How does it compare?

Repo              prestodb/presto  shuzheng/zheng  winterbe/java8-tutorial
Stars             16,702           16,677          16,746
Language          Java             Java            Java
Setup difficulty  hard             hard            easy
Complexity        5/5              5/5             1/5
Audience          data             developer       developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard · Time to first run · 1 day+

Real use needs a coordinator plus worker cluster and a catalog like Hive metastore wired to your data source.
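As a rough illustration of that wiring, the sketch below renders the kind of properties files a coordinator, a worker, and a Hive catalog typically need. The property names follow the Presto deployment documentation as best recalled, so check them against the docs for your version. The host names are placeholders, and required pieces such as node.properties, jvm.config, and memory limits are left out.

```python
# Placeholder address; replace with your coordinator's real host and port.
DISCOVERY_URI = "http://coordinator.example.net:8080"


def render_properties(settings):
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


def node_config(is_coordinator):
    """Settings for etc/config.properties on a coordinator or a worker node."""
    settings = {
        "coordinator": str(is_coordinator).lower(),
        "http-server.http.port": "8080",
        "discovery.uri": DISCOVERY_URI,
    }
    if is_coordinator:
        # Keep query processing off the coordinator and run discovery on it.
        settings["node-scheduler.include-coordinator"] = "false"
        settings["discovery-server.enabled"] = "true"
    return settings


def hive_catalog(metastore_uri):
    """Settings for etc/catalog/hive.properties, pointing at a Hive metastore."""
    return {
        "connector.name": "hive-hadoop2",
        "hive.metastore.uri": metastore_uri,
    }


if __name__ == "__main__":
    print(render_properties(node_config(is_coordinator=True)))
    print(render_properties(node_config(is_coordinator=False)))
    print(render_properties(hive_catalog("thrift://metastore.example.net:9083")))
```

Every worker points its discovery.uri at the coordinator, which is how the nodes find each other; the catalog file is what makes the Hive-backed tables visible to queries.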

In plain English

Presto is a distributed SQL query engine built for big data. SQL is the standard language used to ask questions of databases, like "show me all sales from last month." Presto's specialty is running those same SQL queries across enormous datasets that are spread across many machines, which makes it possible to analyze data at a scale that would overwhelm a traditional database.

The key idea is "distributed": instead of one machine doing all the work, Presto splits a query across many servers working simultaneously. This makes it possible to query petabytes of data and get results in seconds or minutes rather than hours. It connects to many different data sources, including Hive and Hadoop-based data lakes, so organizations can query data wherever it lives without moving it first.

The project is the official home of the Presto engine and is written in Java. It is used in enterprise environments where data engineering teams need to run analytical queries across large-scale data infrastructure. Topics associated with the project include big data, data lakes, Hive, and Hadoop, reflecting its positioning in the modern data warehouse ecosystem.

Setting up Presto involves running a cluster of servers: a coordinator that receives queries and distributes work, plus worker nodes that do the actual processing. Once it is running, analysts and data scientists interact with it using standard SQL tools, asking questions of large datasets without needing to know how the distributed processing happens underneath.
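The coordinator and worker split described above can be illustrated with a toy model. This is not Presto's code or its actual scheduling algorithm; it is a minimal sketch of the general pattern, using threads in one process to stand in for worker servers. The function names and the sample rows are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows, num_splits):
    """Coordinator side: divide the input rows into roughly equal splits."""
    return [rows[i::num_splits] for i in range(num_splits)]


def worker_partial_sum(split, month):
    """Worker side: filter one split and return a partial (total, count)."""
    total = 0
    count = 0
    for row in split:
        if row["month"] == month:
            total += row["amount"]
            count += 1
    return total, count


def run_query(rows, month, num_workers=4):
    """Toy version of: SELECT sum(amount), count(*) FROM sales WHERE month = ?"""
    splits = make_splits(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(lambda s: worker_partial_sum(s, month), splits))
    # Coordinator side: merge the partial results into the final answer.
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


if __name__ == "__main__":
    sales = [{"month": "2024-05", "amount": n} for n in range(10)]
    sales.append({"month": "2024-04", "amount": 100})
    print(run_query(sales, "2024-05"))
```

The point of the sketch is that each worker only ever sees its own split, and the final step combines small partial results rather than raw rows, which is why adding workers lets the same query cover more data.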

Copy-paste prompts

Prompt 1
Walk me through running a single-node Presto coordinator on Docker and querying a sample Hive table
Prompt 2
Show me how to configure a Presto connector for S3 plus Glue catalog
Prompt 3
Compare Presto vs Trino vs Athena and tell me which fits a 10-engineer team best
Prompt 4
Help me tune a slow Presto query that joins a 1TB fact table to a small dim table
Prompt 5
Set up a Presto cluster with one coordinator and three workers using docker-compose

Frequently asked questions

What is presto?

A distributed SQL query engine that runs analytical SQL across petabyte-scale data lakes by splitting work across many worker servers in parallel.

What language is presto written in?

Mainly Java. The stack also includes SQL, Hive, and Hadoop.

How hard is presto to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is presto for?

Mainly data teams: data engineers who run the cluster, plus the analysts and data scientists who query it.

Verify against the repo before relying on details.