explaingit

apache/doris

15,333 · Java · Audience · data · Complexity · 4/5 · License · Apache 2.0 · Setup · hard

TLDR

Apache Doris is an open-source analytical database that runs complex queries across billions of rows in under a second, supporting real-time dashboards, BI tools, and federated queries across data lakes.

Mindmap

mindmap
  root((doris))
    What it does
      Sub-second queries
      Real-time dashboards
      Federated data lake queries
    Architecture
      Frontend nodes
      Backend nodes
      MPP parallel processing
    Use Cases
      BI and ad-hoc analysis
      Log and event analysis
      A/B test reporting
    Tech Stack
      Java
      Standard SQL
      MySQL protocol
    Setup
      Multi-node cluster
      Apache 2.0 license


Things people build with this

USE CASE 1

Build a real-time analytics dashboard that queries billions of rows and returns results in under a second.

USE CASE 2

Run ad-hoc BI queries across your data warehouse without a separate ETL step.

USE CASE 3

Analyze user behavior and A/B test results by joining event streams as they arrive.

USE CASE 4

Query data sitting in Apache Hive or Iceberg data lakes without moving it into Doris first.
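Use case 4 relies on Doris's multi-catalog feature: you register an external catalog once and then address lake tables as catalog.database.table. The sketch below only builds SQL text, so it runs without a cluster. It assumes a Hive Metastore catalog (type hms) at the hypothetical address thrift://metastore-host:9083. The catalog name hive_lake, the lake table sales.orders and the Doris table crm.users are all made up for illustration. Iceberg and Hudi catalogs take their own properties, so confirm the exact property names in the Doris multi-catalog docs for your version.

```python
# Build Doris multi-catalog SQL as plain strings (no cluster needed to run this).
# Catalog, database and table names and the metastore address are hypothetical.


def create_catalog_sql(name: str, properties: dict[str, str]) -> str:
    """Render a CREATE CATALOG statement from a property map."""
    props = ",\n".join(f"  '{key}' = '{value}'" for key, value in properties.items())
    return f"CREATE CATALOG {name} PROPERTIES (\n{props}\n);"


# Assumption: a Hive Metastore ('hms') catalog; other lake formats use other properties.
HIVE_CATALOG_DDL = create_catalog_sql(
    "hive_lake",
    {"type": "hms", "hive.metastore.uris": "thrift://metastore-host:9083"},
)

# Federated join: the lake table is addressed as catalog.database.table, and the
# Doris-managed table goes through the built-in "internal" catalog.
FEDERATED_JOIN = """
SELECT u.region, SUM(o.amount) AS revenue
FROM hive_lake.sales.orders AS o
JOIN internal.crm.users AS u ON o.user_id = u.id
GROUP BY u.region;
""".strip()

if __name__ == "__main__":
    print(HIVE_CATALOG_DDL)
    print(FEDERATED_JOIN)
```

Run both statements from any MySQL client connected to an FE. The lake data stays where it is. Only the query touches it.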

Tech stack

Java · SQL · MySQL Protocol
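The FE speaks the MySQL wire protocol, so an ordinary MySQL driver can talk to Doris. This minimal sketch makes three assumptions: the mysql-connector-python package is installed, the FE uses its default MySQL-protocol port 9030, and the initial root account still has an empty password. DORIS_FE_HOST is a hypothetical environment variable. The script only connects when you point it at a real FE.

```python
import os

# Assumed defaults: a fresh Doris FE accepts MySQL-protocol clients on port 9030
# (its query_port), and the initial "root" account has an empty password.
DEFAULT_FE_QUERY_PORT = 9030


def fe_connection_params(host: str, port: int = DEFAULT_FE_QUERY_PORT,
                         user: str = "root", password: str = "") -> dict:
    """Keyword arguments for mysql.connector.connect(), aimed at a Doris FE."""
    return {"host": host, "port": port, "user": user, "password": password}


def run_query(host: str, sql: str) -> list:
    """Run one statement against the FE and return all rows."""
    import mysql.connector  # third-party: pip install mysql-connector-python

    conn = mysql.connector.connect(**fe_connection_params(host))
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    host = os.environ.get("DORIS_FE_HOST")  # hypothetical variable name
    if host:
        # SHOW BACKENDS lists the BE nodes registered with this cluster.
        for row in run_query(host, "SHOW BACKENDS"):
            print(row)
```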

Getting it running

Difficulty · hard · Time to first run · 1 day+

Requires deploying separate Frontend and Backend node processes. One FE and one BE on a single machine is enough to try it out, but a production cluster needs multiple machines or VMs.
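Starting the FE and BE processes is not the last step. The BE nodes, and any extra FE nodes, also have to be registered with the cluster by running SQL on the master FE. The sketch below only generates those statements. It assumes Doris's default ports: 9050 for the BE heartbeat service and 9010 for the FE edit log. The hostnames are hypothetical. Paste the output into any MySQL client connected to the master FE.

```python
# Generate cluster-registration SQL for a Doris cluster (strings only; no connection).
# Assumed defaults: BE heartbeat_service_port 9050, FE edit_log_port 9010.
BE_HEARTBEAT_PORT = 9050
FE_EDIT_LOG_PORT = 9010


def registration_sql(backends, followers=(), observers=(),
                     heartbeat_port=BE_HEARTBEAT_PORT, edit_log_port=FE_EDIT_LOG_PORT):
    """Return ALTER SYSTEM statements that add FE and BE nodes, FE nodes first."""
    stmts = []
    # Followers take part in master election; observers replicate metadata
    # and add query capacity without voting.
    stmts += [f'ALTER SYSTEM ADD FOLLOWER "{h}:{edit_log_port}";' for h in followers]
    stmts += [f'ALTER SYSTEM ADD OBSERVER "{h}:{edit_log_port}";' for h in observers]
    # Each BE is addressed by its heartbeat port.
    stmts += [f'ALTER SYSTEM ADD BACKEND "{h}:{heartbeat_port}";' for h in backends]
    return stmts


if __name__ == "__main__":
    # Hypothetical hosts: one extra FE follower and three BEs.
    for stmt in registration_sql(["be1", "be2", "be3"], followers=["fe2"]):
        print(stmt)
```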

Apache 2.0: use freely for any purpose, including commercial products, as long as you keep the copyright and license notices.

In plain English

Apache Doris is an open-source analytical database built for asking complex questions across very large amounts of data and getting answers back quickly. The README describes it as easy to use, high-performance, and real-time, aiming to return query results in under a second even when the underlying data is huge. It supports two kinds of workload:

- high-concurrency point queries (many small lookups at once)
- high-throughput complex analysis (a few heavy queries scanning a lot of data)

Doris is built on an MPP (massively parallel processing) architecture: each query is split into pieces that run on many machines at the same time. It uses two kinds of processes:

- Frontend (FE) nodes handle user requests, parse and plan queries, and manage metadata.
- Backend (BE) nodes store the data and execute queries.

Data is partitioned into shards, and each shard is replicated across multiple BE nodes for reliability. You can deploy multiple FE nodes for disaster recovery, each acting in a Master, Follower, or Observer role. Because Doris speaks the MySQL protocol and supports standard SQL, you can connect with familiar MySQL clients and BI tools.

You would reach for Apache Doris for:

- a unified analytics platform
- real-time dashboards
- ad-hoc BI queries
- user behavior and A/B test analysis
- log and event analysis
- querying data that lives in data lakes such as Apache Hive, Apache Iceberg, or Apache Hudi

It also supports federated queries that join data across multiple sources, pitched as a way to eliminate data silos.

Doris is an Apache Software Foundation project released under the Apache 2.0 license, with Java as its primary language.
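A toy in-memory model can illustrate the scatter/gather idea behind MPP, combined with hash-sharded, replicated storage. This is a simplified sketch, not Doris's planner or storage code. Rows are hashed on a key into a fixed number of shards (Doris calls them tablets). Each shard's replicas are placed round-robin on distinct hypothetical BE nodes. Each node pre-aggregates the shards assigned to it, and a coordinator merges the partial results. The function names, shard count, and placement rule are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

# Toy model of hash-sharded, replicated storage plus scatter/gather aggregation.
# Names, shard count, and placement rule are illustrative assumptions.


def shard_of(key: str, num_shards: int) -> int:
    # crc32 is stable across runs, unlike Python's salted hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def place_replicas(num_shards: int, backends: list, replicas: int) -> dict:
    """Round-robin placement: shard s lives on `replicas` distinct consecutive backends."""
    if replicas > len(backends):
        raise ValueError("need at least as many backends as replicas")
    return {s: [backends[(s + i) % len(backends)] for i in range(replicas)]
            for s in range(num_shards)}


def mpp_group_sum(rows, backends, num_shards=8, replicas=3, down=frozenset()):
    """Compute SUM(value) GROUP BY key the MPP way; `down` lists failed backends."""
    # Load: route each row to its shard by hashing the distribution key.
    shards = defaultdict(list)
    for key, value in rows:
        shards[shard_of(key, num_shards)].append((key, value))

    # Plan: pick one live replica per shard and assign the shard to that backend.
    placement = place_replicas(num_shards, backends, replicas)
    assignment = defaultdict(list)
    for s, shard_rows in shards.items():
        live = [be for be in placement[s] if be not in down]
        if not live:
            raise RuntimeError(f"shard {s} has no live replica")
        assignment[live[0]].append(shard_rows)

    # Scatter: each backend aggregates only its own shards (conceptually in parallel).
    partials = []
    for shard_lists in assignment.values():
        local = defaultdict(int)
        for shard_rows in shard_lists:
            for key, value in shard_rows:
                local[key] += value
        partials.append(local)

    # Gather: a coordinator merges the per-backend partial sums.
    result = defaultdict(int)
    for local in partials:
        for key, subtotal in local.items():
            result[key] += subtotal
    return dict(result)


if __name__ == "__main__":
    events = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
    nodes = ["be1", "be2", "be3"]
    print(mpp_group_sum(events, nodes))                # all replicas healthy
    print(mpp_group_sum(events, nodes, down={"be1"}))  # one BE down, same totals
```

Losing one BE does not change the answer, because every shard still has a live replica elsewhere. In Doris the replica count is set per table, for example through the replication_num table property.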

Copy-paste prompts

Prompt 1
I have Apache Doris running with FE and BE nodes. Write a SQL query to analyze daily active users from an events table partitioned by date, grouping by user_id and counting distinct sessions.
Prompt 2
Show me how to connect Apache Doris to Grafana through Grafana's MySQL data source and write a sample panel query for a time-series chart.
Prompt 3
I want to query an Apache Iceberg table from Apache Doris without copying the data. Write the CREATE CATALOG statement and a sample cross-source JOIN query.
Prompt 4
Help me design a Doris table schema for web server logs, including the right partition strategy and replication count for a 3-node cluster.
Prompt 5
Write a Python script using mysql-connector-python to batch-insert 100k rows into Apache Doris and verify the row count afterward.

Verify against the repo before relying on details.