explaingit

apache/kylin

3,766 | Java | Audience · data | Complexity · 4/5 | License | Setup · hard

TLDR

An open source SQL analytics engine that makes billion-row queries on Hadoop data return in seconds instead of hours, by pre-computing results across many dimension combinations before anyone asks the question.

Mindmap

mindmap
  root((Apache Kylin))
    What it does
      Fast SQL queries
      Pre-computed results
      Hadoop analytics
    Tech stack
      Java
      Hadoop
      Docker
    Use cases
      BI dashboards
      Big data queries
      OLAP analytics
    Audience
      Data engineers
      Analysts

Things people build with this

USE CASE 1

Run sub-second SQL queries against a Hadoop data warehouse with billions of rows

USE CASE 2

Build a business intelligence dashboard backed by massive company datasets without waiting hours per query

USE CASE 3

Try out OLAP-style multi-dimensional analytics on big data using the provided Docker image before committing to a full cluster

Tech stack

Java, Hadoop, SQL, Docker

Getting it running

Difficulty · hard | Time to first run · 30 min

The Docker image covers getting started quickly, but production use requires a running Hadoop cluster with distributed infrastructure.

Use, modify, and distribute freely for any purpose including commercial use, as long as you keep the Apache 2.0 license notice.

In plain English

Apache Kylin is an open source analytics engine originally contributed by eBay. It is designed to run SQL queries against very large datasets stored in Hadoop, a distributed data processing system that many large companies use to store and process massive amounts of information.

The core idea is to let teams ask questions of their data using standard SQL, a query language that non-technical users sometimes encounter in spreadsheet tools or business intelligence software. Kylin pre-computes results across many combinations of dimensions, so queries that would otherwise take hours come back quickly.

Getting started with a local trial is straightforward. The project provides a ready-made Docker image, a self-contained package you can run on your own computer with two terminal commands. After a few minutes a web interface becomes available at a local URL where you can log in and begin exploring.

The README is short and points to external documentation and a mailing list for support. It does not describe pricing, cloud hosting options, or any specific industries it targets. The project is released under the Apache 2.0 open source license.
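
To make the pre-computation idea concrete, here is a minimal Python sketch of aggregating a measure for every combination of dimensions ahead of time, so a later GROUP BY becomes a dictionary lookup instead of a scan. It is an illustration only, not Kylin's actual implementation: the table contents, the dimension names, and the `build_cube` and `query` helpers are all hypothetical, and a real deployment does this work on a cluster over far larger data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (date, region, category, sales amount).
ROWS = [
    ("2024-01-01", "EU", "books", 10.0),
    ("2024-01-01", "EU", "games", 5.0),
    ("2024-01-01", "US", "books", 7.0),
    ("2024-01-02", "US", "games", 3.0),
]
DIMENSIONS = ("date", "region", "category")


def build_cube(rows, dimensions):
    """Pre-aggregate the sales total for every subset of the dimensions."""
    cube = {}
    indices = range(len(dimensions))
    for size in range(len(dimensions) + 1):
        for subset in combinations(indices, size):
            names = tuple(dimensions[i] for i in subset)
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in subset)
                totals[key] += row[-1]  # last column is the measure
            cube[names] = dict(totals)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY from the pre-computed totals, without touching ROWS.

    group_by must list dimensions in the same order as DIMENSIONS.
    """
    return cube[tuple(group_by)]


cube = build_cube(ROWS, DIMENSIONS)
print(len(cube))                 # number of pre-computed groupings
print(query(cube, ["region"]))   # sales total per region
```

With three dimensions the sketch stores 2^3 = 8 groupings. That count doubles with every added dimension, which is the trade-off behind this approach: more storage and build time up front in exchange for fast reads later.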

Copy-paste prompts

Prompt 1
Walk me through starting Apache Kylin with Docker and running a sample SQL query against the included demo data.
Prompt 2
Design a Kylin cube for a sales dataset with dimensions for date, region, and product category, and show the SQL to query pre-aggregated totals.
Prompt 3
How does Apache Kylin's pre-aggregation approach differ from running queries directly on Hive or Spark SQL, and when does Kylin win?
Prompt 4
What are the steps to connect Apache Kylin to an existing Hadoop cluster and index a new table for fast queries?

Verify against the repo before relying on details.