Analysis updated 2026-05-18
Learn how to build a CDC-based real-time streaming pipeline using Kafka, Debezium, and ClickHouse
Set up a working demo analytics dashboard showing live order metrics from a realistic e-commerce dataset
Study how to wire Go producers and consumers to Kafka in a multi-service Docker Compose environment
| | el10savio/ecommrt | alexremn/finalizer-doctor | azer/diskwhere |
|---|---|---|---|
| Stars | 3 | 3 | 3 |
| Language | Go | Go | Go |
| Setup difficulty | moderate | easy | easy |
| Complexity | 4/5 | 3/5 | 1/5 |
| Audience | developer | ops devops | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires Docker Desktop and the Olist CSV files, which must be downloaded from Kaggle and placed in the data/olist/ directory before running make setup.
Ecommrt is a demo data pipeline that shows how to stream database changes into a real-time analytics dashboard. It uses a publicly available Brazilian e-commerce dataset from Kaggle as its data source and simulates a live online store by continuously feeding order records into the system. The goal is to show how changes in a database can be picked up automatically and displayed on a live dashboard without manual refreshes or batch jobs.

The architecture connects several tools in sequence. A program written in Go reads the dataset and sends order events into Kafka, a message queue that holds data in transit. From Kafka, the events go to two places: a set of consumers that write orders into a PostgreSQL database, and ClickHouse, a separate database designed for fast analytical queries, which receives them directly. Debezium watches the PostgreSQL database for new or updated rows and forwards those changes into ClickHouse as well, again through Kafka. A Grafana dashboard then queries ClickHouse to show live business metrics: total revenue, orders per minute, top products, and pipeline health indicators.

Setting up the project requires Docker, the Go runtime, and the Olist CSV files downloaded from Kaggle. A single setup command starts all services in the correct order, applies the database schema, and connects the components. Once everything is running, the Grafana dashboard is available at a local web address.

This project is useful as a learning reference for engineers who want to understand how to build a CDC-based streaming pipeline with real components rather than toy examples. It demonstrates how to combine Kafka, Debezium, ClickHouse, and Grafana in a working system using a realistic dataset.
A demo streaming pipeline that feeds a Brazilian e-commerce dataset through Kafka and Debezium CDC into ClickHouse, displayed as a live Grafana analytics dashboard.
Mainly Go. The stack also includes Kafka, PostgreSQL, Debezium, ClickHouse, and Grafana.
Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.
Mainly developers.
Verify against the repo before relying on details.