datatalksclub/data-engineering-zoomcamp

Analysis updated 2026-05-18

★ 40,680 | Jupyter Notebook | Audience · developer | Complexity · 3/5 | Setup · hard

Mindmap

mindmap
  root((repo))
    What it covers
      Docker and Terraform
      Workflow orchestration
      Data warehousing
      Batch and streaming
    Learning format
      Jupyter Notebooks
      Video lectures
      Homework assignments
    Tech stack
      Google BigQuery
      Apache Spark
      Apache Kafka
      dbt
    Use cases
      Build production pipelines
      Learn industry tools
      Complete capstone project
    Audience
      Python and SQL basics
      Career changers
      Self-paced learners


What do people build with it?

USE CASE 1

Learn to build and deploy production data pipelines using industry-standard tools.

USE CASE 2

Gain hands-on experience with Docker, Terraform, and workflow orchestration for real data engineering jobs.

USE CASE 3

Master data warehousing, transformation, and streaming technologies through structured modules and a capstone project.

USE CASE 4

Transition from SQL/Python knowledge to full-stack data engineering with practical homework and real-world scenarios.

What is it built with?

Python, SQL, Docker, Terraform, Kestra, Google BigQuery, Apache Spark, Apache Kafka

How does it compare?

	datatalksclub/data-engineering-zoomcamp	dataexpert-io/data-engineer-handbook	suno-ai/bark
Stars	40,680	41,199	39,105
Language	Jupyter Notebook	Jupyter Notebook	Jupyter Notebook
Setup difficulty	hard	easy	moderate
Complexity	3/5	1/5	3/5
Audience	developer	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

Running the full examples requires Docker, Terraform, a GCP account with BigQuery, and several distributed systems (Kafka, Spark, Kestra).

License could not be detected automatically. Check the repository's LICENSE file before use.

In plain English

Data Engineering Zoomcamp is a free nine-week online course that teaches the fundamentals of building data pipelines from scratch. Data engineering is the discipline of designing and building the systems that collect, move, transform, and store data so that it can be used for analysis and machine learning. The course addresses a gap that many aspiring data professionals face: they can write SQL or Python but lack hands-on experience with the production infrastructure tools that real data jobs require.

The course is structured as seven modules followed by a final project. The first module covers containerization with Docker, a tool for packaging software, and infrastructure provisioning with Terraform, a tool for managing cloud resources consistently. Module two teaches workflow orchestration, the practice of scheduling and monitoring data pipelines, using Kestra. Later modules cover data warehousing in Google BigQuery; analytics engineering with dbt, a tool for transforming data inside a warehouse using SQL; batch processing with Apache Spark for large-scale distributed computation; and streaming with Apache Kafka for real-time event processing. Each module includes homework assignments, and the course ends with a capstone project in which students build a complete end-to-end pipeline.

The course suits people who have basic Python and SQL knowledge and want practical experience with the tools used in industry data engineering roles. It runs in cohorts starting each January, but all materials, including Jupyter Notebooks, lecture videos, and homework, are freely available for self-paced study. The primary format is Jupyter Notebook alongside code and configuration files.
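As a rough illustration of the "collect, transform, store" stages described above, here is a minimal sketch in plain Python. It is not taken from the course materials: the sample data, the column names, and the `extract`, `transform`, `load`, and `run_pipeline` functions are all hypothetical, and an in-memory SQLite database stands in for a real warehouse such as BigQuery.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a raw source file.
RAW_CSV = """trip_id,distance_km,fare
1,2.5,10.00
2,0,3.50
3,7.1,22.40
"""


def extract(raw: str) -> list[dict]:
    """Collect: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: cast types, drop zero-distance trips, add fare per km."""
    records = []
    for row in rows:
        distance = float(row["distance_km"])
        if distance <= 0:
            continue  # skip rows that would divide by zero
        fare = float(row["fare"])
        records.append(
            (int(row["trip_id"]), distance, fare, round(fare / distance, 2))
        )
    return records


def load(records: list[tuple], conn: sqlite3.Connection) -> None:
    """Store: write the cleaned records into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips ("
        "trip_id INTEGER PRIMARY KEY, distance_km REAL, "
        "fare REAL, fare_per_km REAL)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw: str, conn: sqlite3.Connection) -> int:
    """Run the three stages in order and return the number of rows loaded."""
    records = transform(extract(raw))
    load(records, conn)
    return len(records)
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` runs the three stages in sequence. In the course, an orchestrator such as Kestra would schedule and monitor steps like these, and tools such as Spark or Kafka would replace the in-process functions at larger scale.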

Copy-paste prompts

Prompt 1

Walk me through the data engineering zoomcamp module on Docker and Terraform. What problems do they solve in a data pipeline?

Prompt 2

I want to set up a data pipeline using Kestra for workflow orchestration. Show me how the zoomcamp course structures this.

Prompt 3

Explain the dbt module from data engineering zoomcamp: how do you transform data inside a warehouse using SQL?

Prompt 4

What's the difference between batch processing with Spark and streaming with Kafka? Use examples from the zoomcamp course.

Prompt 5

Help me design a capstone project for the data engineering zoomcamp that uses BigQuery, dbt, and Spark together.

Frequently asked questions

What is data-engineering-zoomcamp?

A free nine-week course that teaches data pipeline fundamentals to aspiring data engineers: Docker, Terraform, workflow orchestration, BigQuery, dbt, Spark, and Kafka.

What language is data-engineering-zoomcamp written in?

Mainly Jupyter Notebook. The stack also includes Python, SQL, and Docker.

What license does data-engineering-zoomcamp use?

License could not be detected automatically. Check the repository's LICENSE file before use.

How hard is data-engineering-zoomcamp to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is data-engineering-zoomcamp for?

Mainly developers.


Verify against the repo before relying on details.