Learn to build and deploy production data pipelines using industry-standard tools.
Gain hands-on experience with Docker, Terraform, and workflow orchestration for real data engineering jobs.
Master data warehousing, transformation, and streaming technologies through structured modules and a capstone project.
Transition from SQL/Python knowledge to full-stack data engineering with practical homework and real-world scenarios.
Running the full examples requires Docker, Terraform, a GCP account with BigQuery, and several additional systems (Kafka, Spark, Kestra).
Data Engineering Zoomcamp is a free nine-week online course that teaches the fundamentals of building data pipelines from scratch. Data engineering is the discipline of designing and building the systems that collect, move, transform, and store data so that it can be used for analysis and machine learning. The course addresses a gap that many aspiring data professionals face: they can write SQL or Python but lack hands-on experience with the production infrastructure tools that real data jobs require.

The course is structured as seven modules followed by a final project. The first module covers containerization with Docker, a tool for packaging software, and infrastructure provisioning with Terraform, a tool for managing cloud resources consistently. Module two teaches workflow orchestration, the practice of scheduling and monitoring data pipelines, using Kestra. Later modules cover data warehousing in Google BigQuery; analytics engineering with dbt, a tool for transforming data inside a warehouse using SQL; batch processing with Apache Spark for large-scale distributed computation; and streaming with Apache Kafka for real-time event processing. Each module includes homework assignments, and the course ends with a capstone project in which students build a complete end-to-end pipeline.

The course suits learners who have basic Python and SQL knowledge and want practical experience with the tools used in industry data engineering roles. It runs in cohorts starting each January, but all materials, including Jupyter Notebooks, lecture videos, and homework, are freely available for self-paced study. The repository consists primarily of Jupyter Notebooks alongside code and configuration files.
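To make the "collect, transform, and store" idea in the overview concrete, here is a minimal sketch of a pipeline written with only the Python standard library. It is not taken from the course materials: the sample data, the `trips` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse such as BigQuery.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a real source such as an API or file drop.
RAW_CSV = """trip_id,distance_km,fare
1,2.5,7.00
2,,4.50
3,10.0,22.75
"""


def extract(raw_text):
    """Collect: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing distance and cast fields to numbers."""
    cleaned = []
    for row in rows:
        if not row["distance_km"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["trip_id"]), float(row["distance_km"]), float(row["fare"]))
        )
    return cleaned


def load(records, conn):
    """Store: write cleaned records into a SQLite table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips "
        "(trip_id INTEGER PRIMARY KEY, distance_km REAL, fare REAL)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run extract -> transform -> load, then return (row count, total fare)."""
    load(transform(extract(raw_text)), conn)
    return conn.execute("SELECT COUNT(*), SUM(fare) FROM trips").fetchone()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_CSV, connection))
```

In a production setting each of these steps would typically be handled by a dedicated tool of the kind the modules cover, for example an orchestrator to schedule the run and a warehouse to hold the output; the sketch only illustrates the shape of the work.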
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.