Find structured learning paths and bootcamps to transition into a data engineering career.
Discover and compare data engineering tools like Airflow, Snowflake, and Apache Iceberg for your projects.
Read engineering blogs and whitepapers from companies like Netflix, Uber, and Google to learn industry best practices.
Locate data engineering creators and communities on YouTube and LinkedIn for ongoing learning and networking.
The Data Engineering Handbook is a curated collection of learning resources, tools, and community links for people who want to become data engineers or deepen their existing skills. Data engineering is the discipline of building systems that collect, store, transform, and deliver data so that analysts and data scientists can use it. The handbook addresses the problem of scattered information: instead of hunting across dozens of websites, books, and newsletters, a learner finds the key resources gathered in one place.

The repository works as a living reference document rather than a code project. It contains links to beginner and intermediate bootcamps, a curated list of over 25 books covering topics like data-intensive systems and machine learning infrastructure, and a categorized directory of companies and open-source tools organized by function: orchestration tools like Airflow and Dagster, data lake formats like Apache Iceberg, data warehouses like Snowflake, analytics tools like Metabase and Apache Superset, and real-time data platforms. It also links to technical whitepapers from Google and other organizations, engineering blogs from Netflix, Uber, Airbnb, and Meta, and a directory of data engineering creators on YouTube, LinkedIn, and other platforms.

Use this repository as a starting point if you are new to data engineering and need a structured learning path, or as a reference if you are an experienced engineer exploring new tools in the ecosystem. The materials span multiple skill levels, from absolute beginners to people preparing for technical interviews. The content is primarily Jupyter Notebook and Markdown files, hosted on GitHub.
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.