The Data Engineering Cookbook is a long, structured set of notes for people who want to learn data engineering, written by Andreas Kretz and kept as a single repository on GitHub. Data engineering, as the cookbook frames it, is the discipline of building the plumbing that moves data into and around the systems where data scientists and analysts use it. The repository itself is the book, split across many markdown files under a sections folder, and its contents are organised as a curriculum rather than as software.
The Introduction explains what a data engineer does. It sketches a Data Science Platform Blueprint with five stages (Connect, Buffer, Processing Framework, Store, and Visualize) and offers separate learning roadmaps for beginners, data analysts, data scientists, and software engineers. A Skills Matrix and a section on becoming a senior data engineer round out the opening chapter.
The Basic Engineering Skills section covers a wide range of background topics: learning to code, getting comfortable with Git, agile development (including Scrum and OKR), software engineering culture, how a computer works, networking, security topics such as SSL keys, JSON Web Tokens, and GDPR, Linux basics with shell scripting and cron jobs, Docker along with Kubernetes for container orchestration, and cloud concepts such as IaaS versus PaaS versus SaaS, the major providers, on-premises trade-offs, and hybrid setups. The Advanced Engineering Skills section continues with the data science platform itself, the four Vs of big data, planning, and the limitations of traditional ETL.
Beyond the skills chapters, the repository lists free hands-on courses and tutorials, real-world case studies, best practices for cloud platforms, a list of 130+ data sources for data science work, more than 1,000 interview questions, and a section of recommended books, courses, and podcasts. A separate Updates file acts as a change log.
The README also points to the author's wider work: a YouTube channel, a Twitter account, an Amazon shop with podcast gear, and a paid online academy at learndataengineering.com that includes courses, a certification, and a Discord community. Contributions to the cookbook itself are welcome, and the README links to a contributing section near the bottom.
Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.